Hyperspectral image analysis for CARS, SRS, and Raman data

In this work, we have significantly enhanced the capabilities of the hyperspectral image analysis (HIA) first developed by Masia et al.¹ The HIA introduced a method to factorize the hyperspectral data into the product of component concentrations and spectra for quantitative analysis of the chemical composition of the sample. The enhancements shown here comprise (1) a spatial weighting to reduce the spatial variation of the spectral error, which improves the retrieval of the chemical components with significant local but small global concentrations; (2) a new selection criterion for the spectra used when applying sparse sampling² to speed up sequential hyperspectral imaging; and (3) a filter for outliers in the data using singular value decomposition, suited, e.g., to suppress motion artifacts. We demonstrate the enhancements on coherent anti‐Stokes Raman scattering, stimulated Raman scattering, and spontaneous Raman data. We provide the HIA software as an executable for public use. © 2015 The Authors. Journal of Raman Spectroscopy published by John Wiley & Sons, Ltd.


A. FSC³ reproducibility
The FSC³ algorithm is based on NMF, which starts from random initial guesses for the concentrations and spectra. We observed that the FSC³ results are well reproducible for simple datasets [1], while for more complex data the results show a larger spread. To improve the reproducibility, we have modified the FSC³ algorithm to run n NMFs from independent random initial spectra and concentrations, with a high tolerance target τ_H for fast execution. The NMF with the smallest error is then continued with a low tolerance target τ_L ≪ τ_H.
We have tested the improvement in reproducibility on the datasets of Fig. 2. We performed 10 FSC³ calculations, each consisting either of a single NMF, or of n = 20 NMFs with τ_H = 0.1 followed by a single NMF with τ_L = 0.01 that uses the results of the high-tolerance calculations as its starting point. We quantify the reproducibility using the relative variation R of the reconstructed data D*, where i and j are the indices of the FSC³ calculations. Fig. S1 shows R for the two methods for different numbers of components K. The high/low tolerance method (black symbols) improves the reproducibility by a factor of 3 to 8 with respect to the single NMF method (red symbols), depending on K.
As an alternative to the "high/low tolerance" method, we have developed a "knock-out" method showing further improved reproducibility. In the "knock-out" method, we run a set of 2^n NMF calculations with tolerance τ using random initial guesses. We then select the half of the solutions with the smaller errors and use them as initial conditions for the next set of NMF calculations with the same target tolerance τ. We repeat this until we obtain a single solution.
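The "high/low tolerance" scheme described above can be illustrated with a minimal sketch. It is not the HIA implementation: the NMF here uses standard multiplicative updates, and the tolerance is assumed to mean the relative decrease of the reconstruction error per iteration. The names `nmf` and `nmf_high_low` are hypothetical.

```python
import numpy as np

def nmf(D, K, tol, C=None, S=None, rng=None, max_iter=2000):
    """Multiplicative-update NMF, D ~ C @ S with C, S >= 0.

    Assumption: iteration stops once the relative decrease of the
    Frobenius error in one iteration is at most `tol`.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_pix, n_spec = D.shape
    C = rng.random((n_pix, K)) if C is None else C.copy()
    S = rng.random((K, n_spec)) if S is None else S.copy()
    tiny = 1e-12  # guards against division by zero
    err = np.linalg.norm(D - C @ S)
    for _ in range(max_iter):
        S *= (C.T @ D) / (C.T @ C @ S + tiny)
        C *= (D @ S.T) / (C @ S @ S.T + tiny)
        new_err = np.linalg.norm(D - C @ S)
        converged = err - new_err <= tol * err
        err = new_err
        if converged:
            break
    return C, S, err

def nmf_high_low(D, K, n=20, tol_high=0.1, tol_low=0.01, seed=0):
    """Run n coarse NMFs, then refine the one with the smallest error."""
    rng = np.random.default_rng(seed)
    coarse = [nmf(D, K, tol_high, rng=rng) for _ in range(n)]
    C, S, _ = min(coarse, key=lambda run: run[2])
    return nmf(D, K, tol_low, C=C, S=S)
```

Calling `nmf_high_low(D, K=4)` on a non-negative data matrix `D` (pixels by spectral points) returns the refined concentrations, spectra, and error. The coarse runs are cheap because of the loose stopping rule, so most of the computing time goes into the single refinement.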
The algorithm offers the option to compare the last two solutions for similarity, for which we calculate the relative error ϵ between the solutions, where D*_1 and D*_2 are the reconstructed data obtained from the two solutions and D are the original data. If ϵ is larger than a user-defined maximum relative error ϵ_max, a new set of 2^n calculations is started. The final two solutions of this iteration are added to the previous final two, and of these four the two solutions with the lowest error are retained for the similarity check.
Fig. S1: Relative variation R of the reconstructed data for a single NMF (red symbols), the "high/low tolerance" method (black symbols), and the "knock-out" method (green symbols).
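The "knock-out" selection with the optional similarity check can be sketched as follows. Several points are assumptions and not taken from the HIA software: the set size is read as 2^n, the relative error is taken as ϵ = ‖D*_1 − D*_2‖ / ‖D‖ (Frobenius norms), the NMF uses multiplicative updates with the tolerance meaning the relative error decrease per iteration, and `max_sets` is a hypothetical cap on the number of restarts. All function names are illustrative.

```python
import numpy as np

def nmf(D, K, tol, C=None, S=None, rng=None, max_iter=2000):
    """Multiplicative-update NMF, D ~ C @ S (assumed stopping rule:
    relative error decrease per iteration at most `tol`)."""
    rng = np.random.default_rng() if rng is None else rng
    C = rng.random((D.shape[0], K)) if C is None else C.copy()
    S = rng.random((K, D.shape[1])) if S is None else S.copy()
    tiny = 1e-12
    err = np.linalg.norm(D - C @ S)
    for _ in range(max_iter):
        S *= (C.T @ D) / (C.T @ C @ S + tiny)
        C *= (D @ S.T) / (C @ S @ S.T + tiny)
        new_err = np.linalg.norm(D - C @ S)
        converged = err - new_err <= tol * err
        err = new_err
        if converged:
            break
    return C, S, err

def knock_out(D, K, n, tol, rng):
    """One knock-out set: start 2**n runs, halve until two remain."""
    pool = [nmf(D, K, tol, rng=rng) for _ in range(2 ** n)]
    while len(pool) > 2:
        pool.sort(key=lambda run: run[2])
        survivors = pool[: len(pool) // 2]
        pool = [nmf(D, K, tol, C=C, S=S) for C, S, _ in survivors]
    return sorted(pool, key=lambda run: run[2])

def knock_out_nmf(D, K, n=4, tol=0.1, eps_max=0.01, max_sets=5, seed=0):
    """Repeat knock-out sets until the two best finalists agree."""
    rng = np.random.default_rng(seed)
    finalists, eps = [], np.inf
    for _ in range(max_sets):
        candidates = finalists + knock_out(D, K, n, tol, rng)
        finalists = sorted(candidates, key=lambda run: run[2])[:2]
        (C1, S1, _), (C2, S2, _) = finalists
        # Assumed form of the relative error between the two solutions.
        eps = np.linalg.norm(C1 @ S1 - C2 @ S2) / np.linalg.norm(D)
        if eps <= eps_max:
            break
    return finalists[0], eps
```

The function returns the finalist with the lowest error together with the last value of ϵ, so a caller can see whether the similarity criterion was met before the cap on restarts was reached.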
As can be seen in Fig. S1, the "knock-out" method improves the reproducibility by about a factor of two with respect to the "high/low tolerance" method. For the analysis we used n = 4, τ = 0.1 and ϵ_max = 0.01.
In the next step of the iteration, the weighted FSC³ method increases the number of components K to 4 (see Fig. S7) and distinguishes chromatin as an additional component (4).
The corresponding spectrum is blueshifted compared to that of the cytosolic protein (component 2), consistent with nucleic acids.
In the subsequent steps K is further increased to 5 (Fig. S8) and then 6 [...] the one obtained with the "high/low tolerance" method (see Fig. 1), showing that only the weighted algorithm is able to identify the pixel with modified spectrum for f down to 0.125.
Figs. S11 and S12 compare the un-weighted and weighted FSC³ algorithms using the "knock-out" method and automatic determination of the number of chemical components for the data of

E. Analysis of SRS hyperspectral images using FSC³
The FSC³ algorithm can be used to analyze hyperspectral images obtained with techniques other than CARS. Here we report the analysis of SRS hyperspectral data of C. elegans acquired in the region (1620-1800) cm⁻¹ [2]. The SRS signal is proportional to the imaginary part of the susceptibility, so we skip the SVD filtering and the PCKK retrieval in the analysis and apply the FSC³ factorization directly to the measured SRS signal. The concentrations of the components are individually normalized to a maximum of one; they cannot be determined absolutely in the same way as for CARS, since SRS provides no contrast for substances without resonances in the investigated wavenumber range, which is mostly water for this data. The component spectra are scaled accordingly to retain the factorized data values. Fig. S13 shows the concentrations and spectra obtained using FSC³ with K = 4 chemical components. The results are consistent with results on a similar sample obtained with MCR analysis [2]. Component 4 can be associated with unsaturated lipids such as glyceryl trioleate [3] [...] little spatial structure, demonstrating the ability of the FSC³ method to factorize the data and remove the noise. This factorization took about 20 seconds on a modern desktop PC, much faster than the MCR analysis used in [2].
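The normalization described above, with each concentration map scaled to a maximum of one and the matching spectrum rescaled so the product is unchanged, can be written in a few lines. This is a sketch under assumed array shapes (concentrations `C` as pixels by components, spectra `S` as components by spectral points); the function name is hypothetical.

```python
import numpy as np

def normalize_components(C, S):
    """Scale each component's concentration map to a maximum of one and
    rescale its spectrum so that C @ S is unchanged.

    Assumes every component has a positive maximum concentration.
    """
    peak = C.max(axis=0)              # one maximum per component, shape (K,)
    return C / peak, S * peak[:, None]
```

Because column k of `C` is divided by the same factor that multiplies row k of `S`, the factorized data values `C @ S` are retained exactly, up to floating-point rounding.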

F. FSC³ analysis on spontaneous Raman hyperspectral images
3T3-L1-derived adipocytes have been imaged with spontaneous Raman scattering in confocal geometry on the same microscope used for the CARS measurements [3]. The Raman signal was excited using a continuous-wave 532 nm laser focused by the 20× 0.75 NA dry objective, which was also used for collection. The laser line is filtered by a com- [...]
[...] localized at the lipid droplet membrane. It contains some water and signal in the CH-stretch region (2800-3100) cm⁻¹, and weak features in the characteristic region (750-1800) cm⁻¹.
Component 3 is also dominated by fluorescence and is localized in the cytosol. It contains more water than component 2, and a weaker but spectrally similar feature in the CH-stretch region.

G. SVD based masking flowchart
Fig. S15 shows the flow chart of the SVD-based masking algorithm. The binary vector e defines whether a particular point is excluded from the SVD factorization at the next iteration step.
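The role of the binary vector e can be illustrated with a generic iterative masking loop. The flow chart itself is not reproduced in this text, so the steps below are assumptions and not the published algorithm: points are taken as rows of the data matrix, a point is flagged when its residual after projection onto the leading singular vectors exceeds `factor` times the median residual of the included points, and the loop stops when e no longer changes. The function name and the parameters `rank`, `factor`, and `max_iter` are hypothetical.

```python
import numpy as np

def svd_mask(D, rank, factor=5.0, max_iter=20):
    """Iteratively flag outlier rows of D (points by spectral channels).

    e[i] is True when row i is excluded from the SVD at the next step.
    Assumes at least `rank` rows remain included at every iteration.
    """
    e = np.zeros(D.shape[0], dtype=bool)
    for _ in range(max_iter):
        # Factorize only the rows that are currently included.
        _, _, Vt = np.linalg.svd(D[~e], full_matrices=False)
        V = Vt[:rank].T
        # Residual of every row after projection onto the leading subspace.
        resid = np.linalg.norm(D - D @ V @ V.T, axis=1)
        new_e = resid > factor * np.median(resid[~e])
        if np.array_equal(new_e, e):
            break
        e = new_e
    return e
```

All rows are tested at every iteration, including those currently excluded, so a point flagged early can be readmitted once the subspace is no longer distorted by the true outliers.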

H. Running the Hyperspectral Imaging Analysis (HIA) software
The software is attached as supplementary information to the paper. First download "HIA Software JRS.zip.pdf" and rename it to a zip file. Extract all the files into a folder. The software is written in MATLAB, compiled for Windows 7 64-bit, and requires the MATLAB run-time compiler version 8.3 (2015a) to be installed. The Java class "DropTargetList.class" and the parallel computing profile "HIA.settings" must be in the same folder as the executable. To run the software, launch the batch file "run HIA.bat". This opens a cmd window that shows any MATLAB errors that occur during the analysis. Please use this feature to identify and report bugs. The file "HIA.chm" is a help file with instructions for using the program and a description of the functions.
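The unpacking steps above can also be scripted. The following sketch uses only the Python standard library. It assumes that the archive extracts its files directly into the target folder, without a nested subfolder; the function name `unpack_hia` is hypothetical, and the file names are those quoted in the text.

```python
import shutil
import zipfile
from pathlib import Path

def unpack_hia(download, dest):
    """Copy the downloaded '.zip.pdf' file to a '.zip' name, extract it
    into `dest`, and return the required support files that are missing."""
    download, dest = Path(download), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # Dropping the final '.pdf' suffix leaves a name ending in '.zip'.
    archive = dest / download.with_suffix("").name
    shutil.copyfile(download, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    required = ["DropTargetList.class", "HIA.settings"]
    return [name for name in required if not (dest / name).exists()]
```

An empty return value indicates that both support files are present in the folder; the batch file is then launched from that folder as described.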