Comparing in vivo and ex vivo fiberoptic diffuse reflectance spectroscopy in colorectal cancer

In vivo data acquisition using fiberoptic diffuse reflectance spectroscopy (DRS) is more complicated and less controlled compared to ex vivo data acquisition. It would be of great benefit if classifiers for in vivo tissue discrimination based on DRS could be trained on data obtained ex vivo. In this study, in vivo and ex vivo DRS measurements are obtained during colorectal cancer surgery. A mixed model statistical analysis is used to examine the differences between the two datasets. Furthermore, classifiers are trained and tested using in vivo and ex vivo data. It is found that with a classifier trained on ex vivo data and tested on in vivo data, similar results are obtained compared to a classifier trained and tested on in vivo data. In conclusion, under the conditions used in this study, classifiers intended for in vivo tissue discrimination can be trained on ex vivo data.


| INTRODUCTION
Colorectal cancer is the third most common cancer worldwide for men and women combined and the second leading cause of cancer-related death [1]. Standard of care for advanced stage colorectal cancer is surgery, which is sometimes combined with neoadjuvant chemo- and/or radiotherapy. In colorectal cancer surgery, there are two main challenges. The first is complete removal of the tumor, as a positive resection margin is a negative independent predictor of survival and local recurrence [2,3]. The second is avoiding overly extensive surgery, to prevent complications from damage to vital structures. Technology for intraoperative tissue classification could be of great benefit to decrease the number of positive resection margins, while preventing complications due to overly extensive surgery.
Fiberoptic diffuse reflectance spectroscopy (DRS) can be used for intraoperative tissue classification. In DRS, light over a broad wavelength range is sent through an optical fiber into the tissue. Within the tissue, the light will undergo scattering and absorption, which depends on tissue characteristics and varies with the wavelength of the light [4,5]. Part of the light will be scattered back to the surface of the tissue where it can be collected with a second fiber. Based on the collected spectrum, different tissue types can be distinguished [5].
DRS has been used before for tissue classification in cancers, like breast, head and neck, liver, lung and also colorectal cancer [6][7][8][9][10][11][12]. In colorectal cancer, most research was done during endoscopy to detect tumor tissue inside the lumen, where only mucosal tissue and tumor tissue can be encountered [13][14][15][16][17]. However, during surgery, tissue is assessed from outside the lumen, where no mucosal tissue but mainly fat and healthy colorectal wall are present. This makes classification during surgery different compared to the endoscopic setting. Some studies have been focused on the use of DRS during colorectal cancer surgery, with accuracies ranging from 91% to 95%. However, all these studies were performed ex vivo [9,10,12]. In order to use DRS during surgery, in vivo use of DRS has to be evaluated as well. Data acquisition in vivo during colorectal cancer surgery is more complicated and less controlled compared to the ex vivo setting, in terms of pressure applied on the probe, correlation with pathology and ambient light control. Therefore, it would be of great benefit if data obtained ex vivo could be used reliably to train a classifier intended for in vivo use.
So far, few studies have focused on the question whether results obtained ex vivo can be used for in vivo application. One study on mouse ear models was done in which DRS data were obtained in living mice (in vivo), 5 to 10 minutes after excision of tissue (ex vivo) and after 24 and 72 hours of storage [18]. Furthermore, a study was done on human nerves during surgery and postmortem [19]. Both studies found differences between in vivo measurements and ex vivo measurements. However, both studies were focused on ex vivo measurements after long-term storage. Therefore, in the current study, in vivo and ex vivo measurements are performed on colorectal cancer specimens during surgery and within 1 hour after resection. Measurement locations were marked in vivo to direct the ex vivo measurements and to allow accurate pathology registration. A mixed-effect linear regression is done to compare the results obtained in vivo and ex vivo. Furthermore, a classifier is trained using the ex vivo data and tested on the in vivo data to examine whether its accuracy is similar to that of a classifier trained and tested on in vivo data.

| Measurement setup
The DRS system used for this study consisted of two spectrometers and a tungsten halogen light source with an embedded shutter. One spectrometer resolves light in the visible wavelength range, 400 to 1100 nm (Andor Technology, DU420ABRDD); the other resolves light in the near-infrared wavelength range, 900 to 1700 nm (Andor Technology, DU492A-1.7). The light source emits light from 360 to 2500 nm. The system is controlled by custom-made LabView software (National Instruments, Austin, Texas). A detailed description of the calibration of the system can be found elsewhere [20,21].
Measurements were performed using clinical-grade disposable 16 G needles (INVIVO, Gainesville, Florida). Three optical fibers with a core diameter of 200 μm were embedded in the needle, one to transport the light from the source to the tissue and two to transport the light from the tissue to the two spectrometers. The center-to-center distance between the delivering fiber and the two collecting fibers was 1.29 mm (Figure 1).

| Data acquisition
This study was performed under approval from the internal review board (Dutch Trial Register NTR5315). Patients with colorectal cancer who had to undergo open surgery to remove the tumor at the Netherlands Cancer Institute were included. All patients were included based on preoperative imaging, which indicated advanced stage colorectal cancer, stage T3 or T4. All included patients signed informed consent. All ethical guidelines were followed.
The surgeon was asked to perform measurements during surgery by placing the needle on healthy fat, healthy colorectal wall and tumor. All measurement locations were marked with a suture. After resection, the measurements were repeated ex vivo on the marked locations. These measurements were performed using the same needle that was used in vivo. After the ex vivo measurements, the sutures were removed and ink was used to mark the measurement locations. Thereafter, the specimen was brought to the pathology department where the specimen was processed according to standard protocol. After fixation, pathology slides were obtained for all measured locations. These pathology slides were annotated by a pathologist to obtain a ground truth for all measured locations.
FIGURE 1 Schematic of the measurement setup. The setup includes two spectrometers and a halogen broadband light source. The measurement needle contains three fibers, one to transport light to the tissue and two to transport the light from the tissue to the two spectrometers

| Data processing
The spectra obtained from the two spectrometers were stitched together before performing a single parameter fit using an analytical model, based on optical diffusion theory, to obtain tissue constituents and optical properties of the measured tissue volume [20]. This model uses known absorption spectra and scattering characteristics of constituents present in the measured volume to fit a spectrum to the measured reflectance and generates estimations of the tissue constituents and scattering characteristics. The model has been described in detail elsewhere [4,20]. For this study, a model with 12 parameters was used: four related to blood, two to water and fat, three to scattering properties, and one each for collagen, beta-carotene and a scale factor (Table 1). The scale factor couples the incident collimated light to the diffuse field in the tissue and is related to the scattering phase function [4]. The model was fitted on the measured spectra using a Levenberg-Marquardt least squares minimization method. Bad fits were excluded from the analysis based on the confidence intervals obtained for all fitted parameters and on the deviation between the measured spectrum and the obtained fit, expressed as the relative residual. The relative residual was computed by dividing the absolute difference between the measured spectrum and obtained fit by the mean of the measured spectrum. If the relative residual was above 0.9, the fit was excluded. The analytical model made it possible to relate differences between in vivo and ex vivo measurements to biological parameters.
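The residual-based exclusion rule can be illustrated with a minimal sketch. The text does not state how the absolute difference is aggregated over wavelengths, so the sketch assumes the mean absolute difference; the function names and the toy spectra are hypothetical, and only the 0.9 threshold comes from the text.

```python
def relative_residual(measured, fitted):
    """Mean absolute deviation between spectrum and fit, divided by the
    mean of the measured spectrum (aggregation by mean is an assumption)."""
    if len(measured) != len(fitted) or not measured:
        raise ValueError("spectra must be non-empty and of equal length")
    mean_abs_diff = sum(abs(m - f) for m, f in zip(measured, fitted)) / len(measured)
    mean_measured = sum(measured) / len(measured)
    return mean_abs_diff / mean_measured


def keep_fit(measured, fitted, threshold=0.9):
    """Return True when the fit passes the residual criterion.
    The threshold of 0.9 is the value quoted in the text."""
    return relative_residual(measured, fitted) <= threshold


# Toy reflectance values (hypothetical, not measured data).
measured = [0.40, 0.50, 0.60, 0.50]
good_fit = [0.41, 0.49, 0.61, 0.50]
bad_fit = [1.40, 1.50, 0.10, 0.00]

print(keep_fit(measured, good_fit))  # fit retained
print(keep_fit(measured, bad_fit))   # fit excluded
```

In the study, the confidence intervals of the fitted parameters were used as a second exclusion criterion; that part is not sketched here because the text gives no numerical rule for it.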

| Statistics
A mixed-effect linear regression was performed to compare fitted parameters between in vivo and ex vivo measurements, and between tissue types. A cross-classified data structure (Figure 2) was taken into account to model correlation between all measurements performed within one patient across the different tissue types and measurement types.
Separate regression models were fitted to compare in vivo and ex vivo measurements within each tissue type. Furthermore, comparisons between tumor and fat, and tumor and healthy colorectal wall, were performed, for the in vivo and ex vivo measurements, separately. All P-values of .05 or lower were considered significant.
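As an illustration only, the comparison of in vivo and ex vivo measurements within one tissue type could be written as a random-intercept model. This is a simplified sketch: the symbols are introduced here for illustration, and the cross-classified structure used in the study may contain additional random effects that are not shown.

```latex
% y_{pj}: fitted parameter value for measurement j in patient p
% x_{pj}: 1 for an ex vivo measurement, 0 for an in vivo measurement
% u_p:    random intercept shared by all measurements of patient p
\begin{align}
  y_{pj} &= \beta_0 + \beta_1 x_{pj} + u_p + \varepsilon_{pj}, \\
  u_p &\sim \mathcal{N}(0, \sigma_u^2), \qquad
  \varepsilon_{pj} \sim \mathcal{N}(0, \sigma^2).
\end{align}
```

Under this form, the reported P-value for a parameter corresponds to a test of the null hypothesis $\beta_1 = 0$, that is, no systematic shift between in vivo and ex vivo values after accounting for between-patient variation.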

| Classification
Classification of the measurements was performed using a linear support vector machine (SVM). Due to the multiclass database, three binary SVMs were used to obtain a complete classification of the dataset. The three binary classifiers included one for fat vs healthy colorectal wall, one for fat vs tumor and one for healthy colorectal wall vs tumor.
TABLE 1 All parameters used in analytical model, grouped
The SVM was trained and tested using a leave-one-patient-out scheme for different combinations of train and test datasets (Table 2). In the final classifier, the ex vivo data of the patient whose in vivo data were used for testing were left out of the train dataset. All classifications were performed twice, once with all fitted parameters and once with only the selected parameters. Selection of the parameters was done using forward feature selection on only ex vivo data, on only in vivo data, or on a combination of in vivo and ex vivo data. The classifications were not optimized, but were only used to compare the difference in performance due to different train and test datasets. The performance evaluation was done using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve and the Matthews correlation coefficient (MCC). The performance of the different classifiers was compared using McNemar's test [22], where a P-value of .05 or lower was considered significant.
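The way the held-out patient is kept out of the train dataset, including that patient's ex vivo data when testing on in vivo data, can be sketched as follows. The record layout (`patient`, `setting`, `label` keys) and the function name are hypothetical; the sketch only shows the splitting logic, not the SVM itself.

```python
def leave_one_patient_out(records, train_setting, test_setting):
    """Yield (held_out_patient, train, test) splits.

    For each held-out patient, the test set holds that patient's records of
    `test_setting`; the train set holds records of `train_setting` from all
    OTHER patients, so no data of the held-out patient reaches training.
    """
    patients = sorted({r["patient"] for r in records})
    for held_out in patients:
        test = [r for r in records
                if r["patient"] == held_out and r["setting"] == test_setting]
        train = [r for r in records
                 if r["patient"] != held_out and r["setting"] == train_setting]
        if test and train:  # skip patients without usable test data
            yield held_out, train, test


# Toy records (hypothetical); P3 has ex vivo data only.
records = [
    {"patient": "P1", "setting": "in vivo", "label": "fat"},
    {"patient": "P1", "setting": "ex vivo", "label": "fat"},
    {"patient": "P2", "setting": "in vivo", "label": "tumor"},
    {"patient": "P2", "setting": "ex vivo", "label": "tumor"},
    {"patient": "P3", "setting": "ex vivo", "label": "wall"},
]

splits = list(leave_one_patient_out(records, "ex vivo", "in vivo"))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

A patient with only ex vivo data, like P3 in the toy records, still contributes to training but never forms a test fold when testing on in vivo data.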

| Inclusion
For this study, 28 patients were included, 17 male and 11 female. In 26 patients, tissue was measured both in vivo and ex vivo. Measurements of one patient could only be performed in vivo, because the needle could not be used anymore after the in vivo measurement. In one patient, only ex vivo measurements could be performed due to failure of the system during the in vivo measurements. The median age was 61.5 years with an interquartile range of 52.25 to 68 years. Most tumors were located in the colon (n = 15), followed by the sigmoid (n = 8) and the rectum (n = 5).
In total, 1605 spectra were measured, of which 288 were excluded because a tissue type was measured which was not included in the analysis, that is, fibrosis, inflammation and necrosis (n = 101), or because no proper fit was obtained (n = 187). In Table 3, an overview is given of the 1317 included spectra. Table 4 shows P-values for the difference in parameter value between in vivo and ex vivo measurements for all parameters and each tissue type separately.

| Statistical analysis
The Fat/(Water + Fat) (%), the scattering slope, the scale factor and the diameter of the blood vessels showed no significant difference between in and ex vivo in any of the tissue types. For healthy colorectal wall, the scattering at 800 nm, the scale factor, the diameter of the blood vessels and the fraction of Mie vs Rayleigh scattering had a P-value above .05. Finally, for tumor, the Fat/(Water + Fat) (%), scattering at 800 nm, scattering slope, scale factor, the diameter of the blood vessels and the amount of beta-carotene did not show a significant difference between in vivo and ex vivo. Comparisons between tumor and fat, and between tumor and healthy colorectal wall for each parameter, for in vivo and ex vivo separately are shown in Table 5.
For both tumor vs fat and tumor vs healthy colorectal wall, a significant difference was seen in StO2 (%) for both in vivo and ex vivo. Moreover, for tumor vs fat, a significant difference was found in the Fat/(Water + Fat) (%) and in the scattering slope, both in and ex vivo. The scattering at 800 nm was only significantly different in the ex vivo measurements for tumor vs fat. For tumor vs healthy colorectal wall, Water + Fat (%) was significantly different both in and ex vivo and the scattering slope and fraction of Mie over Rayleigh scattering were only significantly different in the in vivo measurements.

| Selected parameters
To obtain the most important parameters used in the classification, forward feature selection was performed three times, once on only ex vivo data, once on only in vivo data and finally on the combination of both. Based on the outcome of all three feature selections, four parameters were selected and they all were selected within the first five parameters in all three feature selections.
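Forward feature selection can be sketched as a greedy loop that adds, at each step, the parameter that most improves a score. The scoring function below is a toy stand-in with made-up weights; in the study the score would come from the classifier's performance, which is not specified in enough detail to reproduce here.

```python
def forward_feature_selection(feature_names, score, max_features):
    """Greedily add the feature that gives the highest score.

    `score` is any callable mapping a list of feature names to a number
    (higher is better). Selection stops when no candidate improves the score
    or when `max_features` have been chosen. Returns features in the order
    in which they were selected.
    """
    selected = []
    remaining = list(feature_names)
    best_score = float("-inf")
    while remaining and len(selected) < max_features:
        top_score, top_feature = max((score(selected + [f]), f) for f in remaining)
        if top_score <= best_score:
            break  # no remaining feature improves the score
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected


# Hypothetical weights, for illustration only (not values from the study).
weights = {"Blood (%)": 0.30, "StO2 (%)": 0.25,
           "Water + Fat (%)": 0.20, "Scale factor": 0.01}


def toy_score(features):
    # Sum of weights with a small penalty per feature used.
    return sum(weights[f] for f in features) - 0.02 * len(features)


order = forward_feature_selection(list(weights), toy_score, max_features=5)
print(order)
```

Running such a selection separately on ex vivo data, on in vivo data and on their combination, and then keeping the parameters that rank among the first five in all three runs, mirrors the procedure described above.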
The four parameters included Blood (%), StO2 (%), Water + Fat (%) and Fat/(Water + Fat) (%). The first three parameters all showed significant differences between in vivo and ex vivo measurements for all tissue types (Table 4). The fourth parameter did show significant difference between in and ex vivo for fat and healthy colorectal wall but not for tumor. All selected parameters were significantly different between in and ex vivo for almost all tissue types, but the difference between in and ex vivo for all tissue types was in the same direction.
Of the four selected parameters, Blood (%) only showed a significant difference between tumor and fat ex vivo (Table 5). For StO2 (%), there was a significant difference between tumor and fat and tumor and healthy colorectal wall both in and ex vivo. Furthermore, the difference was in the same direction for in and ex vivo. Water + Fat (%) showed a significant difference between tumor and fat, only in vivo, and between tumor and healthy colorectal wall, both in and ex vivo. Here again, changes between in vivo and ex vivo were in the same direction. For Fat/(Water + Fat) (%), there was only a significant difference between tumor and fat, both in and ex vivo.

| Classification
TABLE 5 P-values of the parameters' comparison between tumor and fat, and tumor and healthy colorectal wall, for in vivo and ex vivo
Eight different classifiers were created, using different train and test datasets. In four out of eight classifiers, all parameters were used; for the other four classifiers, only the selected parameters were used. In Figure 3, the ROC curves obtained from all four classifiers using all parameters are shown per tissue type. As can be seen, the classifier trained on ex vivo data and tested on in vivo data performed similarly to the other classifiers, which is supported by the AUC and MCC values (Table 6). Furthermore, McNemar's test showed that there was no significant difference between the results from the classifier trained and tested on in vivo data and the results from the classifier trained on ex vivo data and tested on in vivo data (P = .17). In Figure 4, the ROC curves of the classifiers using the ex vivo data as train dataset and in vivo data as test dataset are shown for all parameters and for the selected parameters. The AUC of the ROC curve for fat remained the same when only the selected parameters were used (Table 6). For healthy colorectal wall and tumor, the AUC increased when only selected parameters were used, from 0.88 to 0.90 and from 0.84 to 0.89, respectively. The MCC values of fat and healthy colorectal wall increased when only selected parameters were used, whereas the MCC value of tumor decreased.
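For readers less familiar with the two performance measures, the following sketch computes them from scratch for a binary problem. The labels, scores and the 0.45 decision threshold are toy values chosen for illustration; they are not data from this study.

```python
import math


def roc_auc(labels, scores):
    """AUC as the probability that a positive sample scores higher than a
    negative sample (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def matthews_corrcoef(labels, predictions):
    """MCC from the binary confusion matrix; returns 0.0 when undefined."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy example: 1 = tumor, 0 = other tissue (hypothetical values).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
predictions = [1 if s >= 0.45 else 0 for s in scores]

auc = roc_auc(labels, scores)
mcc = matthews_corrcoef(labels, predictions)
print(round(auc, 3), round(mcc, 3))
```

The AUC summarizes ranking quality over all thresholds, whereas the MCC depends on one chosen decision threshold, which is one reason the two measures can move in opposite directions, as seen for tumor.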

| DISCUSSION
In this study, the potential of using ex vivo data to train a tissue classifier for in vivo use is examined. AUC values of 0.96, 0.88 and 0.84 and MCC values of 0.82, 0.62 and 0.48, for fat, healthy colorectal wall and tumor, respectively, were obtained for training on ex vivo data and testing on in vivo data. These values were comparable to the AUC and MCC values when training and testing were performed on in vivo data alone. As a first step in this research, a statistical analysis was done on the fitted parameters to examine if there was a significant difference between in vivo and ex vivo data. The clearest difference is seen for the Water + Fat (%) and Fat/(Water + Fat) (%) parameters. A significant decrease of Water + Fat (%) was seen for all tissue types from in vivo to ex vivo. This most likely has to do with dehydration (vaporization and leakage) of the tissue when taken out of the patient. This is supported by the increase in the Fat/(Water + Fat) (%) parameter for all tissue types, showing a decrease in water content. Furthermore, for all tissues, there was a significant difference in Blood (%) and StO2 (%), which were both increased when measured ex vivo compared to in vivo. The increase in Blood (%) can be explained by an increase in blood volume in the capillaries after excision [19]. The increase in StO2 (%) can be explained by the exposure to air and the decrease in oxygen consumption by the cells in the specimen. These results are not in agreement with results found by Salomatina et al. [18]. In their study, comparing in and ex vivo measurements on mouse ears, deoxygenation of the blood was found for the ex vivo measurements. The difference can be explained by the time between excision and measurement. In the study by Salomatina et al., the measurements were performed within 5 to 10 minutes after excision, while tissue may still be consuming oxygen, especially in the first few minutes.
In our study, the time between excision and measurement was up to 1 hour. This might increase the difference in oxygenation between in vivo and ex vivo in the current study. We measured the tissues within 1 hour after resection; based on the results from previous research, results may differ when this time interval is significantly longer.
As stated by Jacques et al., scattering parameters should be stable for a few hours after resection if overhydration and dehydration are avoided [23]. Most scattering parameters did indeed show no significant difference between in and ex vivo, except for scattering at 800 nm in fat, scattering slope in healthy colorectal wall and the fraction of Mie scattering vs Rayleigh scattering for tumor. As stated before, the decrease in Water + Fat (%) and increase in Fat/(Water + Fat) (%) most likely showed dehydration of the tissue. The changes in some scattering parameters between in vivo and ex vivo might be explained by this.
In the second step, an SVM was trained and tested on four different combinations of training and test data. When ex vivo data were used for training and in vivo data were used for testing, similar results were obtained compared to the other classifiers. This showed that using ex vivo data to create a classifier, and using it afterwards to classify in vivo data, will give similar results compared to a complete in vivo study in which both training and testing of the classifier are performed on in vivo data. Using McNemar's test, no significant difference between the results of these two classifiers was found (P = .17).
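McNemar's test compares two classifiers on the same test samples by looking only at the samples on which they disagree. The sketch below uses the exact binomial variant; which variant was used in the study is not stated, so this choice is an assumption, and the correctness vectors are toy values.

```python
import math


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar P-value for paired correctness vectors.

    b counts samples classifier A got right and B got wrong; c counts the
    reverse. Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    b = sum(1 for a, other in zip(correct_a, correct_b) if a and not other)
    c = sum(1 for a, other in zip(correct_a, correct_b) if not a and other)
    n = b + c
    if n == 0:
        return 1.0  # the classifiers never disagree
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Toy example: both correct on 10 samples, A alone correct on 1, B alone on 5.
correct_a = [True] * 10 + [True] * 1 + [False] * 5
correct_b = [True] * 10 + [False] * 1 + [True] * 5

p_value = mcnemar_exact(correct_a, correct_b)
print(p_value)  # 0.21875
```

Because the test is paired, both classifiers must be evaluated on the same in vivo test measurements, which is the case for the comparison described above.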
The classification results were similar if only selected parameters were used for classification. The parameters that were selected for the final classification were selected based on the results of three forward feature selections. The selected parameters did show changes between the in vivo and ex vivo setting. Parameters that are significantly different between in vivo and ex vivo might not seem useful in the classification at first. However, if these parameters either increase or decrease systematically for all tissue types, differences between tissues may still be present and they might still be of added value to the classification. For the parameters that were not selected, any differences between tumor and fat and between tumor and healthy colorectal wall that were present ex vivo were absent or at least different for the in vivo setting.
For this study, an analytical model based on the diffusion theory was used to extract different absorption and scattering properties from the reflectance spectra of the tissue. The resulting parameters of the fit of the analytical model were used for classification instead of intensity values of the spectra as was done previously [12]. The fitted parameters were used because changes in these parameters between in vivo and ex vivo could be related to biological parameters and processes. Furthermore, because the model takes into account the fiber arrangement, results are not dependent on the measurement setup and the conclusions are therefore applicable in a more general sense. However, care should be taken that the assumptions made in the fit model are appropriate. One of the assumptions made in the diffusion theory is that the tissue is homogeneous. One can doubt whether this is an appropriate assumption for the colorectal wall which consists of several layers. However, this problem is present in the in vivo setting as well as in the ex vivo setting. Moreover, bad performance of the model due to, for instance, layered tissue will result in bad fits, which were excluded from the analysis. In this study, 11% (187 of the 1605 measurements) of the measurements were excluded based on bad performance of the fit model. So, by using the analytical model, and excluding measurements with a bad performing fit, the results obtained in this study are expected to be generalizable to other measurement setups.

| CONCLUSION
Based on the results obtained in this study, it can be concluded that, at least for colorectal cancer, it is possible to train a classifier intended to classify in vivo spectra reliably using ex vivo measurements. Only parameters that are constant between in vivo and ex vivo, or that change similarly across tissue types, should be used for that classification. Because ex vivo data acquisition is simpler compared to in vivo, larger databases can be used for training of a classifier that can be used in vivo. This might accelerate the development of optical techniques for surgical applications.
H.) has financial interests in the subject matter, materials and equipment only in the sense that he is an employee of Philips. The other authors have declared that no conflict of interest exists.