Pixel‐level angular quantification of capsular collagen in second harmonic generation microscopy images of encapsulated thyroid nodules

Polarization‐resolved second harmonic generation microscopy is used to provide pixel‐level angular distribution of collagen in thyroid nodule capsules. The pixel‐level angular distribution is combined with textural analysis to quantify the collagen distribution in follicular adenoma (benign) and papillary thyroid carcinoma (malignant). Three second order nonlinear susceptibility tensor elements ratios, the collagen angular distribution and two parameters accounting for the collagen angular dispersion in different sized areas are extracted and corresponding images are computed in a pixel‐by‐pixel fashion. Subsequently, we show that texture analysis can be performed on these images to detect significant differences between the considered benign and malignant nodule capsules.


| INTRODUCTION
Modifications of the collagen architecture in tissues are currently being considered as a hallmark for a wide range of pathologies. Until now, collagen degradation in the basement membrane area (e.g. the dermo-epidermal junction [1] ), or the impact of tumor growth on the Abbreviations: AUROC, area under the receiver operating characteristic; FA, follicular adenoma; GLCM, gray-level co-occurrence matrix; H&E, hematoxylin and eosin; MAD, median absolute deviation; PTC, papillary thyroid carcinoma; SD, standard deviation; SHG, second harmonic generation. structure and composition of the extracellular matrix (ECM) [2][3][4] were shown to provide information with diagnostic potential. In other recent works, the fibrous capsules surrounding different organs were at the focus of a series of studies which highlighted that quantitative microscopy at the capsular level can differentiate pathologies of the liver [5] or the thyroid gland. [6] These past studies relied on image data collected with second harmonic generation (SHG) microscopy, which is known to be very useful in imaging fibrillar collagen, [7,8] abundant in the investigated tissues. Moreover, SHG microscopy has been used not only for biological and medical samples but also in the materials science field. [9][10][11][12] SHG microscopy provides multiple advantages over confocal fluorescence microscopy [13] : label-free, subfemtoliter volume resolution enabling noninvasive optical 3D sectioning, reduced photobleaching and phototoxicity and deeper tissue imaging. Moreover, due to the coherent nature of the SH signals, the SHG intensity depends on the laser beam polarization and on the structure and organization of the noncentrosymmetric molecules in the investigated sample. [14] The strength of SHG microscopy can be significantly augmented by quantitative analysis methods that can be applied to either intensity-based or polarization-resolved SHG (PSHG) images. From simple first order statistics of pixel values, to statistics which consider the spatial relationship of neighboring pixels (e.g. the gray-level cooccurrence matrix [GLCM]), [6,15,16] or even to more advanced methods for image analysis (e.g. fractal analysis, [17] wavelet transform, [18] Hough transform [19] ), a plethora of whole-image metrics were previously used for analyzing the collagen distribution in tissues from intensity-based SHG images. PSHG microscopy can extend such possibilities, by assessing other quantitative features such as the anisotropy factor, [14,20] the degree of linear polarization [21] or the Stokes parameters. [22] These are pixel-wise measures indicating the alignment of molecular dipoles relative to the incident laser beam polarization. Other reported approaches dealing with PSHG datasets exploited theoretical models to predict the organization of the collagen fibers [23,24] and different fitting algorithms to determine the elements of the second order nonlinear susceptibility tensor (χ (2) ) for collagen with pixel resolution. Along with the χ (2) elements ratios, these methods compute the collagen orientation, which has been found pathologically relevant. Less complex approaches have also been introduced to compute the orientation of the collagen fibers. [25] The main drawback of these past methods is the lack of a reference for the collagen angle in the case of the ECM, with the collagen orientation depending on the tissue orientation under the microscope. Only for particular situations (e.g. collagen fibers whose position can be determined relative to the tumor margins) this angular information can be used to offer diagnostic information. [26] Otherwise, pixel-level information provided by such techniques is lost, given that the angular orientation is only quantified at the whole-image level by using the distribution of angular values of collagen, such as the standard deviation (SD) [27] or the median absolute deviation (MAD). [28] Thyroid nodules can be benign (e.g. hyperplastic nodules, colloid nodules, or follicular adenomas (FA)) or malignant (eg papillary, follicular, or medullary thyroid carcinomas). Even if the occurrence rate of the latter is low, approximately 10%, it is nonetheless very important to accurately differentiate them from the much more frequent adenomas and nodular goiters [29] since most thyroid cancers can be successfully treated when diagnosed early. After the standard treatment of thyroid tumors consisting in partial or total surgical excision of the thyroid gland, the diagnosis is confirmed by visualizing hematoxylin and eosin (H&E) stained tissue sections in optical microscopy. SHG microscopy has been previously used to characterize the ultrastructure of collagen in the ECM for two types of papillary thyroid carcinoma (PTC), [21] and recently we have proposed [6] a quantitative analysis approach to resolve the fine structure of the capsule surrounding the thyroid gland or thyroid nodules. Using SHG microscopy for imaging the collagen capsule and a set of whole-image quantification parameters we were able to detect differences in the capsular structure in encapsulated nodules of PTC and FA nodules. In another study, we used pixel-level analysis of thyroid nodules and compared the values with those obtained on the thyroid gland capsule, by using only the χ (2) elements ratios. [30] In this article, we introduce an analysis method to assess the pixel-level distribution of collagen fibers based on PSHG datasets. Starting from a collagen angle image calculated by fitting the PSHG dataset with theoretical models, we compute an extended image set, which offers a quantitative insight of the collagen distribution at pixel-level without the limitation of tissue orientation. Subsequently, we combine the extracted pixellevel information with GLCM texture analysis to investigate the structure of nodule capsules for PTC and FA thyroid nodules. We show that this approach is efficient in detecting structural differences in the collagen architecture present in the two considered nodule capsule types and exhibits thus an important potential for PTC vs. FA diagnostics.

| Tissue samples preparation
Tissue samples were obtained according to an institutionally approved protocol from patients who underwent total surgical resection of their thyroid gland. Three cases were selected with both PTC and FA nodules developed in the thyroid in order to exclude interpatient variability. Both benign and malignant thyroid nodules were identified on the same tissue section, hence undergoing the same sample preparation procedure. All samples were handled according to the standard histology protocols. Thin tissue sections (4-7 μm) were cut from the formalin-fixed, paraffin-embedded tissue blocks, mounted on glass slides and stained with H&E. The slides were imaged with a bright-field microscope scanner (Aperio Whole Slide Scanner, Leica Biosystems) for determining the regions of interest (ROIs) which were further imaged under the PSHG microscope. For each patient, both PTC and FA nodule capsules were investigated and at least seven ROIs were imaged per nodule type per patient, resulting in a total of 25 ROIs per nodule type. Each PSHG scan area was 125 × 125 μm 2 .
2.2 | The PSHG microscopy setup PSHG images were acquired using a Leica TCS-SP confocal laser scanning microscope modified for nonlinear optical imaging. The microscope ( Figure 1) was equipped with a Ti:Sapphire laser (Chameleon Ultra II, Coherent) tuned at 870 nm, with~140 fs pulses and a repetition rate of 80 MHz. A 40× magnification and 0.75 numerical aperture (NA) objective was used for focusing the laser beam on the sample and for collecting the backward-propagating SHG signals (BSHG). The SHG signals were collected in the forward direction (FSHG) using a 0.9 NA condenser lens. The input excitation laser beam was linearly polarized by a combination between an achromatic quarter-wave plate (AQWP05M-980, Thorlabs) and an achromatic half-wave plate (AHWP05M-980, Thorlabs) placed in the laser beam path before the microscope. PSHG image stacks were acquired on different ROIs by rotating the linear laser beam polarization by increments of 20 from 0 to 180 .

| PSHG image analysis
Following the acquisition of PSHG image stacks, a custom Matlab code adapted from reference [31] was used for fitting the experimental data in a pixel-by-pixel procedure with a theoretical curve. More details about the fitting procedure can be found in. [32] The model used for the collagen SHG intensity dependence with the input polarization angle can be written as: where α is the incident laser beam polarization angle, while φ is the in-plane collagen fiber orientation angle, and χ 15 , χ 31 , χ 33 are the only nonzero elements of χ (2) under the assumption of cylindrical symmetry of collagen.
To account for the collagen distribution in a pixel-bypixel approach we have used the previously obtained φ images to compute a local dispersion of the collagen angles. Each pixel in these images was replaced with the SD or the MAD of neighboring pixels in a square window with side l, where l takes odd values. We considered the following window dimensions: 3, 7, 15, 31, and 63. The resulting images were coded SD_l and MAD_l, if SD and MAD were used to account for the local dispersion, backward/forward collection directions; BPF, band-pass filter; HWP, half-wave plate; PMT, photomultiplier tube; QWP, quarterwave plate; SHG, second harmonic generation; SPF, short-pass filter respectively. While SD and MAD are both measures of statistical dispersion of a set of values, MAD is defined as the median of the absolute deviations from the median and is less sensitive to outliers than SD. If the dataset is normally distributed, the SD is usually the better choice for assessing spread. However, if the data is not normally distributed, the MAD is the more robust measure to use.
A total of 14 different images were computed for each PSHG image stack for both forward and backward collection directions, respectively: three χ (2) elements ratios images (χERIs), φ, SD_l and MAD_l images for five different l values. Another set of 14 images for each collection direction was obtained by considering only the pixels that yielded with a value of R 2 >0.8 after the fitting procedure.
The entire set of images was subjected to GLCM texture analysis. This task was carried out with ImageJ's GLCM texture plugin. The GLCM is constructed by counting the number of occurrences of a gray level adjacent to another gray level, at a specified pixel distance d, along four angle directions 0 , 45 , 90 , and 135 . Five parameters were computed to extract textural information from the GLCM: Angular Second Moment (ASM), Contrast, Correlation, Inverse Difference Moment (IDM), and Entropy. All are extensively described in previous publications. [6,28] The parameter values were averaged for the four different angle directions for which the GLCM was extracted. Three distances between adjacent pixels were considered d = 1, 10, 100.
Since the GLCM dimension depends on the bit depth of the image it is computed for (e.g. for an 8-bit image the GLCM is a 256 × 256 matrix) and the images resulted from the fitting procedure are 32-bit, working with 32-bit images would require a high computational load. Thus, we converted all the 32-bit images to 8-bit images by linearly scaling from min to max to 0 to 255, where min and max were chosen such as at least 90% of the pixel values in each image are within the min to max interval.

| Statistical analysis
Statistical analysis was performed with Prism 8.3 (Gra-phPad Software, USA). The D'Agostino and Pearson test was performed to check for normality and depending on the result either the t test or the Mann-Whitney nonparametric test was used. For all situations an unpaired test was employed, with P values less than 0.05 being considered statistically significant. We have also computed the area under the receiver operating characteristic (AUROC) for assessing the predictive ability of the parameters that we computed from the PSHG datasets. AUROC takes values from 0 to 1, higher AUROC values being considered better. In general, an AUROC of 0.5 suggests no discrimination, while 0.7 to 0.8 is considered acceptable and 0.8 to 0.9 is considered excellent. [33] 3 | RESULTS AND DISCUSSION

| Susceptibility tensor images and statistics
PSHG datasets were acquired on ROIs containing capsules of both PTC and FA nodules (image examples are shown in Figure 2).
Samples of pixels randomly selected from all the three χERIs were used for the statistical analysis of the datasets (Figure 3). While for BSHG images, χ 31 /χ 15 did not return statistically relevant differences between PTC and FA nodule capsules, the other two ratios (χ 33 /χ 15 and χ 33 /χ 31 ) for BSHG and all the three ratios in the case of FSHG images showed statistically significant higher values in the case of FA compared to PTC nodule capsules. Previous studies have shown that χ 33 /χ 31 is representative for the local anisotropy of the sample at pixel level. [34] In the same time, because the minimum value in the SHG intensitypolarization dependence is I min~( χ 31 /χ 15 ) 2 , χ 31 /χ 15 can also be connected with the local anisotropy. An increase of both these values can be explained by a decrease of local anisotropy in the case of FA compared to PTC nodule capsules. This pixel-level result is consistent with previous results [6] where texture analysis on SHG intensity images revealed randomly organized collagen fibers in the benign nodule capsules and an organized malignant nodule capsule with parallel aligned collagen fibers.
Even though the χ (2) elements ratios provided statistically relevant differences in five of the six considered situations, these parameters alone have almost no discrimination capacity between PTC vs FA nodule capsules since the AUROC is less that 0.6 (inserts in Figure 3). Hence, the χ (2) elements ratios by themselves cannot provide a reliable classification of the two nodule capsule classes.

| Angular distribution of collagen in thyroid nodule capsules
The angular distribution of collagen was first quantified at whole-image level by using SD and MAD (Figure 4). For both metrics we obtained higher values for the FA nodule capsule compared to the PTC nodule capsule, indicating that the collagen fibers are straighter in PTC capsules, in good agreement with previous published results. [6] Highly significant statistical differences were obtained for MAD, while SD failed to show statistically significant differences in the case of BSHG images between the two nodule types. A possible explanation could be that MAD is less sensitive to outliers than SD.
A potential outlier source in φ images could correspond to image areas where adjacent pixels undergo transitions from 0 to 180 . F I G U R E 3 Statistical analysis of the χERIs corresponding to A, the BSHG configuration and B, the FSHG configuration. Distributions of the values of the three considered χ (2) elements ratios are provided as Tukey box and whisker plots for both PTC and FA nodule capsules. The insert in each graph provides the AUROC values as a measure of the discrimination capacity of each ratio between PTC and FA nodule capsules. (ns P > 0.05, ** P < 0.01, *** P < 0.001, **** P < 0.0001). AUROC, area under the receiver operating characteristic; BSHG, backward-propagating SHG signals; FA, follicular adenoma; FSHG, forward-propagating SHG signals; PTC, papillary thyroid carcinoma F I G U R E 2 SHG intensity images and the corresponding computed χ 31 /χ 15 , χ 33 /χ 15  Assessing the collagen orientation distribution at the whole-image level does not fully exploit the outputs of PSHG data fitting algorithms, which provide collagen orientation at pixel-level. To address this, we computed images of the local collagen orientation distribution, corresponding to different metrics (SD and MAD) and different computation window dimensions ( Figure 5).
All the SD/MAD pixel datasets provided statistically relevant differences between PTC and FA nodule capsules ( Figure 6). Despite these, for low SD/MAD computational windows (l = 3, 7) there is almost no discrimination capacity between the PTC vs FA nodule capsules since the AUROC is less that 0.6 (inserts in Figure 6). As the window size increases the discrimination capacity increases as well, with the highest values obtained for MAD images obtained for the FSHG configuration. For almost all considered situations (except for l = 3 in BSHG images) MAD yields better discrimination capacity between PTC and FA than SD. This result shows that both at image-level (125 × 125 μm 2 ) and at smallscale (from 0.54 to 236 μm 2 corresponding to 3 × 3 to configuration. The 08 index represent images computed only keeping pixels with R 2 > 0.8. AUROC values higher than 0.7 are highlighted and considered to show significant discrimination capacity of the parameter. Under the stat column the statistical inference is shown (ns P >0 .05, * P < 0.05, ** P < 0.01, *** P <0 .001, **** P < 0.0001). Each cell under the stat columns is color coded: red represents a higher value in the case of the malignant nodule capsule, green represents a higher value in the case of the benign nodule capsule, while yellow represents equal values for the two capsules. AUROC, area under the receiver operating characteristic; BSHG, backward-propagating SHG signals 63 × 63 pixel ROIs, respectively) collagen is randomly organized in FA nodule capsules and parallel aligned with the PTC nodule. We hypothesize that this behavior might be interpreted as a defense mechanism of the organism against malignant tumors.

| Texture analysis on PSHG image sets
Texture analysis using the GLCM was conducted on the φ image, the three χERIs, the five MAD and five SD images and was performed for both the BSHG (Figure 7) and FSHG (Figure 8) configurations. For the same image type, we computed thresholded images keeping only pixels, which correspond to R 2 > 0.8, as obtained from the fitting algorithm.
For the FSHG datasets, GLCM Contrast can be used to discriminate between PTC and FA nodule capsules (AUROC >0.7) on the χ 33 /χ 31 image, on MAD_l images and on SD_l images. This latter case provided higher discrimination efficiency especially when using thresholded images with pixels corresponding to R 2 > 0.8. In the case of higher distances between adjacent pixels (d = 100) Contrast was able to differentiate between PTC and FA nodule capsules also in the case of SD_l images that were not subjected to R 2 > 0.8 thresholding. For both SD and MAD images the smallest computational window (l = 3) failed to return satisfactory results. Noteworthy, while both MAD_l and SD_l images (the latter only in particular cases) can discriminate the two nodule capsule types based on the Contrast metric, the angle images (φ) fail at this task. AUROC values were similar between the three considered pixels distances (d = 1, 10, 100) for which the GLCM was computed.
Higher GLCM Contrast values obtained for the benign nodule capsules than for malignant ones are associated with larger variability within the studied GLCMs. This translated into higher variability of SD/MAD values as well as for the χ 33 /χ 31 for benign nodule capsules which is consistent with the results obtained with the texture analysis that we previously performed on SHG intensity images. [6] In the case of the BSHG datasets, GLCM Contrast was able to differentiate between the two nodule classes on all the χERIs, and also in the images computed for F I G U R E 8 Statistical results obtained from comparing the malignant and benign nodule capsules on different images in the FSHG configuration. The 08 index represent images computed only keeping pixels with R 2 > 0.8. AUROC values higher than 0.7 are highlighted and considered to show significant discrimination capacity of the parameter. Under the stat column the statistical inference is shown (ns P >0 .05, * P <0 .05, ** P < 0.01, *** P <0 .001, **** P <0 .0001). Each cell under the stat columns is color coded: red represents a higher value in the case of the malignant nodule capsule, while green represents a higher value in the case of the benign nodule capsule. AUROC, area under the receiver operating characteristic; FSHG, forward-propagating SHG signals MAD_3 (both with and without R 2 > 0.8) and SD_3 and SD_7 without filtering out pixels with R 2 > 0.8. Similar AUROC values were obtained between the three considered pixels distances for which the GLCM was computed. Opposed to the FSHG case, here higher GLCM Contrast values were obtained for malignant nodules. This result can be explained considering that BSHG images have a granular aspect (see for example Figures 2 and 5), which favors higher GLCM Contrast between adjacent pixels. It is also confirmed for higher values of the computational window in the case of both MAD and SD (Figure 7), where, even though the AUROC value is close to 0.5 the Contrast becomes higher for benign nodule capsules.
Results matching those obtained for Contrast were also achieved in the case of the GLCM Correlation on FSHG images when considering small pixel distances (d = 1). Increasing the pixel distance (d = 10) leads to a decrease in AUROC values for MAD_l images, with better results on χERIs and φ images, especially when not applying R 2 > 0.8 thresholding. A loss in discrimination efficiency for d = 100 on all studied image types can be observed (AUROC <0.6).
For the BSHG datasets, Correlation on χERIs and SD_l images is found suitable for both R 2 > 0.8 thresholded and non-thresholded images. It is interesting to notice that when increasing the pixel distance (d = 10) the AUROC values for Correlation increase. For even higher distances between adjacent pixels (d = 100), the Correlation fails to discriminate between the two classes of interest.
For all the studied situations higher Correlation values were obtained for the PTC nodule capsule which might indicate a more regular structural pattern. Lower Correlation generally means that the pixel values are independent from one another. The lower Correlation for the FA nodule capsule can be explained by the waviness of the collagen fibers in this type of capsule.
When using FSHG datasets, ASM, IDM, and Entropy fail to discriminate between PTC and FA nodule capsules (AUROC values less than 0.7), independent on the pixel distance for which the GLCM was computed. The same GLCM textural features provided a high discrimination capacity with AUROC values close to 0.8 for BSHG datasets. Higher AUROC values (~0.85) were obtained without filtering out pixels with R 2 > 0.8. Similar AUROC values were obtained between the three considered pixels distances for which the GLCM was computed.
The discrimination capacity between PTC and FA nodule capsules provided by MAD/SD images could potentially be used for classification purposes using deep learning methods, where large image datasets are required. The approach used in this article of computing MAD/SD images from different sized ROIs around pixels can be used for data augmentation, helping to extend the training set.

| CONCLUSION
The combination of PSHG imaging with fitting procedures and texture analysis reveals significant differences in the collagen structure between benign (follicular adenoma) and malignant (papillary thyroid carcinoma) thyroid nodule capsules. In this analysis we have considered parameters extracted from fitting the experimental PSHG datasets with theoretical collagen models in a pixel-by-pixel fashion, addressing the χ (2) elements ratios and the collagen orientation angle. Starting from the collagen orientation angle computed from the fitting procedure we proposed a method to take advantage of the pixel-level information provided by this parameter by estimating the angular spread in small ROIs around each pixel using two different measures: SD and MAD. This approach showed consistent results in differentiating between the two thyroid nodule capsule types. Five textural features computed from the GLCM were also used. The results showed that the collagen structure is more ordered with malignancy, with straighter and more aligned collagen fibers in the malignant nodule capsule. Hence, PSHG microscopy and texture analysis can provide a powerful tool to investigate the angular distribution of collagen in fibrous capsules surrounding thyroid nodules.

CONFLICTS OF INTEREST
The authors declare no potential conflict of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.