Department of Biomedical Engineering, Rutgers University, Piscataway, NJ
Center for Engineering in Medicine/Surgical Services, Massachusetts General Hospital, Boston, MA
Address reprint requests to Martin L. Yarmush, M.D., Ph.D., Department of Biomedical Engineering, Rutgers University, 599 Taylor Road, Piscataway, NJ 08854. Telephone: 848-445-6528; FAX: 732-445-3155; E-mail: email@example.com
This work was partially supported by the National Institutes of Health (grant R01DK059766). Nir I. Nativ, Alvin I. Chen, and Gabriel Yarmush were supported by a National Institutes of Health–funded biotechnology training fellowship.
Orthotopic liver transplantation is a highly successful therapy for the treatment of both acute and chronic liver failure, but it is limited by donor scarcity.[1-4] Excessive intrahepatic triglyceride accumulation in the form of large lipid droplets (LDs) that displace the nucleus to the cell periphery [large-droplet macrovesicular steatosis (ld-MaS)] in more than 30% of hepatocytes is one of the most common causes of liver graft removal from the donor pool.[1, 4] Such livers are more sensitive to the ischemia/reperfusion injury inherent to transplantation, and this leads to increased rates of primary nonfunction, morbidity, and mortality.[1, 4] ld-MaS assessment is currently performed by pathologists using hematoxylin and eosin (H&E)–stained, frozen or paraffin-embedded liver histology slides or images.[1, 4] Although a pathologist's quantification of ld-MaS in both frozen and paraffin-embedded tissues (in terms of the percentage of hepatocytes containing ld-MaS) has been shown to correlate with transplantation outcomes,[1, 4] it is also a rather subjective analysis, and reports indicate significant discrepancies in scoring results among different pathologists.
This has motivated the development of digital image analysis methods for unbiased and reproducible steatosis assessments of H&E-stained liver slides.[5-9] El-Badry et al. and Li et al. have reported that algorithms that quantify macrovesicular steatosis on the basis of the relative cross-sectional surface area (CSSA) occupied by all LDs [ld-MaS and small-droplet macrovesicular steatosis (sd-MaS)] underestimate a pathologist's steatosis quantification.[5, 8] These methods do not use the size distribution of the LDs or the distribution of LDs among the individual cells, which are features employed by pathologists to determine ld-MaS percentages.[1, 4, 8] Other studies have used analyses that differentiate ld-MaS and sd-MaS with a predetermined LD CSSA cutoff size.[7, 9, 10] Boyles et al. found agreement between one such image analysis algorithm and a pathologist's semiquantitative metric of 4 steatosis grades for biopsy specimens obtained from patients with chronic hepatitis C. However, the semiquantitative grade used in that study is not an established metric for evaluating liver graft suitability for transplantation, which instead is based on a 30% cutoff for the percentage of hepatocytes with ld-MaS.[1, 4] Marsman et al. found that automated image-based quantification of ld-MaS correlated poorly with pathologists' quantifications of ld-MaS percentages in cadaveric liver grafts. In both studies, the distinction between ld-MaS and sd-MaS was based on an arbitrarily predetermined cutoff for the LD CSSA. To date, no rigorous optimization of the LD size cutoff has been performed, and none of the existing algorithms take into account another feature used by pathologists to identify ld-MaS: the ability to push the nucleus toward the edge of the cytoplasm.[1, 4]
Here we show that using the LD size as a single criterion for differentiating between ld-MaS and sd-MaS tends to either underestimate or overestimate pathologists' assessments of ld-MaS percentages, depending on the size cutoff. We also demonstrate that incorporating the criterion of LD-induced nuclear dislocation to the cell periphery dramatically improves both the ability to separate ld-MaS from sd-MaS and the correlation between pathologists' assessments and automated image analysis–based evaluations of ld-MaS percentages.
PATIENTS AND METHODS
Human Liver Histology
Human liver tissue samples were obtained from 9 donors who had diverse backgrounds in terms of age, cold ischemia time, machine perfusion time, body mass index, and so forth (as indicated in Supporting Table 1). The tissues were collected by core needle biopsy, paraffin-embedded, serially sectioned to 5 μm, mounted onto adhesive slides, and eventually stained with H&E as described elsewhere. All biopsy samples were collected under Columbia University Medical Center Review Board (IRB-AAAAA8408) guidelines. Digital images were obtained with a Micromaster II bench-top phase-contrast digital video microscope (Fisher, Pittsburgh, PA) and Micron 2.0 software (Westover Scientific, Mill Creek, WA).
Segmentation of Liver Cellular and Tissue Structures
H&E-stained liver histology images are characterized by multiple types of tissue structures, including ld-MaS, sd-MaS, hepatocyte nuclei and cytoplasm, nonparenchymal cells, sinusoidal spaces, and, in cases of frozen-section images, other empty spacelike artifacts that are often generated during frozen-section preparation. To quantify ld-MaS accurately, cell nuclei and hepatic LDs must be distinguished from the remaining features. Model-based segmentation methods such as active contour modeling (ACM) have been extensively used for such applications[11-13]; however, ACM requires seeds to be accurately initialized in the proximity of each image object. A method that automatically initializes seeds on the basis of the image intensity was introduced and combined with ACM for cell nuclei and hepatic LD segmentation, as shown in Fig. 1. Edges are first discerned from the color gradient of the H&E-stained liver histology digital image,[15, 16] which produces significantly stronger edges than the traditional grayscale gradient (Fig. 1B). Next, seed points for potential steatotic droplets and nuclei are automatically initialized on the basis of light and dark intensity contrast from the color image (Fig. 1C).[11-13] ACM is then used to refine the optimal surface area boundary through the fitting of each object to an LD or nucleus model based on the following characteristics: strong edges at the boundaries, circular morphologies, smooth contours, and homogeneous internal features (Fig. 1D). The candidate object's score is compared to its score for the previous iteration, and the object's segmentation with the highest score is stored (Fig. 1E). On the basis of a trained pathologist's assessment, an object's final score cutoff is established to distinguish segmented LDs from segmented sinusoidal spaces and to distinguish segmented hepatocyte nuclei from nonparenchymal cells.
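The claim that a color gradient yields stronger edges than a grayscale gradient can be illustrated with a minimal sketch. It assumes one simple definition of the color gradient (the Euclidean norm of the per-channel differences between neighboring pixels); the method in the cited references may differ, and the pixel values and function names below are hypothetical.

```python
import math

def gray(pixel):
    """Unweighted grayscale value of an (R, G, B) pixel."""
    return sum(pixel) / 3.0

def horizontal_gradients(row):
    """Return (color, grayscale) gradient magnitudes between neighboring pixels.

    Color gradient: Euclidean norm of the per-channel differences
    (an assumed, simple definition, not necessarily the cited one).
    Grayscale gradient: absolute difference of the mean intensities.
    """
    color, grayscale = [], []
    for left, right in zip(row, row[1:]):
        color.append(math.sqrt(sum((r - l) ** 2 for l, r in zip(left, right))))
        grayscale.append(abs(gray(right) - gray(left)))
    return color, grayscale

# Hypothetical pixels: pink (eosin-like) cytoplasm next to a purple
# (hematoxylin-like) nucleus.
row = [(230, 150, 180), (230, 150, 180), (120, 80, 160), (120, 80, 160)]
color_grad, gray_grad = horizontal_gradients(row)
```

With this definition the color gradient is never smaller than the grayscale gradient, because averaging the channels can cancel differences that the per-channel norm retains.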
Quantifying Nucleus Displacement on the Basis of LD-Nucleus Adjacency
ld-MaS is defined as intrahepatic LDs that are large enough to displace the nucleus from the center of the cytoplasm and are, therefore, adjacent to the nucleus (Supporting Fig. 1A). To capture this phenomenon, an LD-nucleus adjacency parameter was created as follows. The LD radii of 52,000 LDs from 54 liver histology digital images of 9 patients were quantified. In addition, the shortest distance between each LD's perimeter and the perimeter of its nearest nucleus was quantified, and LD-nucleus adjacency was calculated as follows (Supporting Fig. 2):
Adjacency values approaching 1 indicated that the LD was adjacent to the nearest nucleus. The relationship between LD-nucleus adjacency and the LD CSSA was explored. The LDs were grouped into 7 clusters on the basis of the CSSA, and the percentage of LD-nucleus adjacency > 0.9 in each group was quantified and plotted versus the average CSSA in each group (Supporting Fig. 1B).
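The grouping step can be sketched as follows. This is a minimal illustration that takes the adjacency value of each LD as given and assumes that the 7 groups are equal-count bins of the CSSA-sorted droplets; the grouping rule actually used is not stated here, and the function name is hypothetical.

```python
def adjacency_fraction_by_size(droplets, n_groups=7, threshold=0.9):
    """Summarize LD-nucleus adjacency per size group.

    droplets: list of (cssa_um2, adjacency) pairs, one per LD.
    Returns one (mean_cssa, percent_adjacent) pair per group, where
    percent_adjacent is the share of LDs with adjacency > threshold.
    Groups are equal-count bins of the CSSA-sorted list (an assumption).
    """
    ordered = sorted(droplets)
    n = len(ordered)
    results = []
    for g in range(n_groups):
        group = ordered[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue  # fewer droplets than groups
        mean_cssa = sum(c for c, _ in group) / len(group)
        adjacent = sum(1 for _, a in group if a > threshold)
        results.append((mean_cssa, 100.0 * adjacent / len(group)))
    return results

# Made-up droplets for illustration only.
example = [(50, 0.2), (60, 0.95), (400, 0.95), (420, 0.99)]
summary = adjacency_fraction_by_size(example, n_groups=2)
```

Plotting the second element of each pair against the first gives a curve of the kind described for Supporting Fig. 1B.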
Discriminating ld-MaS and sd-MaS With an Unsupervised Cluster Analysis
k-means unsupervised clustering of 52,000 LDs was employed to investigate whether distinct, homogeneous groupings could be found that suggest separation between ld-MaS and sd-MaS. Six image parameters—LD radius, mean color intensity, circularity, convexity, aspect ratio, and LD-nucleus adjacency—were included in the analysis as well as their combinations. Two population distributions (corresponding to ld-MaS and sd-MaS) were assumed for the cluster analysis. The cluster strength was determined with the mean silhouette score (s): s values approaching 1 reflected high intercluster distances and low intracluster distances and thus indicated improved separation between the 2 groups (Table 1).
Table 1. k-Means Clustering of ld-MaS–Relevant Image Features
Features | Mean s
Radius | 0.56
Intensity, sphericity, convexity, and aspect ratio | 0.33
Radius, intensity, sphericity, convexity, and aspect ratio | 0.63
LD-nucleus adjacency | 0.59
Radius and LD-nucleus adjacency | 0.74
Radius, intensity, sphericity, convexity, aspect ratio, and LD-nucleus adjacency | 0.78
NOTE: The following morphological features were evaluated with mean s values for their ability to classify ld-MaS and sd-MaS into 2 distinct groups: (1) radius, (2) intensity, (3) sphericity, (4) convexity, (5) aspect ratio, (6) LD-nucleus adjacency, and combinations thereof. s is a measure of the cluster strength: s values approaching 1 reflect high intercluster distances and low intracluster distances and thus indicate improved separation between the 2 groups.
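For readers unfamiliar with the silhouette score, the following minimal sketch computes the mean s for a labeled set of points by using the standard definition: for each point, a is the mean distance to the other members of its cluster, b is the lowest mean distance to any other cluster, and the score is (b - a) / max(a, b). The function name and toy data are illustrative; in practice the features would likely be rescaled before clustering, which is an assumption not described in the text.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette score; points are feature tuples, labels are cluster ids.

    Requires at least 2 distinct labels.
    """
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [math.dist(p, q)
                for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        if not same:
            scores.append(0.0)  # usual convention for a singleton cluster
            continue
        a = sum(same) / len(same)
        other_means = []
        for other in set(labels) - {lab}:
            d = [math.dist(p, q) for q, l in zip(points, labels) if l == other]
            other_means.append(sum(d) / len(d))
        b = min(other_means)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy one-feature example: two well-separated groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = mean_silhouette(points, [0, 0, 1, 1])
bad = mean_silhouette(points, [0, 1, 0, 1])
```

A labeling that follows the natural grouping (`good`) scores close to 1, whereas one that mixes the groups (`bad`) scores much lower, which is the behavior used to rank the feature combinations in Table 1.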
Decision Tree Learning From an Unsupervised Cluster Analysis
On the basis of the k-means unsupervised clustering of 52,000 LDs described previously, a decision tree classifier was established to provide a set of interpretable rules for distinguishing ld-MaS and sd-MaS. The size and accuracy of the decision tree were optimized with n-fold cross-validation with n set to 10, as is common with such methods; the final number of nodes for the tree was selected to minimize the cross-validation error.
ld-MaS Sensitivity and Specificity Analysis
LDs in 8 H&E-stained liver histology images from 5 different patients were assessed for ld-MaS or sd-MaS by a trained pathologist (K.M.K.). LDs in those images were segmented for ld-MaS or sd-MaS with 3 different methods: (1) a 176-μm2 CSSA cutoff, (2) a 350-μm2 CSSA cutoff, and (3) decision tree–based criteria. The sensitivity was determined for each image as the ratio of true positives (number of LDs marked as ld-MaS by both the algorithm and the pathologist) to true positives and false negatives (number of LDs marked as ld-MaS by the pathologist but missed by the algorithm). Specificity was determined for each image as the ratio of true negatives (LDs that were not marked as ld-MaS by either the software or the pathologist) to true negatives and false positives (LDs that were not marked as ld-MaS by the pathologist but were determined to be ld-MaS by the algorithm). The percentages obtained for each image were averaged to yield the mean algorithm sensitivity and specificity values for each method.
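The per-image sensitivity and specificity defined above, and their averaging across images, can be sketched as follows. The function names and the example calls are hypothetical; the sketch assumes that every image contains at least one LD of each class according to the pathologist, so that neither denominator is zero.

```python
def sensitivity_specificity(algorithm_calls, pathologist_calls):
    """Per-image sensitivity and specificity for ld-MaS labeling.

    Both arguments are lists of booleans, one per LD, that are True
    where the LD was marked as ld-MaS. Assumes the pathologist marked
    at least one LD as ld-MaS and at least one as not ld-MaS.
    """
    pairs = list(zip(algorithm_calls, pathologist_calls))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if not a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if a and not p)
    return tp / (tp + fn), tn / (tn + fp)

def mean_performance(images):
    """Average the per-image values; images is a list of (algorithm, pathologist)."""
    results = [sensitivity_specificity(a, p) for a, p in images]
    n = len(results)
    return (sum(s for s, _ in results) / n,
            sum(c for _, c in results) / n)

# Made-up calls for two images.
image_1 = ([True, True, False, False, True], [True, False, False, False, True])
image_2 = ([True, False, False], [True, True, False])
mean_sens, mean_spec = mean_performance([image_1, image_2])
```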
Comparing ld-MaS Percentages Determined by Image Analysis Methods to Pathologists' Assessments
The ld-MaS percentage for each H&E-stained liver histology image was estimated with 5 classification methods: (1) the relative image CSSA occupied by total LDs (both ld-MaS and sd-MaS) normalized to the image CSSA, (2) the relative number of total LDs (both ld-MaS and sd-MaS) normalized to the cell number as assessed by the nucleus count, (3) the relative number of ld-MaS LDs determined with a CSSA cutoff of 176 μm2 and normalized to the cell number, (4) the relative number of ld-MaS LDs determined with a CSSA cutoff of 350 μm2 and normalized to the cell number, and (5) the relative number of ld-MaS LDs determined with decision tree classification and normalized to the cell number. For each method, the ld-MaS percentages attained from 3 to 6 images were averaged to represent the image analysis–based ld-MaS percentage for each donor.
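For methods 3 to 5, the calculation reduces to a droplet count normalized to the nucleus count, followed by averaging over the images of a donor. A minimal sketch, with hypothetical function names and made-up counts:

```python
def ld_mas_percentage(n_ld_mas_droplets, n_nuclei):
    """ld-MaS percentage for one image: ld-MaS LDs per counted nucleus."""
    return 100.0 * n_ld_mas_droplets / n_nuclei

def donor_percentage(images):
    """Average the per-image percentages for one donor.

    images: list of (n_ld_mas_droplets, n_nuclei) pairs, one per image
    (3 to 6 images per donor in the study).
    """
    per_image = [ld_mas_percentage(d, n) for d, n in images]
    return sum(per_image) / len(per_image)

# Made-up counts for three images from one donor.
donor_value = donor_percentage([(10, 100), (30, 100), (20, 100)])
```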
The obtained values were compared to the trained pathologists' estimations (K.M.K. and J.H.L.) on the basis of the values of the log2-fold difference, fractional deviation, and linear regression coefficient of determination (R2) to assess the accuracy of each method. The log2-fold difference was calculated as follows:
The log2-fold difference measured the multiplicative deviation of the algorithm's estimate from the pathologists' estimation. A small log2-fold difference indicated higher accuracy. The fractional deviation was calculated as follows:
The fractional deviation measured the additive deviation of the algorithm's estimate from the pathologists' estimation. A small fractional deviation indicated higher accuracy. R2 was the squared correlation between the algorithm-estimated ld-MaS percentage and the pathologists' estimation. An R2 value approaching 1 indicated a better fit to a linear model.
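The three accuracy measures can be sketched in code, with an important caveat: the exact formulas for the log2-fold difference and the fractional deviation are not reproduced in this text, so the forms below are assumptions chosen to match the verbal descriptions (a multiplicative and an additive deviation from the pathologists' estimate). Only R2, the squared correlation, follows directly from the stated definition. All function names are hypothetical.

```python
import math

def log2_fold_difference(algorithm, pathologist):
    """ASSUMED form: absolute log2 of the ratio of the two estimates."""
    return abs(math.log2(algorithm / pathologist))

def fractional_deviation(algorithm, pathologist):
    """ASSUMED form: absolute difference relative to the pathologists' estimate."""
    return abs(algorithm - pathologist) / pathologist

def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Made-up ld-MaS percentages for illustration.
fold = log2_fold_difference(40.0, 20.0)
frac = fractional_deviation(30.0, 20.0)
fit = r_squared([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Under these assumed forms, an algorithm estimate of twice the pathologists' value gives a log2-fold difference of 1, and perfect agreement gives 0 for both deviation measures.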
The results shown in the text and figures are means and standard errors. A 1-way analysis of variance followed by Fisher's least significant difference post hoc test was performed with KaleidaGraph (Synergy Software, Reading, PA). P values < 0.05 indicate statistical significance.
RESULTS
Through the application of a novel image analysis algorithm to H&E-stained liver histology images, steatosis-relevant morphological features such as the LD size, number, and shape were obtained in an automated and, therefore, completely objective and reproducible manner (Fig. 1). Figure 2 demonstrates that an image analysis method aimed at quantifying the ld-MaS percentage solely on the basis of the relative CSSA occupied by all LDs (ld-MaS and sd-MaS) in the tissue underestimated the pathologists' assessment of the ld-MaS percentage, and this is consistent with previous reports.[5, 8] Conversely, determining the ld-MaS percentage with an image analysis method quantifying the total number of LDs (ld-MaS and sd-MaS) present in the tissue overestimated the pathologists' assessment (Fig. 2), and this was expected because the pathologists accounted only for ld-MaS.[1, 4, 20]
The 2 criteria used by pathologists to define LDs as ld-MaS are size (as determined by the CSSA) and the ability to displace the hepatocyte's nucleus to the cell periphery. Previous reports have sought to differentiate between ld-MaS and sd-MaS solely on the basis of a predetermined LD CSSA cutoff, but success has been limited.[9, 10] Here we explored the impact of 5 shape-dependent features of LDs (mean color intensity, circularity, convexity, aspect ratio, and CSSA) and then added the sixth criterion of LD-nucleus adjacency as an indicator of nucleus displacement (Fig. 1 and Supporting Fig. 2). A cluster analysis of these 6 features revealed their relative importance in differentiating ld-MaS from sd-MaS within an H&E-stained liver histology digital image (Table 1). Clusters based on the LD mean color intensity, circularity, convexity, and aspect ratio yielded an s value (s = 0.33; Table 1) lower than the value for CSSA alone (s = 0.56; Table 1) or the value when these features were combined with CSSA (s = 0.63; Table 1). Next, we examined the additional contribution of LD-nucleus adjacency to the previously explored features. LD-nucleus adjacency alone yielded the highest separation for any single feature (s = 0.59; Table 1); this was slightly higher than that of CSSA (s = 0.56; Table 1). The combination of LD-nucleus adjacency and LD CSSA (which, incidentally, are the main features used by pathologists to differentiate between ld-MaS and sd-MaS) improved separation (s = 0.74; Table 1) to a level similar to that obtained with a combination of all 6 features (s = 0.78; Table 1). This suggests that LD-nucleus adjacency and LD CSSA are the main determining features, whereas the others have little impact on the determination.
These results motivated the inclusion of LD-nucleus adjacency as part of the algorithm determining whether an LD should be labeled as ld-MaS. An analysis of 52,000 LDs from 54 liver histology images of 9 patients indicated that as the LD CSSA increased, the percentage of LDs with an adjacent nucleus increased (Supporting Fig. 1A,B). Pathologists' examinations of these LDs confirmed that the large LDs were adjacent to the nucleus because of nuclear displacement. Furthermore, this analysis revealed that more than 95% of LDs with a CSSA equal to or greater than 350 μm2 were characterized by LD-nucleus adjacency (Supporting Fig. 1B). This CSSA cutoff is approximately 2-fold higher than a 176-μm2 cutoff previously described[9, 10] (Supporting Fig. 1B). At the 176-μm2 cutoff, only approximately 70% of the LDs exhibited LD-nucleus adjacency (Supporting Fig. 1B).
These 2 LD CSSA cutoffs (the 176-μm2 CSSA cutoff and the LD-nucleus adjacency–dependent 350-μm2 CSSA cutoff) were compared in terms of their ability to generate ld-MaS percentage assessments consistent with those of trained pathologists. The use of either of these 2 methods improved the ability to match the pathologists' assessments of ld-MaS in comparison with methods that do not separate between ld-MaS and sd-MaS (Figs. 2 and 3). In addition, the LD-nucleus adjacency–based cutoff of 350 μm2 yielded improved predictions in comparison with the previously used 176-μm2 cutoff (Fig. 3). Although the ld-MaS percentage assessment derived from the previously cited cutoff of 176 μm2 overestimated the pathologists' assessment, the 350-μm2 cutoff underestimated it (Fig. 3). An ld-MaS sensitivity and specificity analysis revealed that using the 176-μm2 CSSA cutoff led to approximately 97% sensitivity and approximately 60% specificity (Fig. 4A), and this may explain the ld-MaS percentage overestimation. This relatively low specificity was improved to approximately 99% with the 350-μm2 cutoff (P < 0.001; Fig. 4A); however, this also led to an approximately 30% reduction in sensitivity (P < 0.02; Fig. 4A). This reduction was due to the fact that LDs displacing their adjacent nuclei and defined as ld-MaS by pathologists sometimes ended up below the 350-μm2 CSSA cutoff and, as a result, were not counted as ld-MaS; this caused the ld-MaS percentage underestimation (Fig. 4B).
In contrast to the classifiers based on droplet size alone, a sensitivity and specificity analysis of a decision tree revealed classifications that were much more comparable to the pathologists' assessments. A 4-node, 2-level decision tree, pruned to minimize cross-validation error, yielded a sensitivity of approximately 99% and a specificity of approximately 94% (Fig. 4A). The optimized 4-node tree (Supporting Fig. 3) set a low-end CSSA cutoff of 174.9 μm2, under which all droplets were classified as sd-MaS. A high-end CSSA cutoff of 347.7 μm2 was set, above which all droplets were classified as ld-MaS. LDs with a CSSA between 174.9 and 347.7 μm2 were classified as ld-MaS by the tree if they were above an LD-nucleus adjacency cutoff of 0.9, which indicated that a nucleus was displaced as confirmed by a trained pathologist.
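The rules of the 4-node tree can be written out directly. The sketch below uses the cutoffs reported above; the function and constant names are hypothetical, and the handling of values exactly equal to a cutoff is an assumption because the text does not specify it.

```python
LOW_CSSA_UM2 = 174.9      # below this, always sd-MaS
HIGH_CSSA_UM2 = 347.7     # above this, always ld-MaS
ADJACENCY_CUTOFF = 0.9    # decides the intermediate size range

def classify_droplet(cssa_um2, adjacency):
    """Classify one LD as 'ld-MaS' or 'sd-MaS' with the reported tree rules.

    Values exactly at a CSSA cutoff fall into the intermediate branch
    (an assumption; the boundary handling is not stated in the text).
    """
    if cssa_um2 < LOW_CSSA_UM2:
        return "sd-MaS"
    if cssa_um2 > HIGH_CSSA_UM2:
        return "ld-MaS"
    return "ld-MaS" if adjacency > ADJACENCY_CUTOFF else "sd-MaS"

# Made-up droplets: (CSSA in square micrometers, LD-nucleus adjacency).
labels = [classify_droplet(c, a)
          for c, a in [(100.0, 0.99), (400.0, 0.10), (250.0, 0.95), (250.0, 0.50)]]
```

Only droplets in the intermediate size range are affected by the adjacency criterion, which is where the size-only cutoffs of 176 and 350 μm2 disagree with each other.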
The decision tree–based assessment took into account both the size and nucleus adjacency of LDs. It afforded the high sensitivity of the low 176-μm2 cutoff criterion (P < 0.02) and the high specificity of the high 350-μm2 cutoff criterion (Fig. 4); overall, it resulted in the best correlation with the pathologists' ld-MaS percentage assessments among the methods tested (log2-fold difference = 0.19, fractional deviation = 0.12, R2 = 0.97; Fig. 3).
For the real-time assessment of ld-MaS in liver transplantation, frozen sections are more commonly used. Therefore, to begin to evaluate the potential of our digital image analysis method for segmenting ld-MaS and sd-MaS in frozen sections, we obtained a sample from a donor during an actual liver graft evaluation process. As indicated in Supporting Fig. 4, the method properly labeled ld-MaS, while excluding empty space artifacts that could have been confused with LDs. Furthermore, the segmentation was in agreement with the pathologist's evaluation and comparable to the results obtained from paraffin-embedded tissue from the same liver.
DISCUSSION
Here we describe a novel digital image analysis method for assessing ld-MaS in H&E-stained liver slides with high specificity and sensitivity. This method combines LD size with a metric that quantifies LD-induced nuclear dislocation to the cell periphery. This algorithm leads to a much improved correlation with ld-MaS assessments by trained pathologists (arguably the gold standard in current practice) and thus enables fully automated image analysis detection and quantification of ld-MaS in H&E-stained histology images of human livers. This automated image analysis approach is intended to assist but not replace the pathologist because additional pathological features such as necrosis, fibrosis, malignancies, and bile duct damage also need to be taken into account before any clinical decision.[4, 10, 21]
We opted to use paraffin-embedded samples to develop and test the methodology (particularly the use of the nuclear displacement criterion) because samples with diverse backgrounds in terms of age, cold ischemia time, machine perfusion time, body mass index, and so forth were more readily available. Furthermore, the vast majority of studies on hepatic steatosis use paraffin-embedded tissues (eg, Yersiz et al.), and it was necessary to use the same to compare the performance of our digital image analysis method with the performance of others. A caveat is that for the real-time assessment of ld-MaS in liver transplantation, frozen sections are more commonly used. However, in a preliminary evaluation of a frozen section, the method properly segmented ld-MaS and sd-MaS, while excluding empty space artifacts. The method's robustness likely resides in the use of many criteria to define an LD, including its roundness, sharp edges, size range, texture and color uniformity, and location within the tissue, as further detailed in Fig. 1. However, the clinical translation of this method will require a thorough examination of many more frozen sections encompassing various types and sizes of freezing artifacts and a detailed comparison of the obtained results with those obtained from paraffin-embedded samples from the same livers.
Extending this method to frozen sections could make it a useful tool for determining the ld-MaS state of liver grafts during the process of assessing liver graft transplantability.[5, 23, 24] This is especially true when the procurement surgeons must make the assessment themselves because of the unavailability of a trained pathologist at the critical time. Currently, many transplantation centers use a cutoff of approximately 30% ld-MaS as the steatotic liver graft transplantability criterion. This cutoff may be overly conservative for accounting for potential variations among different pathologists at different centers, and it could be revisited if a more objective computerized assessment were available. Furthermore, the ld-MaS score could be combined with other quantifiable criteria related to the health and ages of the donor and the recipient, blood biomarkers (eg, aspartate aminotransferase and alanine aminotransferase), and warm and cold ischemia times[1, 4] to develop an overall transplantability score.
Currently, the evaluation of ld-MaS is based entirely on the scoring of H&E-stained liver tissue sections. Recent studies using animal and in vitro models of steatosis indicate that an array of biochemical biomarkers could also be used to refine the assessment of ld-MaS. Some of these methods require biochemical processing, such as intracellular triglyceride[20, 25] and adenosine triphosphate levels,[20, 26] but they could be performed well within the timeframe needed for liver histopathology. In recent years, perfusion methods have been developed to replace the static cold storage approach to graft preservation; in such cases, data collected during perfusion could also be included in the evaluation.[2, 20, 27]
In conclusion, we have developed a novel algorithm that enables a fully automated ld-MaS percentage assessment that could be used before and after the diagnosis of human liver grafts with paraffin-embedded liver slides, and it may help to standardize ld-MaS scores among pathologists and centers. This method also has the potential to be incorporated into real-time assessments of liver graft transplantability with frozen liver slides to ensure optimal utilization of the liver graft pool.