Deep learning‐based segmentation of knee MRI for fully automatic subregional morphological assessment of cartilage tissues: Data from the Osteoarthritis Initiative

Morphological changes in knee cartilage subregions are valuable imaging‐based biomarkers for understanding progression of osteoarthritis, and they are typically detected from magnetic resonance imaging (MRI). So far, accurate segmentation of cartilage has been done manually. Deep learning approaches show high promise in automating the task; however, they lack clinically relevant evaluation. We introduce a fully automatic method for segmentation and subregional assessment of articular cartilage, and evaluate its predictive power in context of radiographic osteoarthritis progression. Two data sets of 3D double‐echo steady‐state (DESS) MRI derived from the Osteoarthritis Initiative were used: first, n = 88; second, n = 600, 0‐/12‐/24‐month visits. Our method performed deep learning‐based segmentation of knee cartilage tissues, their subregional division via multi‐atlas registration, and extraction of subregional volume and thickness. The segmentation model was developed and assessed on the first data set. Subsequently, on the second data set, the morphological measurements from our and the prior methods were analyzed in correlation and agreement, and, eventually, by their discriminative power of radiographic osteoarthritis progression over 12 and 24 months, retrospectively. The segmentation model showed very high correlation (r > 0.934) and agreement (mean difference < 116 mm3) in volumetric measurements with the reference segmentations. Comparison of our and manual segmentation methods yielded r = 0.845–0.973 and mean differences = 262–501 mm3 for weight‐bearing cartilage volume, and r = 0.770–0.962 and mean differences = 0.513–1.138 mm for subregional cartilage thickness. With regard to osteoarthritis progression, our method found most of the significant associations identified using the manual segmentation method, for both 12‐ and 24‐month subregional cartilage changes. The method may be effectively applied in osteoarthritis progression studies to extract cartilage‐related imaging biomarkers.


| INTRODUCTION
Articular cartilage is often studied in vivo using magnetic resonance imaging (MRI) to understand osteoarthritis-related soft tissue changes in knee joints. Several MRI protocols, particularly, based on 3D doubleecho steady-state (DESS) sequence, have been shown to provide a good contrast between the cartilage and the surrounding tissues. However, manual segmentation of cartilage is prone to resegmentation errors, particularly, due to inter-reader variability. 1,2 Furthermore, manual delineation is time-demanding, which makes it impractical in large-scale longitudinal studies, such as osteoarthritis progression research. Thus, the development of automatic methods for cartilage segmentation that are fast, accurate, and consistent is crucial. 3,4 Numerous studies have been performed to automate knee cartilage segmentation by applying image processing, conventional machine learning algorithms, 5 and registration-based methods, 6  Earlier works have shown associations between the changes of cartilage morphology, quantified via total volume and thickness statistics, and the progression of osteoarthritis. 10,11 Importantly, the morphological cartilage changes vary in relation to the particular knee anatomy and are heterogeneous, 12 such that cartilage thinning in one subregion may be accompanied by swelling in another. Moreover, even the healthy cartilage tissues have complex geometry and nonuniform thickness profiles. Assessment of the cartilage segmentation accuracy on a tissue level does not account for the relative importance of the subregional errors and, therefore, has limited clinical value for understanding the disease.
Multiple studies have been conducted to identify the cartilage subregions mostly affected by osteoarthritis. 13,14 Such subregions have been defined not only with respect to certain anatomical landmarks, but also considering the load dynamics within the joint. Several approaches have been proposed, with different levels of subregional granularity. 15,16 To date, the most established and validated method for subregional cartilage assessment is still semi-automatic, [16][17][18][19] and relies on the segmentation performed manually due to accuracy concerns.
In this study, we combined a DL-based cartilage segmentation with a nonrigid registration-based subregional division. Using this pipeline, we analyzed the subregional performance of the DL-based cartilage segmentation and investigated whether it could effectively substitute the manual delineation process in osteoarthritis progression research.

| Data
We performed a retrospective analysis using two data sets drawn from the Osteoarthritis Initiative (OAI; https://nda.nih.gov/oai), a multi-center prospective cohort study database (Table 1). Ethical approval and informed consent for all participants were obtained by OAI. The first data set (IMO) included 88 subjects from the OAI Progression cohort: sagittal 3D DESS knee MR scans from baseline and 12-month follow-up visits. The data set had manual annotations for femoral, tibial, patellar cartilage, and menisci produced by iMorphics Ltd. 20 The second data set (FBC) included 600 subjects from the FNIH Osteoarthritis Biomarkers Consortium data, 19 nested case-control study in OAI. Sagittal 3D DESS MR scans were obtained from the baseline, 12-month, and 24-month visits. Subjects overlapping with the training subset of IMO (n = 13), and subjects with missing assessments for any of the visits (n = 20) were excluded from FBC during the analysis. The data set comprised of four groups of subjects: controls (GC), radiographic progressors (GR), pain progressors (GP), radiographic and pain progressors (GRP). The demographics of the final data set and the groups are summarized in Table 1. We refer to Eckstein et al. 19 for the detailed definition of the progressor groups. In short, radiographic progression was defined as the loss of minimum joint space width of 0.7 mm or more from baseline to 24, 36,

| Segmentation
The proposed method performed segmentation of the cartilage tissues, their subsequent subregional division, and quantification of subregional morphological properties ( Figure 1). For segmentation, we employed a DL-based approach. Here, IMO data were split subject-wise into training (n = 70, KL1-4: 4/48/78/8 knees) and test (n = 18, KL1-4: 0/11/21/4 knees) subsets, maintaining the similar distribution of radiographic osteoarthritis severity, measured by the Kellgren-Lawrence (KL) system. The training subset was used in 5-fold cross-validation scheme to train five segmentation models, further used in a single ensemble. The models were trained to segment femoral, tibial, and patellar cartilage tissues, as well as menisci in 2D, from sagittal slices. During the evaluation phase, continuous model-wise outputs of softmax activation layers were averaged over the ensemble and the dominant class was identified for each voxel.
The architecture was based on UNet 21 with an ImageNet-pretrained VGG19 encoder, 22 and modified to have six levels in the decoder.
Further regularization was done using mixup, 23 which was previously shown to improve segmentation robustness in the task. 8

| Subregional division
The cartilage tissues were then divided into the subregions using a multi-atlas registration. We adopted two sets of atlases ( Figure 1).
The first multi-atlas (k = 5) was constructed according to Wirth and Eckstein. 16 Specifically, the femoral cartilage was split into trochlear,

| Morphological measurements
We performed the assessment both at tissue and subregional levels. The absolute volume was estimated using numerical integration over the mask voxels. The average thickness was estimated in two steps. First, the thickness map was computed from the tissue-level masks using our own implementation of the local thickness algorithm. 25 Ablation study of the thickness measure is provided in Supporting Information S1. Second, the average thickness was computed from the voxels corresponding to a particular tissue or a subregion.

| Comparison of the methods
We used the morphological measurements from FBC data produced by two independent groups. Chondrometrics GmbH provided volumetric and thickness measurements based on the delineation scheme of Wirth and Eckstein. 16 Cartilage segmentation in the method of Chondrometrics was done manually, followed by an automatic subregional splitting and assessment. Biomediq A/S provided fully automatic volumetric measurements obtained using registrationbased approach and the atlas of Hafezi-Nejad et al. 24 We compared the measurements of the methods using correlation and

| Tissue segmentation
The developed segmentation model showed volumetric and surfacebased accuracy comparable to the previously published state-of-theart methods. Segmentation quality metrics are presented in Table 2.
Tissue-wise, on the test subset of IMO data, the model achieved the T A B L E 2 Segmentation quality metrics and agreement measured between the results of our automatic method and the reference manual segmentations on OAI iMorphics data set

| Subregional division and morphological assessment
The results of the segmentation analysis are summarized in Table 2.
Visual inspection of multiple divided segmentation masks confirmed that the proposed division method produces the delineation in accordance with the original protocols of Wirth et al. 16 and Hafezi-Nejad et al. 24 In terms of DSC and ASSD, the least accurate seg-

| Comparison of morphological measurements and discriminative powers
The entire developed method, comprising the segmentation, subregional division, and morphological assessment modules, was eventually run on the FBC data set, and subsequently compared with the method of Chondrometrics 19 ( Comparison of our method to Biomediq 24 is summarized in Table 4  Relative performance of our and Chondrometrics methods was

| DISCUSSION
In this study, we presented a fully automatic method for subregional segmentation and morphological assessment of knee cartilage tissues. We validated and analyzed the performance of the method against the reference manual segmentations. Subsequently, we compared our method to two previously published solutions, one of which is an established semi-automatic system based on manual cartilage delineation with subsequent automatic subregional assessment. Our method achieved comparable discriminative power of radiographic osteoarthritis progression over 12 and 24 months as the semi-automatic system. The results show that the proposed method can already be used to automate the cartilage segmentation and subregional assessment in osteoarthritis research.
Our study brings an important novelty to knee MRI research. For the anatomical similarities between a scan and a multi-atlas, thus, closer mimicking the procedure followed by a human annotator.
Even though our method relies on the recent scientific advances, it has space for further improvements. The method yielded lower accuracy in posterior and trochlear cartilage subregions, which are typically of more complex curvature than the weight-bearing areas and are exposed to more apparent partial volume effect. 3D convolutional neural networks could be used to potentially improve segmentation in such areas, providing more accurate cartilage delineation with higher inter-slice consistency. However, it has been recently shown that 2D approaches may be equally as accurate in the problem on the same rather small data set, 29 which necessitates a larger and more diverse sample size to produce further improvements. Concerning the registration module, we did not touch on the optimal ways of creating the multi-atlas, which, if done right, can positively affect the accuracy of subregional splitting. It is important to note, however, that the creation of an optimal multi-atlas is an NP-hard problem and for example, OAI would require enormous amount of time and computation. 31,32 In this study, we simplified this process by selecting a diverse set of subjects with different age, sex, BMI, and KL-grades to account for the major variability factors in the cohort. Using more advanced voting strategies over the multi-atlas may provide further improvements. 33 Finally, recently introduced pre-trained DL-based registration methods can be alternatively used to improve the accuracy of subregional division. 34 Accuracy-wise, our method showed reduced performance in the medial compartment of femoral and tibial cartilage tissues as compared with the lateral one.
This difference could be potentially attributed to the more complex morphology of the degraded cartilage, noisier reference segmentations in such areas, and higher contribution of the partial volume effect. Analysis of the segmentation model and the complete pipeline at different KL-grades also indicated that our method is less accurate toward severe osteoarthritic cases. Improving the method performance in such cases, particularly, for KL4 almost absent in the considered data sets, would be a valuable direction for further work.
Despite the methodological novelty, and good performance of the proposed method, our work has still some limitations. In general, the validation of algorithms for segmentation and morphological assessment of the cartilage tissues from MRI is a long-standing problem.
Previous studies suggested multiple approaches for validation of measurements in vivo, including computed tomography arthrography, stereophotogrammetry, radiography, ultrasonography. [35][36][37][38][39][40] Reference manual segmentation still remains the most accurate and widely used technique, despite the accompanying intra-and inter-user variability and MR contrast-related bias. 29,30 In this study, we extensively used  Chondrometrics. However, when assessed on the reference segmentation masks on IMO data, our approach showed small AMD, both in the volumetric and thickness measurements. This finding suggests a presence of a systematic bias between the reference segmentations from IMO data and the manual segmentations of Chondrometrics.
Subsequently, a discrepancy of approximately 2-3 voxels was observed between our and Chondrometrics' measurements of subregional thicknesses. Here, the differences in segmentations were presumably further amplified by the post-processing done solely in Chondrometrics' method (e.g. meshing) and the different thickness measures used. For instance, Maier et al. 41 showed that local thickness used in our method may relatively overestimate cartilage thickness compared with distance transform. We believe that already by using manual segmentation masks of Chondrometrics, our method can be trained to produce less biased measurements and achieve better agreement. A more thorough comparison to the method of Chondrometrics is prohibitively complicated by its proprietary nature. Another important difference is in the compared morphological features.
Namely, the thickness measurements by Chondrometrics were computed over the total area of subchondral bone, taking into computation also the denuded bone regions. Very sparse measurements were provided specifically over the cartilaginous areas, which were the main target of our method. Presumably, this has negatively affected the agreement between the methods and the final ORs. This is also suggested by the higher similarity of morphological measurements and ORs between our and Chondrometrics methods with features computed over cartilaginous areas. Nonetheless, our approach can be extended by incorporating bone segmentation into the pipeline, to produce similar thickness estimates.
In our study, we used average subregional thickness as the key morphological feature. It is necessary to mention the work of Buck et al., 18 where more complex thickness statistics were shown to yield larger effect sizes, at least, in osteoarthritis treatment efficacy studies. Next, even though our study sheds light on the properties of DL-based segmentation, it does not touch on the important aspects of resegmentation errors 30 and errors due to scanner drifts, 42  the studies on new MRI protocols often include conventional sequences, correspondence of exam scans can be exploited to facilitate the adaptation of our method. We leave these topics to further work.
Our study makes an important step toward automating the morphological analysis of knee cartilage tissues. We introduced a fully automatic method for segmentation and sub-regional cartilage assessment from 3D DESS MR images. By studying the methodological properties of our approach and comparing it to the prior automatic and established manual methods, we provide new insights on the maturity of automatic segmentation methods for osteoarthritis progression studies. The source code of our approach including the local thickness implementation is publicly available at https://github. Pfizer, Inc. Private sector funding for the OAI is managed by the Foundation for the National Institutes of Health. This manuscript was prepared using an OAI public use data set and does not necessarily reflect the opinions or views of the OAI investigators, the NIH, or the private funding partners.

CONFLICT OF INTERESTS
The authors declare that there are no conflict of interests.