Development of novel machine learning model for right ventricular quantification on echocardiography—A multimodality validation study

Abstract
Purpose: Echocardiography (echo) is widely used for right ventricular (RV) assessment. Current techniques for RV evaluation require additional imaging and manual analysis; machine learning (ML) approaches have the potential to provide efficient, fully automated quantification of RV function.
Methods: An automated ML model was developed to track the tricuspid annulus on echo using a convolutional neural network approach. The model was trained using 7791 image frames, and automated linear and circumferential indices quantifying annular displacement were generated. Automated indices were compared to an independent reference of cardiac magnetic resonance (CMR) defined RV dysfunction (RVEF < 50%).
Results: A total of 101 patients prospectively underwent echo and CMR. Fully automated annular tracking was uniformly successful; analyses entailed minimal processing time (<1 second for all) and no user editing. Findings demonstrate all automated annular shortening indices to be lower among patients with CMR‐quantified RV dysfunction (all P < .001). Magnitude of ML annular displacement decreased stepwise in relation to population‐based tertiles of TAPSE, with similar results when ML analyses were localized to the septal or lateral annulus (all P ≤ .001). Automated segmentation techniques provided good diagnostic performance (AUC 0.69–0.73) in relation to CMR reference and compared to conventional RV indices (TAPSE and S′) with high negative predictive value (NPV 84%–87% vs 83%–88%). Reproducibility was higher for ML algorithm as compared to manual segmentation with zero inter‐ and intra‐observer variability and ICC 1.0 (manual ICC: 0.87–0.91).
Conclusions: This study provides an initial validation of a deep learning system for RV assessment using automated tracking of the tricuspid annulus.


| INTRODUCTION
Right ventricular (RV) dysfunction is a well-established prognostic marker for a wide range of conditions including pulmonary hypertension, cardiomyopathy, and congenital heart disease. [1][2][3] Echo is the most widely used screening tool to assess the RV with methodologies studied in relation to reference standards and for prediction of cardiovascular outcomes. 4,5 However, challenges for RV functional assessment by echo are well documented primarily due to complex RV geometry. 6,7 Moreover, current 2D methodologies for RV assessment require additional M-mode and tissue Doppler velocity image acquisition and analysis for which accuracy is dependent on an on-axis cursor placement in the direction of tricuspid annular displacement.
Machine learning (ML)-based methodologies have the potential to provide fully automated image analysis. Convolutional neural network-based segmentation techniques have been applied to echo, though with focus primarily on left ventricular chamber size and systolic function quantification. 8 While a recent study examined an ML approach for three-dimensional echocardiography (3DE) assessment of RV volume and EF, 9 limited clinical availability of 3DE is a known barrier to widespread utilization. Fully automated ML approaches have yet to be applied for RV assessment on standard 2D echo and have the potential to improve efficiency and accuracy without need for additional M-mode, tissue velocity, or 3D image acquisition.
This study examined RV functional assessment using a novel ML-derived fully automated approach for RV quantification on routine 2D echo. The primary study aim was to determine the feasibility and reproducibility of an automated ML algorithm in relation to cardiac magnetic resonance (CMR)-quantified RV dysfunction among a prospectively enrolled cohort of patients undergoing echo and CMR.

| Study population
The study population comprised prospectively enrolled patients with known or suspected coronary artery disease (CAD) between September 2015 and December 2018 in a multimodality imaging protocol focused on cardiac chamber remodeling. Patients underwent echo and CMR within a narrow interval (99% the same day).
In all patients, comprehensive demographic data were collected using standardized questionnaires, including cardiac risk factors and medications. This study was conducted with approval of the Weill Cornell Medical College Institutional Review Board, which was in compliance with the Declaration of Helsinki; written informed consent was obtained at time of enrollment.

| Imaging protocol
Echo and CMR were performed using a standardized image acquisition protocol:
Cardiac chamber volumes were assessed via cine-CMR (steady-state free precession), which included long-axis 2-4 as well as contiguous short-axis slices acquired from the tricuspid valve annulus through the RV apex that were quantified at end-diastole and end-systole for calculation of RV ejection fraction (RVEF). CMR RV DYS was defined as RVEF < 50%.
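As a minimal sketch of the RVEF calculation and the dysfunction threshold described above, the standard ejection fraction formula (end-diastolic minus end-systolic volume, divided by end-diastolic volume) can be written as follows. The function names and the example volumes are hypothetical and are not taken from the study data.

```python
def rv_ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Return RVEF in percent from end-diastolic (EDV) and end-systolic (ESV) volumes in mL."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("end-systolic volume must lie between 0 and EDV")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


def has_cmr_rv_dysfunction(rvef_percent: float, threshold: float = 50.0) -> bool:
    """Apply the RVEF < 50% definition of RV dysfunction used in the text."""
    return rvef_percent < threshold


# Hypothetical volumes, for illustration only.
rvef = rv_ejection_fraction(edv_ml=150.0, esv_ml=90.0)
dysfunction = has_cmr_rv_dysfunction(rvef)
```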

| Echocardiography
Transthoracic echoes were acquired using commercial equipment (Philips iE33). Echoes were interpreted by experienced investigators within a high-volume laboratory for which expertise and reproducibility for quantitative RV indices have been previously reported. 5 RV function was quantified using tricuspid annular plane systolic excursion (TAPSE) and RV systolic excursion velocity (S′). Measurements were acquired in accordance with American Society of Echocardiography (ASE) guidelines; established cutoffs (TAPSE < 1.6 cm, S′ < 9.5 cm/s) were used to define RV DYS by each parameter. 10 Echo analyses were performed blinded to CMR results.
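The conventional echo cutoffs quoted above can be expressed as a small classification helper. This is only an illustrative sketch: the function name, dictionary keys, and example measurements are assumptions, while the thresholds (TAPSE < 1.6 cm, S′ < 9.5 cm/s) come from the text.

```python
TAPSE_CUTOFF_CM = 1.6        # from the text: TAPSE < 1.6 cm indicates RV dysfunction
S_PRIME_CUTOFF_CM_S = 9.5    # from the text: S' < 9.5 cm/s indicates RV dysfunction


def rv_dysfunction_by_echo(tapse_cm: float, s_prime_cm_s: float) -> dict:
    """Classify RV dysfunction separately by each conventional echo parameter."""
    return {
        "TAPSE": tapse_cm < TAPSE_CUTOFF_CM,
        "S_prime": s_prime_cm_s < S_PRIME_CUTOFF_CM_S,
    }


# Hypothetical measurements, for illustration only.
example = rv_dysfunction_by_echo(tapse_cm=1.4, s_prime_cm_s=10.2)
```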

| Image processing
Manual segmentation maps were created by annotating the free-wall

| Model and training
The automated segmentation model was based on neural network architecture described by Han, 11 a modified U-net, 12 for which excellent performance has been previously demonstrated in medical segmentation. The model makes use of residual modules, 13 which improve gradient flow between adjacent layers and increase classification accuracy. A diagram of the model's architecture is shown in Figure 1.
The ML algorithm was initially trained and tested using sixfold cross-validation. Cross-validation is a procedure whereby data are randomly split into nonoverlapping subsets such that a model can be trained on all but one subset and tested on the remaining subset. In this case, a different model instance was trained and tested for each of the six holdout subsets and test metrics were averaged per-case for the entire dataset. As is typical of cross-validation, no model instance was tested on data from which it was trained. Cross-validation was chosen in place of splitting into solitary train, validation, and test subsets because it better demonstrates the true performance of a model under a size-constrained dataset. To minimize the risk of overfitting, neural network architecture, hyperparameters, cross-validation groupings, and training protocols were not modified in any way after the model was exposed to the cross-validation dataset.
Modification of these parameters could result in improved measures of accuracy but this could be at the expense of generalizability.
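The sixfold cross-validation procedure described above can be sketched as below. This is not the authors' implementation: the helper name `k_fold_splits`, the fixed random seed, and the choice to split at the level of cases (rather than individual frames) are assumptions made for illustration.

```python
import random


def k_fold_splits(n_cases: int, k: int = 6, seed: int = 0):
    """Yield (train_indices, test_indices) pairs for k nonoverlapping random folds."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)          # random assignment of cases to folds
    folds = [indices[i::k] for i in range(k)]     # k disjoint subsets covering all cases
    for held_out in range(k):
        test = folds[held_out]
        train = [idx for j, fold in enumerate(folds) if j != held_out for idx in fold]
        yield train, test


# Each of the six model instances would be trained on `train` and evaluated on `test`.
splits = list(k_fold_splits(n_cases=101, k=6))
```

Because every case appears in exactly one held-out fold, averaging test metrics over the folds yields one held-out evaluation per case without any model instance being tested on data it was trained on.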
A weighted softmax/cross-entropy loss function was used for training as follows:
$$L(x, i) = -w_i \log\left(\frac{\exp(x_i)}{\sum_{j=1}^{C} \exp(x_j)}\right)$$
where x is the output logit vector at a given pixel, i the true class label, w the vector of class weights, and C the number of classes. Weighting was employed to combat class imbalance given that the vast majority of pixels in each image were nonannular. A class weight of 0.2 was empirically assigned to the nonannular class and 0.8 to the annular class.
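For a single pixel, the weighted softmax/cross-entropy loss described in this paragraph can be sketched in plain Python as below. The class ordering (index 0 nonannular, index 1 annular) and the function name are assumptions, and how per-pixel losses are reduced over an image (sum, mean, or weight-normalized mean) is not specified in the text and is left out.

```python
import math

# Class weights from the text; the index order (0 = nonannular, 1 = annular) is assumed.
CLASS_WEIGHTS = (0.2, 0.8)


def weighted_cross_entropy(logits, true_class, weights=CLASS_WEIGHTS):
    """Weighted softmax cross-entropy for one pixel: -w_i * log(softmax(x)_i)."""
    # Subtract the maximum logit before exponentiating for numerical stability.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return weights[true_class] * (log_sum_exp - logits[true_class])


# With equal logits the softmax probability is 0.5 for each class, so a
# misclassified annular pixel costs four times as much as a nonannular one.
loss_annular = weighted_cross_entropy([0.0, 0.0], true_class=1)
loss_nonannular = weighted_cross_entropy([0.0, 0.0], true_class=0)
```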
RMSProp was used to apply incremental parameter updates.
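The RMSProp update rule divides each gradient by a running root-mean-square of recent gradients. The sketch below shows the generic rule only; the learning rate, decay factor, and epsilon are illustrative assumptions, since the text does not report the values used in the study.

```python
import math


def rmsprop_step(params, grads, sq_avg, lr=0.01, decay=0.9, eps=1e-8):
    """Apply one RMSProp update; returns the new parameters and running averages."""
    new_params, new_sq_avg = [], []
    for p, g, v in zip(params, grads, sq_avg):
        v = decay * v + (1.0 - decay) * g * g             # running average of squared gradients
        new_sq_avg.append(v)
        new_params.append(p - lr * g / (math.sqrt(v) + eps))  # scaled gradient step
    return new_params, new_sq_avg


# Toy usage: minimize f(x) = x^2 (gradient 2x) starting from x = 1.0.
params, sq_avg = [1.0], [0.0]
for _ in range(500):
    grads = [2.0 * params[0]]
    params, sq_avg = rmsprop_step(params, grads, sq_avg)
```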
The following automated and manual indices were derived using a segmentation map: (a) linear tricuspid annular displacement (LTAD),
Software code pertaining to both training and testing of the ML model can be found online at: https://github.com/akbratt/RVTracker.

| Model performance
The model was evaluated by comparing values of maximal displacement obtained from the automated segmentation with those obtained from the manual segmentation. Measurements obtained from automated segmentation maps were also compared to standard echo indices and to RVEF on CMR, which served as the reference standard for RV functional assessment.

| Statistical methods
Continuous variables are expressed as mean ± SD and were compared between groups using Student's t test. Inter- and intra-observer agreement between methods was assessed using the method of Bland and Altman, 14 which yielded mean difference as well as limits of agreement between methods (mean ± 1.96 SD).
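The Bland-Altman quantities named above (mean difference and limits of agreement at mean ± 1.96 SD of the paired differences) can be computed as in the following sketch. The function name and the example measurements are hypothetical, and the sample standard deviation is assumed.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (mean difference, lower limit, upper limit) for paired measurements."""
    if len(method_a) != len(method_b):
        raise ValueError("paired measurements must have equal length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the paired differences
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Hypothetical paired displacement values, for illustration only.
automated = [1.0, 2.0, 3.0, 4.0]
manual = [0.5, 2.5, 2.5, 4.5]
mean_diff, lower, upper = bland_altman(automated, manual)
```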
Bivariate correlation coefficients, intra-class correlation coefficients, and linear regression equations were used to evaluate associations between variables. Statistical calculations were performed using SPSS 24.0 (Statistical Package for the Social Sciences, International Business Machines, Inc), SciPy, 15 and Excel (Microsoft Inc). Two-sided P < .05 was considered indicative of statistical significance.

| Clinical application
Tricuspid annular shortening indices for RV functional assessment via manual and automated ML segmentation were tested in 101 patients equating to 7791 frames, among whom nearly one third (31%) had RV dysfunction (RVEF < 50%) as defined by CMR reference standard. Table 1

| Comparison between manual and automated machine learning segmentation
When comparing manual and automated quantification of annular segmentation, all displacement correlations were good (r = .61-.82) with reasonable limits of agreement for both (−1.09 to 1.39 and −5.31 to 5.50, respectively). Scatter plots and Bland-Altman analyses of ML-derived annular tracking in relation to manual quantification are shown in Figure 3.
To assess reproducibility, manual and automated lateral LTAD and CTAD quantification were repeated in a random subset (22%) of studies (Table 3). Reproducibility was, unsurprisingly, high for the ML algorithm, with zero inter- and intra-observer variability and an intra-class correlation coefficient of 1.0. Inter- and intra-observer reproducibility of manual segmentation was lower as compared to

| Diagnostic performance for RV function
As shown in Figure 4, both LTAD and CTAD decreased stepwise in relation to population-based tertiles of TAPSE, with similar results when ML analyses were localized to the septal or lateral tricuspid annulus (all P ≤ .001). Figure 5 demonstrates RV annular segmentation techniques developed in this study to yield good diagnostic performance for discriminating RV dysfunction defined by CMR.
More recently, an ML-based 3D echo algorithm to quantify RV volumes and RV ejection fraction was tested in a retrospective cohort. 9 Although quantification was feasible in all fifty-six patients, the automatic approach was only accurate in 32% of the study population. Endocardial contour editing was necessary in the remaining 68% of patients and resulted in a sevenfold increase in processing time. In addition, while 3D echo is an excellent RV quantification methodology, it should be noted that obtaining optimal 3D echo images can be challenging and time-consuming.

| DISCUSSION
As such, 2D echo is the most widely used screening tool to assess RV structure and function. In this regard, our findings support that it is possible to successfully automate assessment of RV function on conventional 2D echo, creating a robust and readily available solution with its application particularly attractive for large-scale population-based studies.
Our findings should be noted in the context of the following limitations. The study population included 101 CAD patients from a single institution, and although automated measurements were reliable and comparable to manual measurements, they did not provide substantially higher diagnostic utility. It is important to note that while equivalent views for conventional and automated ML (aML) segmentation were utilized whenever available for measurement, it is possible that slight variations in transducer angulation and resultant views could have yielded differences in displacement values between conventional and aML segmentation. It is also possible that visualization of cardiac structure and function itself could have led to reader bias when assessing TAPSE and S′.
Such bias could contribute to higher diagnostic performance of conventional versus automated measurements. In this context, it is also important to note that TAPSE itself is not without limitation, as suboptimal placement of the M-mode cursor can result in angle-dependent inaccuracy of RV function. Automated annular segmentation has the potential to overcome this limitation.
These concepts need further testing in a larger population with a wider range of RV function, together with further model training, which itself has the potential to improve diagnostic performance. Future machine learning techniques could also include an ensemble of models evaluating several parameters (eg, annular displacement, strain, TAPSE, S′) as opposed to a single measurement, which has the potential to further improve diagnostic performance.
In conclusion, fully automated tricuspid annular displacement from a novel deep learning model performs well in relation to manual echo indices for the detection of CMR-evidenced RV dysfunction.
This study adds to the growing literature that ML-based algorithms can improve image interpretation efficiency and reliability and is the first of its kind to systematically test and validate ML-derived 2D RV indices. Further research is warranted to test diagnostic and prognostic utility of ML-derived tricuspid annular displacement in large population-based cohorts.

ACKNOWLEDGMENTS
None.

CONFLICTS OF INTERESTS
None.