Automated assessment of human engineered heart tissues using deep learning and template matching for segmentation and tracking

Abstract The high rate of drug withdrawal from the market due to cardiovascular toxicity or lack of efficacy, the economic burden, and extremely long time before a compound reaches the market, have increased the relevance of human in vitro models like human (patient‐derived) pluripotent stem cell (hPSC)‐derived engineered heart tissues (EHTs) for the evaluation of the efficacy and toxicity of compounds at the early phase in the drug development pipeline. Consequently, the EHT contractile properties are highly relevant parameters for the analysis of cardiotoxicity, disease phenotype, and longitudinal measurements of cardiac function over time. In this study, we developed and validated the software HAARTA (Highly Accurate, Automatic and Robust Tracking Algorithm), which automatically analyzes contractile properties of EHTs by segmenting and tracking brightfield videos, using deep learning and template matching with sub‐pixel precision. We demonstrate the robustness, accuracy, and computational efficiency of the software by comparing it to the state‐of‐the‐art method (MUSCLEMOTION), and by testing it with a data set of EHTs from three different hPSC lines. HAARTA will facilitate standardized analysis of contractile properties of EHTs, which will be beneficial for in vitro drug screening and longitudinal measurements of cardiac function.

the market, only very few offer important clinical advantages for the patients. 4 Cardiotoxicity is what claims the highest incident of adverse drug reaction during preclinical or clinical development and postapproval stage. 5,6 As a consequence, approximately one-third of all drugs are withdrawn from the market. 5,7,8 One of the reasons for this low success rate is the use of animal models as a preclinical method to evaluate efficacy and predict cardiotoxicity, which lacks high throughput and is expensive. Furthermore, animal models do not fully recapitulate human physiology and disease patho-physiology. 9 Currently, cardiomyocytes (CMs) can be obtained by differentiation of human pluripotent stem cells (hPSCs). 10,11 It has been previously shown that three-dimensional (3D) hPSC-derived cardiac models have several advantages over two-dimensional (2D) cardiac in vitro models, because of their higher resemblance of the in vivo situation, based on improved molecular, morphological, and functional analysis. Standardized and automated analysis of cardiac function in 3D cardiac in vitro models will facilitate and expedite drug discovery and toxicity screening. 2,[12][13][14] Using either control or patient-derived hPSCs, 3D engineered heart tissues (EHTs) have been frequently used for studying cardiac function. In addition, EHTs are suitable for evaluation of pharmacodynamics and pharmacokinetics during the process of drug development. 12,[15][16][17] We have previously developed a platform that allows to produce EHTs in a 12 well-plate format. 18 In this platform, the EHTs from hPSC-CMs are formed around two cantilevers and there is a set of three pairs of cantilevers per well. Like this platform, multiple platforms use image analysis as a way to measure the contractility of the EHTs. 12,15,[19][20][21] The contractile properties of EHTs are relevant parameters for the analysis of disease phenotype, cardiotoxicity, and longitudinal measurements of cardiac function over time. Currently, there is a lack of a robust software that can analyze the contractile performance in a reliable and high throughput manner. One of the reference software for contractile analysis is MUSCLEMOTION. 22 It uses dynamic changes in pixel intensity between video frames and creates a relative measure of movement as output, which means that there are no absolute values of force, making it difficult to compare functional properties between tissues. Besides, the accuracy of this method suffers from distortions, problems in case of background noise, sample shifts, vibrations, and susceptibility to the presence of non-desired objects (e.g., debris) in the videos. These problems limit the algorithm robustness and the usability on a large data set. Meanwhile, the field of computer vision has evolved rapidly, with the development of methods based on deep (neural network) learning and template matching, which are used to speed up and increase the accuracy of analyzing biological samples. [23][24][25][26][27] Deep learning algorithms are used, for instance, in image segmentation methods to classify each pixel of an image belonging to one of the regions of interest. State-of-the-art approaches use convolutional networks that are trained using labeled examples. 28 Template matching is a method for finding a region similar to a given template object in a source image. This is done by comparing a template image T to every possible region in a source image I using a sliding window approach. 29 We hypothesize that by developing an algorithm that combines the methods of computer vision like deep learning and template matching for analysis of contractile parameter in EHTs, we will overcome the current limitations of the state-of-the-art image analysis in MUSCLEMOTION, related to relative output values, the sample shifts, background noise, and susceptibility to the presence of artifacts. This will increase the accuracy of measurement and facilitate the automatic analysis of large data sets.
In this study, we developed HAARTA (Highly Accurate, Automatic and Robust Tracking Algorithm), a software tool that automatically analyzes contractile proprieties of EHTs by segmenting and tracking brightfield videos, using deep learning and template matching with sub-pixel precision. We evaluated the software with a data set of hPSC-derived EHTs brightfield videos and use it in following the treatment of the positive inotropic agent isoproterenol. We demonstrate improved robustness and accuracy of the newly developed software HAARTA by comparing it with MUSCLEMOTION, leading to standardized analysis of contractile properties of EHTs, which will be beneficial for in vitro drug screening and longitudinal measurements of cardiac function.

| Overview
We developed HAARTA, a software that facilitates the automatic assessment of contractile properties of EHTs, which integrates deep learning and template matching techniques. HAARTA initially segments a brightfield frame of an EHT video using a pre-trained Dee-plabv3 with a ResNet-101 30 as convolutional backbone (Figure 1a).
The algorithm segments the frame in four regions and identifies the 3D cardiac tissue, the outer pillar where the tissue is anchored to, the inner part of the pillar, and the background (Figure 1b). The algorithm uses first the segmented region of the tissue to calculate the surface area. Subsequently, the outer pillar region is used as a template for tracking the position of the pillars during tissue contraction ( Figure 1c). We considerably increased the accuracy of the tracking by template matching with sub-pixel precision, which allows to detect movements below pixel resolution. The trajectory result of the tracking is used to calculate maximum and minimum contraction, contraction kinetics, and the times that takes to achieve 10% and 90% of contraction and relaxation (Figure 1d). Graphs and raw data are located in a "Results" folder together with a summary file (Figure 1e).
The key to our method is the combination of an accurate segmentation step, which allows having the right template to use in the tracking by template matching with sub-pixel precision, which results in automatic, robust, and standardized tracking of EHT datasets.

| Tracking by template matching with sub-pixel precision
For contractile assessment of EHTs, tracking the centroid of the anchored points by using a thresholding algorithm is the most common technic in multiple platforms [31][32][33][34] (Table S1). In the case of the pillars provided by River BioMedics, the variability in the bottom shape and the low level of contrast in the brightfield videos, combined with background noise, make it challenging to use that approach ( Figure S1). In order to overcome this limitation and obtain more robust and accurate tracking results, template matching was implemented. Initially, the regions of interest in an EHT video were identified. Those regions are defined as the background, 3D cardiac tissue, and the pillars. Specifically, the pillars were divided in outer pillar and inner pillar (Figure 2a). As a result of the variability observed in the shape of the inner pillar ( Figure S1A), the outer region was initially chosen as template and manually segmented on the first frame successfully ( Figure 2b). To have significant reduction of computational costs during the tracking by template matching, we took into account what has been shown previously about the behavior of 3D cardiac tissue during contraction, to define a region within the frame to look for displacement of the pillar 17,34,35 (Figure 2c). Therefore, the template matching was computed only on those regions. As a result we obtained an accumulator matrix, wherein the brighter the pixel is, the better the match (Figure 2d). The contraction trajectory created by calculating the distances between the brighter points detected on the left and right pillars showed a square contraction wave (Figure 2e) (Equations 4 and 5). It is worthy to note that this is not the expected smooth contraction wave according to literature. [35][36][37] Therefore, we designed a tracking method based on sub-pixel precision matrix. We interpolate in between pixels of the accumulator matrix, specifically in the region of maximum intensity. We observed the result of the template matching and the corresponding 3D projection ( Figure 2f). This resulted in a perfect match, shown by the peak in the 3D projection. However, this is not always the case, since occasionally the match lies between two pixels ( Figure 2g). This is shown by neighboring pixels that have a match score close to the maximum score. To find the perfect match in this case we used interpolation and identified the match point of the template with sub-pixel precision ( Figure 2h). We used this method to identify the center of the pillars along the frames and obtained a smooth contraction wave ( Figure 2i).

| Segmentation by deep learning
For segmentation, deep learning approaches have been previously reported as valuable tool to identify image regions belonging to different objects of interest. [38][39][40] In this study, we compared two convolutional neural networks (CNNs), namely Deeplabv3 with a ResNet-101 backbone 30 and U-net. 40 To train the networks, we manually annotated ground truth segmentation maps of 81 EHT brightfield videos ( Figure S2). We used the annotated data to train and test the considered CNNs, and found that both approaches perform well in segmentation tasks of EHT regions. Nevertheless, U-net showed slightly higher performance to identify the tissue, outer, and inner pillar (Table 1 and Towards an automatic assessment of the contractility of EHTs and eliminating the manual selection of the template described above, we evaluated the use of the two segmentation CNNs to provide input templates for the template matching algorithm. Accordingly, we identified that, regardless of the differences in trajectory baseline, the computed trajectories using automatically extracted templates are very similar. Overall, selecting the outer pillar templates contributes to having a smoother trajectory than the inner pillar template trajectories ( Figure 3b). template matching method exactly on pixel matched, and its corresponding 3D projection. (g) Result peak region image of the template matching method where the match lays between pixels and its corresponding 3D projection. (h) Result peak region image by the template matching algorithm with annotated sub-pixel precision match and its corresponding interpolated 3D projection. (i) Trajectory of an EHT tracked by template matching with sub-pixel precision. EHT, engineered heart tissue.

| Validation of the template matching with sub-pixel precision using a simulation video
To assess the performance of the tracking and segmentation, we made simulation videos where we control the background noise and the frequency of contraction ( Figure S4). We evaluated the accuracy of the tracking by increasing the noise at different frequencies. We observed that the algorithm is stable until 25% of added noise. At that level, the error in pixels is below 0.2 approximately. With low levels of noise, the algorithm performs with a precision lower than 0.1 pixel ( Figure 4).
As a complementary step, we qualitatively evaluated the results of the tracking on a scale from 1 to 5, where 5 is the best score and 1 is the lowest. The evaluation showed that 98% (79 videos) scored 5 on correct tracking, followed by 1% that scored 4 or 3. In all the videos the trajectory was correct, the 2% below the 5 score were due to a misalignment of the points printed on top of the frame with the pillars.

| Comparison of the tracking accuracies of our method and MUSCLEMOTION
To further assess the potential of HAARTA, we compared the tracking performance with MUSCLEMOTION. We found that in multiple occasions our method outperformed the tracking accuracy of MUSCLEMOTION, and that only in situations where the displacement was significant (visible to the eye) and the background noise low, T A B L E 1 Intersection over Union (IoU) results of the Deeplabv3 with a ResNet-101 backbone and U-net prediction of the background, inner pillar, outer pillar, and tissue classes.   Figure S5). Then, we compared the performance of both software with the ground truth and in the presence of different background noise levels.
HAARTA consistently achieved higher accuracy in generating the contraction trajectories. In the case without background noise, the contraction trajectory computed by our algorithm was identical to the ground truth, while with MUSCLEMOTION some alterations in the trajectory were observed (Figure 5c,d and Figure S6).

| Computational speed
The computation speed of the algorithms is measured. In our case, the computation speed of the proposed methods are measured on a highend desktop and a high-end laptop ( Table 2).

| Contractile performance of human EHTs using a commercially available EHT platform
We tested the performance of the algorithm by analyzing the contractility of hPSC-derived EHTs made using the 12-well-plate platform in the relative force at 0.01 μM in all the conditions followed by a plateau phase. Also, at the highest concentration (10 μM) out of the normal physiological values, the software was able to analyze the EHTs contractility accurately without any problem. We observed a decrease in the relative force of the tissues at this concentration ( Figure 6h).  (Figure 3b). However, as shown in Figure S1A, there is a high fabrication variability in the internal shape of the pillar while the outer shape is more regular. Thus, by using the Deeplabv3 to identify  40 Nevertheless, for this application of segmenting videos of 3D cardiac tissues with the pillars and using it as an input for template matching, we found that Deeplabv3 achieves better and more stable results.

| DISCUSSION
There are several custom-made algorithms that are used by other research groups (Table S1) to analyze contractility of cardiac tissues.
In general, the thresholding technic and edge detection are the most used to identify and track displacement of pillars in brightfield videos. 20,31,33,34 However, this requires constant optimization and is susceptible to changes of illumination, background noise, and artifacts (e.g., debris), which is a problem for high-throughput analysis. In the particular case of EHT technologies, 41  Nevertheless, a great addition to our algorithm would be to measure the level of noise present in a video before the analysis. This will improve the reliability of our algorithm by estimating the precision of the output trajectory with the found results of the simulation. In situations where a very low displacement is identified (in case of diseased or severely affected cardiac tissues), a low pass filter on the trajectories could be applied to avoid an increase in the relative error.
We evaluated the performance of the algorithm by analyzing the hPSC-derived EHTs made using the 12-well plate platform provided by River BioMedics with three different cell lines. We observed that the contractile force in all three cases is higher compared with previous results using polydimethylsiloxane (PDMS) pillars. 35 This is related to the significant increase in the stiffness of the commercial pillars compared to PDMS, which creates a higher load to the tissues. As previously reported, the CMs increase the contractile performance when they are under higher mechanical restriction. 43  In summary, we have demonstrated the advantages of HAARTA to analyze contractile properties of 3D cardiac tissues by using a new approach that combines segmentation and template matching with sub-pixel precision, under different conditions. Furthermore, our algorithm successfully tracks at the sub-pixel level with a 0.2 pixel precision up to 25% of added noise and with a 0.1 pixel precision up to 20% of added noise. We believe that the logic of the template matching with sub-pixel precision will be very advantageous for the automatic and standardized analysis of other (commercial) platforms that are assessing contractile performance of engineered heart or muscle tissues.

| CONCLUSION
To conclude, we present HAARTA, a robust, accurate, and computationally efficient algorithm for analyzing hallmark physiological features of brightfield videos of 3D cardiac tissues. After quantitative and qualitative evaluation, the software has shown to offer an advantage to the current methods to measure contractility. Moreover, with the standardization and miniaturization of contractile tissues (either cardiac or skeletal muscle), which can be generated from patientderived stem cells, the use of software for automatic and accurate functional analysis will be important for processing big data sets and will facilitate disease modeling and safety pharmacology and expedite drug discovery.

| Template matching
Template matching was used to track the position over time of the pillars by comparing the template image (T) of each set of two pillars extracted in the first frame through the stack of frames (source images (I)). This was done by comparing the pixel intensities with the normalized correlation coefficient (NCC). 29 The corresponding formulas for NCC matching are found in Equation (1), Equation (2), and Equation (3). NCC computes an accumulator matrix R. Each value in the accumulator matrix represents a match score of the location in the source image where one indicates an exact match, and zero indicates no match at all. The location of the best match was found by determining the highest score in the matrix.
From the original frame, a region of interest was used to initialize the template matching algorithm for each pillar, taking into account the biological behavior of the cardiac tissues described in literature ( Figure 2c). The brighter pixel from the accumulator matrix of the left and right matched pillar templates were used to calculate pixel distance between the points according to Equation (4). In this way, we tracked the distance of the pillar centroids frame-by-frame. Later, that distance was converted from pixel to millimeters (mm) and subtracted from the initial distance between the pillars to calculate the displacement per frame (Equation 5).
where a is the conversion factor from pixels to millimeters and 3.2 mm, the initial distance between the pillars.

| Tracking using sub-pixel precision
We used the output of the NCC accumulator matrix to identify with higher precision the displacement of the pillars as the movement is below pixel level resolution. We zoomed the region of the maximum intensity to find a match with sub-pixel precision by using spatial interpolation methods. Using a polynomial function, we estimated the values between pixels and used the pixel with higher match score. Bicubic spline is a common technique for obtaining smoothness in two-dimensional interpolation. 45 By using the function output of the Bicubic spline we got a match score for a (decimal) location in the peak region and beyond. To find the highest match score we used the local maximum of the function by using the Nelder-Mead algorithm. 46 The Nelder-Mead algorithm is one of the best-known algorithms for multidimensional unconstrained optimization without derivatives. The location of the maximum match score at pixel level is already known and was the starting point to find the local maximum of the function. The initial point is within half a pixel distance of the maximum match score, and therefore, the computation of the Nelder-Mead algorithm is limited together with a threshold (we set it to 0.0001 pixel). Using the output of this method the trajectory of the contraction at sub-pixel precision was determined.

| Verification step
To verify the performance of the tracking, simulation videos were made.
5.4 | Segmentation using deep learning

| Data set
In this study, a data set of manually segmented EHT frames was created to provide a CNN with ground truth data of the EHT brightfield's. A set of 65 videos of approximately 500 brightfield frames per video was used to create the data set. Four frames per video were manually segmented, those frames were at the positions 0, 150, 300, and 450. The dataset has 260 brightfield images with their corresponding pixel-wise labels. The labels consist of RGB images where white is the background, red is the inner pillar, green is the outer pillar, and blue is the tissue itself ( Figure S2). The data was resized to the same dimensions before it was fed to the CNN. First, height and width are scaled to maintain the aspect ratio. Then padding is used to fill up space to the correct dimensions. The resized images have the dimension of 738 Â 266 pixels.

| Training
A CNN was set up to segment the regions of the EHTs, using as input of the CNN a brightfield frame of an EHT video. Later, the mask matrices were generated by the shapes found using CNN. Then, the CNN is fed with frames of the EHT videos where the correct segmented output is known (labeled). We compared the output of the CNN with the known label by a cost function. Using the outcome of the comparison the CNN parameters were adjusted. We tested the precision of the CNN with a so-called test-set. The test-set contains labeled frames that are new to the CNN, otherwise this will highly influence the outcome score. A score was given for precision by comparing the outputs of the CNN with the labels. We compared two CNN approaches for the segmentation of the EHT video frames, U-net 40 and Deeplabv3 with a ResNet-101 backbone. 30 The training of the U-net model was done using the settings in Table 3. The input layer of the model was adjusted from a three-layer RGB input to a one-layer grayscale input since the input is in grayscale.
For the training of the Deeplab with a ResNet-101 backbone, we used the settings in Table 3. The input layer of the model is in three-layer RGB format. Since the model uses the three layers further up in the model, the input layer cannot be changed directly. Therefore, the grayscale input was converted to an RGB image before it was fed to the model.

| Calculation the 3D cardiac tissue surface area
We calculated the surface area of the 3D cardiac tissues by calculating the surface of one pixel and multiplying it by the sum of all activated pixels in the segmented mask ( Figure S2C), which contains all the pixels of the tissue. In Equation (8) the formula for calculating the surface area is shown.
where a is the conversion factor from pixels to millimeters.

| Segmentation by a deep learning model as input for template matching
The proposed template matching method (Section 5.2.1) requires as an input a template of the two pillars to track the displacement trajectory during tissue contraction. Previously, manual segmentations were described. However, we incorporated the results of the deep learning segmentation models as an input to the template matching and tracking algorithm. We performed a comprehensive comparison between manual segmentation, segmentation by U-net, and segmentation by Deeplabv3 with a ResNet-101 backbone.
Using the mask of the outer pillar (Figure 3b), we made a square around it from the minimum and maximum bounds of its shape to create a template. To avoid any unwanted error by blobs in the segmentation, the two largest shapes in the segmentation were selected to initialize the templates.

| Visual inspection of tracking results
We checked the tracking and trajectory results of the algorithm by qualitatively scoring 81 videos with a grade from one to five. In the grading scale, one indicates that the tracking was not correct and as a consequence the trajectory is completely different from the expected one, while five indicates that the algorithm was able to track the center of the pillars precisely, resulting in computing a contraction wave as expected. We used the results overview of the user interface to evaluate each video.

| Comparison of tracking accuracy between our method and MUSCLEMOTION
We used the simulation videos (described in Section 5.3) to compare the performance results achieved by MUSCLEMOTION and by our tracking algorithm. In the experiments, we considered increasing image noise levels from 1% to 25%, in order to simulate acquisition induced noise by the sensors and evaluate the algorithms in more challenging conditions. The results of both algorithms were compared to the ground truth contraction signal.

| Generation of EHT
The EHTs were made as previously described by Ribeiro

| Mechanical characterization
The mechanical characterization of the River BioMedics commercial pillars was done according to Dostani c et al. 55 Shortly, the force of

| Image-based contractility measurement
The contractility of the EHTs in the commercial 12-well plate was measured as previously described by Ribeiro et al. 35 Briefly, the brightfield videos of EHTs were recorded using a Nikon ECLIPSE Ti2 inverted microscope (RRID:SCR_021068) under temperature and humidity control (37 C and 5% CO 2 ), using a high-speed camera at 100 frames per second (fps) with 2Â magnification. The output file was in ND2 format. Force of contraction was first measured at day 5 (D5) after tissue seeding and for a period of 30 days, every 5 days (day 10 (D10), 15 (D15), 20 (D20), 25 (D25), and 30 (D30)). During the force of contraction measurements, the EHTs were electrically stimulated using a custom-made pacing device at 1 Hz (10 ms biphasic pulses, 4-5 V/cm) for 10 s ( Figure S8B).

| Compound testing
At day 30 (D30) the EHTs were assessed for a positive inotropic response to isoproterenol (Sigma, I5627). The test was conducted under temperature, humidity control (37 C and 5% CO 2 ), and field stimulation. The positive inotropic effect was determined with increasing concentrations of isoproterenol (0-10 μM). After 5 min of each dose administration, the tissues were recorded.

| Statistics
Statistical analysis was performed using GraphPad Prism 8. Each experiment was performed three times, with CMs from independent differentiations. Per experiment, each set of three tissues (one well of a 12-well plate) was considered as technical replicates. Differences between groups were assessed by two-way ANOVA plus Tukey's post-hoc test. Results are displayed as mean ± SD unless stated otherwise. Significance was attributed to comparisons with values of p < 0.05¥; p < 0.01 †; p < 0.001 ‡; p < 0.0001#.

| Python package
A python package was created for tracking with sub-pixel precision to provide scientists with an easy-to-use library. This library does not contain the segmentation part since the segmentation relies on a specifically trained model, which will not work on slightly different images. Instead, the scientists can provide their own templates. The python package can be found on www.github.com/dkeekstra/ sptemplatematching.

PEER REVIEW
The peer review history for this article is available at https://www. webofscience.com/api/gateway/wos/peer-review/10.1002/btm2.

DATA AVAILABILITY STATEMENT
The data that supports the findings of this study are available in the supplementary material of this article and in the link www.github. com/dkeekstra/sptemplatematching.