Guidewire Endpoint Detection Based on Pixel‐Adjacent Relation during Robot‐Assisted Intravascular Catheterization: In Vivo Mammalian Models

Existing surgical guidewire endpoint localization methods in X‐ray images face challenges owing to the small size, simple appearance, and nonrigid nature of the guidewire, the low signal‐to‐noise ratio of X‐ray images, and the imbalance between the number of guidewire and background pixels, all of which lead to errors in surgical navigation. An eight‐neighborhood‐based method for increasing the localization accuracy of the guidewire endpoint, and thus the safety of interventional procedures, is proposed herein. The proposed method comprises two stages: 1) An improved U‐Net network segments the guidewire to extract regions of interest containing guidewire endpoints with high precision and to reduce interference from other anatomical structures and imaging artifacts. 2) Guidewire endpoints are detected using the adjacency relationships between pixels in eight‐neighborhood regions; this stage covers skeleton extraction, removal of bifurcation points, and repair of fracture points. The proposed method achieves mean pixel errors of 2.02 and 2.13 pixels on in vivo rabbit and porcine X‐ray fluoroscopy images, respectively, outperforming ten classic heatmap and regression methods and achieving state‐of‐the‐art detection results. The method can also be applied to detect other tiny surgical instruments such as stents and balloons, while additionally characterizing the bending angle of the flexible guidewire.


Introduction
Cardiovascular diseases (CVDs) are diseases of the heart or its blood vessels. Despite gradually decreasing age-standardized CVD morbidity rates, CVDs remain the largest contributor to global mortality. [1] Thus far, numerous studies have focused on tool segmentation for processing and visualizing tool navigation in X-ray angiograms with minimal effort. However, automatic endovascular tool segmentation and tracking in fluoroscopy images suffer from certain limitations. For example, tiny targets cannot be easily detected in angiography because of interference from nontargeted body organs and tissues. Further, only the radiopaque coating of a 0.014 in. guidewire is visible, while the other parts are almost invisible, which makes it difficult to segment small target objects in X-ray images. Guidewires have a simple appearance, and their outlines can easily be confused with those of similar structures (such as bones or lungs) in fluoroscopy images. Finally, because the tip of the guidewire is soft, it is difficult to establish a fixed positional relationship between key points.
A medical guidewire is a surgical tool, and a guidewire endpoint is the key point of this surgical instrument. Current studies apply computer vision methods to address the challenges of segmenting, detecting, and localizing surgical instruments in medical images. Traditional algorithms obtain manually crafted surgical instrument features directly from the images to learn appearance characteristics, which they then use to detect or track the surgical tools. However, these methods are not commonly used because they cannot obtain high-level semantic information from an image.
In recent years, deep learning methods have been applied to surgical tool segmentation, [4,5] detection, [6] and key-point localization [7] and validated on vascular datasets. A high-resolution network (HRNet) applied to object detection and semantic segmentation following the heatmap estimation framework can achieve high spatial precision. [8,9] A novel bottom-up human pose estimation method (HigherHRNet) was used to localize key points more precisely for small objects. [10] The Hourglass network, popularized in the domain of human pose estimation, is similar to other encoder-decoder networks but makes denser use of residual blocks. [11] For the task of detecting a tiny, soft guidewire, the portion of the X-ray image occupied by the guidewire is small, and high detection precision of the guidewire endpoint is required.
Owing to the fast-developing field of surgical instrument studies, heatmap regression methods provide several novel ideas for surgical tool detection. To address the challenge of small-object detection precision, the widely used ResNet-50 was employed to localize the key points of surgical tools. [12-15] The Hourglass network model is one of the most popular models for instance segmentation and can accomplish detection and segmentation of each instance within a single model. [16] Zhou et al. [17] introduced an enhanced RetinaNet network, incorporating a ResNet18-based encoder to generate feature maps and a dedicated subnetwork for needle detection. These deep learning methods exhibited significantly improved detection accuracy compared to traditional methods because they extract high-level semantic information from an image. However, directly applying the aforementioned methods has drawbacks. On the one hand, these methods were applied to detect surgical tools that are rigid bodies during laparoscopic and retinal surgeries, where a relatively fixed relationship exists between different points of the tools; such detection methods are therefore unsuitable for the guidewire tip with its radiopaque coating. On the other hand, the guidewire is tiny and soft, whereas existing deep-learning-based key-point detection methods for surgical instruments were designed to detect large targets such as catheters, operating forceps, and endoscopes, and they lack the accuracy needed to detect tiny targets.
The existing literature indicates that studies focused on the localization of guidewire endpoints in X-ray images are limited. For example, Cronin et al. [18] used impedance-based electroanatomic mapping for real-time guidewire localization. Bedel et al. [19] optimized the ultrasound-guided positioning of central venous catheters by adopting transthoracic echocardiography to measure the guidewire position, while point-of-care ultrasound has been applied to improve central venous catheter tip positioning. [20] The rapid advancement of deep learning technologies [21] has accelerated the development of computer vision. [24] Considering the outstanding performance of convolutional neural networks (CNNs) for medical image segmentation, [25] the commonly applied guidewire localization networks are generally improved variants of CNNs. [26] Zhou et al. [27] proposed a real-time multifunctional framework for automatically analyzing the morphological position of the guidewire by incorporating a fast attention recurrent network. Li et al. [28] improved the accuracy of guidewire localization by proposing a two-stage framework for multiguidewire endpoint localization that uses the YOLOv3 detector to detect the guidewire. They also used postprocessing to refine guidewire detection, while applying a segmentation attention hourglass network to predict endpoint locations. This approach yielded better performance; however, the original segmentation labels, intended for tiny targets, were dilated into bold large targets, and the bounding boxes of each guidewire were labeled and extracted from the entire image, creating a new dataset for localization. Li et al. [29] also proposed a novel key-point localization region-based CNN (KL R-CNN) model to detect the guidewire and localize its endpoints. This approach relied on a dataset of guidewires within bounding-box areas as the model input. However, these methods altered the size of the guidewire relative to the entire image and shifted the focus from tiny-object detection to large-target detection tasks.
Therefore, we propose a new detection framework for tiny targets in this article. The overall framework is shown in Figure 1. This framework addresses the challenges of tiny target detection in two stages: first, extracting the entire guidewire from X-ray images using an improved U-Net method, and second, detecting the guidewire endpoints with the proposed eight-neighborhood method after segmentation. Segmenting the guidewire from the entire X-ray image, which removes the background and retains only the guidewire, eliminates disturbance points in the X-ray image before endpoint detection, thereby reducing the challenge posed by the guidewire's simple appearance. In recent years, deep convolutional neural networks (DCNNs) have developed rapidly in the medical image processing field. [30,31] In the first stage of our framework, an improved U-Net neural network is designed to segment the guidewire from the entire X-ray image and reduce interfering targets from the background; this network can also be applied to key-point detection for other instruments. In the second stage, an eight-neighborhood-based algorithm for tiny guidewire endpoint detection is proposed. This algorithm is based on the adjacency relationships between pixels in eight-neighborhood regions. Guidewire key-point pixels are processed in four steps: 1) medial-axis skeletonization; 2) bifurcation point removal; 3) guidewire breakage-band repair; and 4) endpoint detection. A surgical guidewire tip is soft and flexible and forms complex shapes (twists and circles), which easily deform the guidewire body such that it can no longer pass through a branch or stenosed path. In such a case, the operator increases the risk of vascular rupture if they attempt to pass the guidewire through this complex path. Thus, after endpoint detection, we compute the bending angle from the distance of each pixel to the line between the start- and endpoint pixels. This helps inform operators or surgical robots about the current bending angle of the guidewire and guides the operator to change their manipulation strategy to minimize surgical risk. The main contributions of this study are as follows: 1) An improved U-Net model with semantic segmentation to extract guidewire feature maps from X-ray images during robot-assisted vascular interventional procedures is proposed. The proposed model achieves better segmentation performance for a small object; 2) A novel two-stage method for guidewire endpoint detection, involving skeletonization processing, removal of bifurcation pixel points, breakage-band repair, and endpoint detection based on pixel-adjacency relationships, is introduced. This method demonstrated superior performance when evaluated on two datasets, outperforming other conventional heatmap and regression methods; and 3) The effectiveness of the proposed method in detecting maximum bending regions and calculating angle values is validated. This information offers valuable feedback to surgeons, enabling them to adjust control strategies for safe and efficient robot-assisted catheterization.
The remainder of this article is organized as follows. Section 2 describes the implementation of a robot-assisted interventional platform for collecting and preprocessing the in vivo rabbit and porcine model datasets as well as the guidewire endpoint detection method. The performance of the methods on the two datasets and comparisons with other typical heatmap regression methods are presented in Section 3. Results and discussions are presented in Section 4. Finally, the conclusions are presented in Section 5.

Acquisition of Dataset
The acquisition of efficient and real datasets is vital for guidewire endpoint detection. In this study, all images are acquired on a self-designed master-slave vascular interventional surgical robot platform, where the surgeon manipulates the robotic master control terminal (doctor terminal) so that the slave mechanism (patient terminal) receives instructions to deliver the guidewire along the vascular path to the target site. The systemic framework of dataset acquisition is shown in Figure 2. The vascular interventional robotic system is designed to assist the surgeon in completing intravascular interventional surgery while reducing X-ray exposure and the fatigue of wearing heavy protective apparel (lead aprons). The current robotic platform includes master and slave mechanisms for the teleoperated navigation of endovascular tools. The master device, with two degrees of freedom (DoFs), has knob and clamp controllers for guiding tools (catheter/guidewire) axially and radially, whereas the slave device, with four DoFs, includes a tool-clamp knob to grasp and set the vertical orientation of the endovascular tool. The present study uses two X-ray angiogram datasets for detecting guidewire endpoints, both obtained on this robot-assisted vascular interventional platform. The first dataset (Dataset A) is acquired from an in vivo rabbit vascular experiment, and the second dataset (Dataset B) is acquired from an in vivo porcine model.
Dataset A: The master-slave robotic system performed several in vivo interventional procedures by cannulating a vascular pathway in six rabbits (average weight: 2.21 ± 0.29 kg), navigating a 0.014 in. guidewire along the auricle-to-coronary arterial path. All ethical and experimental procedures and protocols were approved by the Shenzhen Institutes of Advanced Technology. Each image has a width of 515 pixels, a height of 512 pixels, and a resolution of 96 dpi both horizontally and vertically; consequently, each image has a calculated resolution of 0.26 × 0.26 mm². As some images lack guidewire trajectory information, a total of 1880 effective X-ray images were selected as Dataset B for further processing. The guidewire trajectory information was marked using LabelMe and saved as a JSON file. During catheterization with the vascular interventional robot, a contrast dye was injected to aid the acquisition of angiograms with navigation views of the endovascular tools and the blood vessels. The 2D/3D X-ray images aid the operators' visualization, and they can be acquired at low or high resolutions, with or without angiography subtraction. The experimental procedure was approved by the Shenzhen Advanced Animal Study Service Center (No. AAS 191204P).

Guidewire Endpoint Detection Method
Our proposed guidewire endpoint detection framework includes two stages: 1) guidewire segmentation and 2) guidewire endpoint detection. We use an improved U-Net network to segment the entire guidewire from each X-ray image as a feature image. The segmented guidewire is then used to detect the guidewire endpoints with our proposed eight-neighborhood algorithm.

First Stage: Guidewire Segmentation
Improving the U-Net network: Guidewire segmentation in X-ray images is a challenging task because of the small size and soft nature of the target. Therefore, in the first stage of our proposed approach, the entire guidewire is extracted as a new object with a pixel value of 255 in each image, whereas the background is assigned a pixel value of zero. Several semantic segmentation networks based on natural or medical scenes satisfy general image segmentation demands; however, these networks lack detection accuracy for a tiny target. The U-Net model features a clear and easily interpretable encoder-decoder structure with skip connections. [33] U-Net++, an extension of the U-Net architecture, introduces nested skip pathways at each level of the network. [34] Their effectiveness in handling small datasets enhances their applicability, particularly to medical image datasets. [35,36] The representative object segmentation method, the U-Net network, was applied in this study to obtain high-level semantic information from an X-ray image. Based on the structure of the U-Net network, [37] batch normalization was applied to standardize the input values of each neuron in each layer to obtain a distribution with a mean of 0 and a variance of 1, which speeds up convergence, reduces sensitivity to network initialization, and improves performance. Batch normalization is likewise utilized in the improved decoding structure. In the original U-Net, cropping is adopted to ensure that the encoding and decoding feature-map sizes are equal for the concatenation operation; cropping, however, causes information loss when combined with the network's context structure. Therefore, the original 3 × 3 convolution is replaced with a padded 3 × 3 convolution, which is also used in the decoding part, so that the size of the encoding feature map remains consistent. In the improved U-Net structure, the padded 3 × 3 convolutions reduce the number of channels to one-fourth of the original U-Net, while the middle convolution layer of the original decoding structure is deleted to reduce network redundancy, as shown in Figure 3. A 3 × 3 convolution with zero padding is performed once after each of the first three up-samplings of the feature map, and it is also applied in the last layer as input to the softmax layer, compensating for the limited receptive field of the 1 × 1 convolution in the previous layer.

Design of Decoding Network:
The activation function is a crucial component of the convolution-layer connection. Compared to the sigmoid function, the rectified linear unit (ReLU) activation function [37] accelerates convergence and prevents gradient vanishing when x is greater than zero. In addition, the ReLU function has low computational complexity and is thus used to enhance the decoding block of the network. The ReLU activation function is expressed as ReLU(x) = max(0, x). The output size of the convolution layer is governed by O = (W − K + 2P)/S + 1, where W, K, P, S, and O denote the input feature size, kernel size, padding size, stride, and output feature size, respectively. The convolution process is presented in Figure 4, where p = 1, k = 3, and s = 1. If O is set to W, a padded 3 × 3 convolution generates equal-sized input and output feature maps. The convolution operation in the decoding process can thus recover the original size of the encoded feature map and avoid information loss. The designed decoding network is shown in Figure 3. In the convolution process, the number of convolution kernels is reduced significantly, and convolution layers with the same number of channels are not stacked repeatedly in the up-sampling process; this reduces the number of network parameters to be trained.
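The output-size rule above can be checked numerically. The following is a minimal sketch; the function name is illustrative, not from the paper.

```python
# Minimal sketch of the convolution output-size rule O = (W - K + 2P)/S + 1
# described above; the function name is illustrative, not from the paper.
def conv_output_size(w: int, k: int, p: int, s: int) -> int:
    """Spatial output size of a convolution layer."""
    return (w - k + 2 * p) // s + 1

# With k = 3, p = 1, s = 1 the feature map keeps its size, so no cropping
# is needed before the skip-connection concatenation.
print(conv_output_size(512, 3, 1, 1))  # 512
print(conv_output_size(512, 3, 0, 1))  # 510: an unpadded conv shrinks the map
```

This is why the padded 3 × 3 convolution lets the decoder concatenate encoder feature maps without the information loss that cropping causes.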
In addition, the coding part of the improved U-Net is replaced by migrating ResNet50, [30] and the segmentation performance of this network is explored. ResNet50 comprises identity and conv blocks. After removing the fully connected layer of ResNet50 and migrating it into the improved U-Net, we obtain the improved U-Net model.

Second Stage: Guidewire Endpoint Detection
Guidewire endpoints reflect the connectivity between different pixels, [38] and we therefore propose an eight-neighborhood endpoint detection method based on this connectivity. The eight-neighborhood detection method tracks the guidewire endpoint using the adjacency relationships among the eight pixels surrounding each pixel. This procedure requires the pixel values of the image to be 0 or 255, the smallest unit of the image being a pixel point. Thus, binarization is applied to each image, where 0 represents the black background and 255 represents the target guidewire, and each pixel has eight adjacent pixel points. Based on these guidewire pixels, the entire guidewire skeleton is extracted after binarization. The guidewire output by semantic segmentation exhibits nonsmooth characteristics, leading to discontinuities and fractures between the start and endpoints, and the guidewire may remain fractured even after skeleton extraction. Consequently, as part of the post-skeleton processing, branch pixel points are removed and fractured pixels are repaired to maintain the contextual relationship of the surgical guidewire skeleton between consecutive pixels. Finally, guidewire endpoint detection is completed based on the eight-neighborhood relationship. The overall framework is shown in Figure 4.
Skeletonization Extraction of Guidewire: Skeletonization describes the topological structure of an object and is applied in image recognition to reduce the redundancy of the target object and remove unnecessary information. Skeletonization extraction is defined as removing boundary pixel points without damaging the connectivity of the image. The goal is to reduce a connected region to one pixel in width; the precise mathematical definition of the skeleton is the union of the centers of the maximal tangent hyperspheres at all points along the boundary. This process can be envisioned as the target's edge lines being uniformly ignited, with a fire front propagating uniformly toward the interior; where the fronts meet, the flame extinguishes, and the union of the extinguished points constitutes the skeleton. Specifically, the first step is to assume a white pixel (value 255) as the object and a black pixel (value 0) as the background. Each pixel of the image is scanned in turn. If the current pixel point (x, y) is a white point, four of its neighbors, (x − 1, y + 1), (x − 1, y), (x − 1, y − 1), and (x, y − 1), are examined, and the current pixel point is assigned a new layer value based on theirs (a black pixel point (x, y) is assigned a value of 0). This procedure is expressed in Equation (1). The second step of the algorithm is similar to the first. Each pixel of the image is scanned from bottom to top and right to left while examining the four neighbors (x − 1, y + 1), (x − 1, y), (x − 1, y − 1), and (x, y − 1) of the current pixel point (x, y), which is assigned a layer value of 255 or 0 accordingly. This procedure is defined by Equation (2). Thus, the first step obtains the value of the upper enclosing layer for each pixel point, and the second step obtains the value of the lower enclosing layer. The actual layer value of each pixel is the minimum of the upper and lower enclosing-layer values, as in Equation (3). Finally, the layer values of the eight pixels surrounding the current pixel are compared to the current pixel's layer value by scanning each pixel point. If the layer value of the current pixel is the maximum, this pixel point is retained; otherwise, it is removed. The skeletonization extraction result is shown in Figure 5. The guidewire obtained by semantic segmentation was processed using the skeletonization extraction method to obtain a topology structure reflecting shape features such as intersections, inflection points, and fracture points.
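The layer-peeling rule above is only partially reproduced in the text, so the sketch below substitutes the classic Zhang-Suen thinning algorithm, which likewise removes boundary pixels iteratively while preserving connectivity until a one-pixel-wide skeleton remains. All names are illustrative; this is not the paper's exact procedure.

```python
import numpy as np

def zhang_suen_thin(img: np.ndarray) -> np.ndarray:
    """Thin a binary image (nonzero = object) to a one-pixel-wide skeleton.

    Stand-in for the paper's layer-peeling rule: Zhang-Suen thinning also
    removes boundary pixels without breaking connectivity.
    """
    img = (img > 0).astype(np.uint8).copy()
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(1, img.shape[0] - 1):
                for x in range(1, img.shape[1] - 1):
                    if img[y, x] == 0:
                        continue
                    # 8-neighbourhood in clockwise order P2..P9 (N, NE, E, ...)
                    p = [img[y-1, x], img[y-1, x+1], img[y, x+1], img[y+1, x+1],
                         img[y+1, x], img[y+1, x-1], img[y, x-1], img[y-1, x-1]]
                    b = sum(p)                                    # foreground neighbours
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    if not (2 <= b <= 6 and a == 1):
                        continue
                    if step == 0:
                        if p[0]*p[2]*p[4] == 0 and p[2]*p[4]*p[6] == 0:
                            to_delete.append((y, x))
                    else:
                        if p[0]*p[2]*p[6] == 0 and p[0]*p[4]*p[6] == 0:
                            to_delete.append((y, x))
            for y, x in to_delete:          # delete only after the full scan
                img[y, x] = 0
                changed = True
    return img
```

Applied to a segmented guidewire mask, this yields the one-pixel-wide topology on which the bifurcation-removal, fracture-repair, and endpoint-detection steps below operate.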
Removal Processing of Pixel Bifurcation Points: The extracted guidewire is not smooth after semantic segmentation and therefore generates bifurcation points after skeletonization. To accurately detect the guidewire endpoint, bifurcation points must be removed from the guidewire skeleton feature map. Figure 6 shows that a bifurcation point has three pixels in its eight-neighborhood region. The pixel points in these three directions can each be connected to the next pixel point outside the eight-neighborhood region, yielding a branch path by concatenating the pixels of each layer. Nonbifurcation points, by contrast, have only one or two neighboring pixels. Based on this structural characteristic, the position of a bifurcation point can easily be found and removed. When a bifurcation point is found, the concatenated path of the pixels in the three directions of its eight-neighborhood region is computed. The concatenated paths are shown as red, blue, and green dotted lines in Figure 6a. The shortest path is considered the abnormal branch of the bifurcation point and is removed to obtain the real guidewire endpoint.
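The three-neighbour rule described above can be sketched with naive neighbour counting; the branch tracing and shortest-branch pruning are omitted, and all names are ours.

```python
import numpy as np

def neighbor_count(skel: np.ndarray, y: int, x: int) -> int:
    """Number of foreground pixels in the 8-neighbourhood of (y, x); skel is 0/1."""
    return int(skel[y-1:y+2, x-1:x+2].sum() - skel[y, x])

def find_bifurcations(skel: np.ndarray):
    """Skeleton pixels with three (or more) 8-neighbours, per the rule above."""
    pts = []
    for y in range(1, skel.shape[0] - 1):
        for x in range(1, skel.shape[1] - 1):
            if skel[y, x] and neighbor_count(skel, y, x) >= 3:
                pts.append((y, x))
    return pts

# Y-shaped skeleton: two diagonal branches and one vertical branch
# meeting at (3, 3), which is the only true bifurcation point.
skel = np.zeros((7, 7), dtype=np.uint8)
for y, x in [(1, 1), (2, 2), (3, 3), (2, 4), (1, 5), (4, 3), (5, 3)]:
    skel[y, x] = 1
print(find_bifurcations(skel))  # [(3, 3)]
```

Note that on tightly packed skeletons, pixels directly adjacent to a junction can also report three neighbours; this is why the paper additionally traces the three branch paths before pruning the shortest one.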
Repair Processing of Pixel Fracture Points: The precision of the semantic segmentation method directly influences the quality of the extracted guidewire feature map; higher accuracy results in fewer fracture points in the output guidewire. Despite the superior performance of the proposed segmentation method compared to other semantic segmentation techniques, the guidewire still exhibits fracture points after segmentation, and these persist even after skeletonization. Consequently, all fracture points between the initial pixel point and the final pixel point were repaired. The objective of this repair is to restore the connectivity of the guidewire, a crucial factor for accurately detecting its true endpoint, before proceeding with endpoint detection.
First, each pixel point is tested for being a fracture point. If a pixel point is a fracture point, the two adjacent pixel points around it are connected to ensure the continuity of the guidewire. The detailed process is shown in Figure 7. Each pixel point of the guidewire skeleton feature map is scanned from bottom to top and left to right to count the pixel points connected to the current pixel in its eight-neighborhood. When this count is greater than 1, the current pixel point is not a fracture point; when the count equals 1, the pixel has only one adjacent pixel point in the eight-neighborhood region and lies at a fracture. Pixel p1 is one endpoint of the fracture and pixel p2 is the other. To repair the fracture, p1 and p2 are connected by a straight line, and the background pixels traversed by this line are identified. The background pixels whose center positions are closest to the line are assigned a value of 255 as guidewire pixel points, completing the fracture connection.
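The repair step, turning the background pixels nearest the p1-p2 line into guidewire pixels, can be sketched with a Bresenham rasterization. This is an assumption on our part: the paper states the nearest-to-line criterion, which Bresenham approximates; function names are illustrative.

```python
import numpy as np

def bresenham(p1, p2):
    """Integer pixels on the straight line from p1 to p2 (Bresenham's algorithm)."""
    (y0, x0), (y1, x1) = p1, p2
    dy, dx = abs(y1 - y0), abs(x1 - x0)
    sy = 1 if y0 < y1 else -1
    sx = 1 if x0 < x1 else -1
    err = dx - dy
    pts = []
    while True:
        pts.append((y0, x0))
        if (y0, x0) == (y1, x1):
            break
        e2 = 2 * err
        if e2 > -dy:
            err -= dy
            x0 += sx
        if e2 < dx:
            err += dx
            y0 += sy
    return pts

def repair_fracture(skel: np.ndarray, p1, p2) -> np.ndarray:
    """Set the pixels nearest the p1-p2 line to foreground (255) to bridge a gap."""
    out = skel.copy()
    for y, x in bresenham(p1, p2):
        out[y, x] = 255
    return out
```

After bridging every fracture this way, the skeleton is again a single connected chain from start point to endpoint.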
Guidewire Endpoint Detection: Given the guidewire feature map produced by semantic segmentation, the detection stage is responsible for predicting the coordinates of the endpoints of each guidewire. First, skeletonization is applied to the guidewire feature map to obtain the skeleton structure of the guidewire. The bifurcation points in this structure are removed and the fracture points are repaired. The resulting guidewire feature map is used as input for further detection processing. Because the feature map has undergone skeletonization and defect-interpolation processing, only one pixel is connected to an endpoint in its eight-neighborhood region, whereas two pixel points are connected to a non-endpoint pixel in its eight-neighborhood region (Figure 8). The eight-neighborhood detection method based on this idea takes the output feature maps of the removal and repair processing as input and scans each pixel to detect the guidewire endpoints.
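The endpoint rule above, exactly one foreground neighbour in the eight-neighbourhood, can be sketched directly. A 0/1 skeleton is assumed and names are illustrative.

```python
import numpy as np

def find_endpoints(skel: np.ndarray):
    """Skeleton pixels with exactly one 8-neighbour are endpoints (0/1 input)."""
    ends = []
    for y in range(1, skel.shape[0] - 1):
        for x in range(1, skel.shape[1] - 1):
            if skel[y, x] and skel[y-1:y+2, x-1:x+2].sum() - skel[y, x] == 1:
                ends.append((y, x))
    return ends

# A one-pixel-wide horizontal segment: its two tips are the endpoints.
skel = np.zeros((8, 8), dtype=np.uint8)
skel[3, 2:7] = 1
print(find_endpoints(skel))  # [(3, 2), (3, 6)]
```

This is the final scan of the pipeline: after bifurcation removal and fracture repair, exactly two such pixels remain, the guidewire start point and endpoint.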
Guidewire Bending Detection: In vascular interventional surgery, although the guidewire is tiny and flexible, excessive bending can cause mechanical damage to the arterial lining, which triggers platelet aggregation and stimulates intimal hyperplasia. Consequently, we compute the bending angle of the guidewire from its present pose so that the operator can alter the control strategy based on this angle information and reduce the risk of vascular rupture. To obtain the bending shape of the guidewire, the bending angle is calculated using the line connecting the start point A(x1_i, y1_j) to the endpoint B(x2_i, y2_j). The specific steps involve finding the start pixel point via a reverse-to-forward search. The line is determined by connecting the endpoint to the start point; thereafter, the distance from each pixel between the start point and the endpoint to this line is calculated. The pixel with the longest distance is considered the highest point C(x3_i, y3_j). The angle formed at the highest point by the lines connecting it to the start point and to the endpoint is the guidewire bending degree, as in Equation (5)-(9).
The distance from a pixel (x, y) to the line AB is d = |(y2 − y1)x − (x2 − x1)y + x2·y1 − y2·x1| / √((y2 − y1)² + (x2 − x1)²) (5); the side lengths are a = √((x3 − x2)² + (y3 − y2)²) (6), b = √((x3 − x1)² + (y3 − y1)²) (7), and c = √((x2 − x1)² + (y2 − y1)²) (8); and the bending angle follows from the law of cosines, θ = arccos((a² + b² − c²)/(2ab)) (9), where a is the length of the line connecting the highest point to the endpoint, b is the length of the line connecting the highest point to the start point, and c is the length of the line connecting the start point to the endpoint. (x1_i, y1_j), (x2_i, y2_j), and (x3_i, y3_j) are the pixel positions, with i, j ∈ {1, 2, …, m}. The bending pose of the guidewire can be evaluated using the longest distance and the angle value, serving as a reference for the operator in devising a manipulation strategy. Using the described method, the length of the scanned guidewire segment and the scanning interval are utilized for sliding-window scanning of the entire guidewire. The included angle of every segment and the distance from every pixel of the guidewire in each segment to the straight line are computed, ultimately determining the highest point. The bending angle is then computed using the position information of the start point, the endpoint, and the highest pixel. The detailed procedure is illustrated in Figure 9.
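A minimal sketch of the bending-angle computation: the point-to-line distance picks the highest point C, and the law of cosines gives the angle at C. Function and variable names are ours.

```python
import math

def bending_angle(start, end, path):
    """Angle at the most-bent pixel C, following the law-of-cosines description.

    start, end: (x, y) coordinates of A and B; path: pixels between them.
    Returns (C, angle in degrees); 180 degrees means a straight guidewire.
    """
    (x1, y1), (x2, y2) = start, end
    denom = math.hypot(x2 - x1, y2 - y1)          # |AB|, assumed nonzero

    def dist(p):
        # Perpendicular distance from pixel p to the line through A and B
        x, y = p
        return abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / denom

    c_pt = max(path, key=dist)                    # highest point C
    a = math.dist(c_pt, end)                      # |CB|
    b = math.dist(c_pt, start)                    # |CA|
    c = math.dist(start, end)                     # |AB|
    angle = math.degrees(math.acos((a * a + b * b - c * c) / (2 * a * b)))
    return c_pt, angle

# A symmetric bend peaking at (2, 2) above the chord from (0, 0) to (4, 0):
print(bending_angle((0, 0), (4, 0), [(1, 1), (2, 2), (3, 1)]))  # ((2, 2), 90.0)
```

In the sliding-window variant described above, this computation is repeated per segment, and the segment angles together with the largest distance characterize the maximum bending region.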

Training Strategy
For reference, the improved U-Net is termed JS-U-Net, after the J-shaped decoder network designed for it. The categorical cross-entropy function [14] is used by all networks and is defined as L = −Σᵢ yᵢ log(pᵢ), where yᵢ is the ground-truth class indicator and pᵢ is the predicted probability for class i.

Figure 9. Processing of maximum bending region detection and angle value calculation.

Performance Evaluation Index
As a commonly used performance evaluation index, the mean intersection over union (MIoU) is adopted to measure the degree of coincidence between the segmentation result of each category and the original marker image. MIoU evaluates the semantic segmentation performance of the network model as MIoU = (1/(n + 1)) Σᵢ qᵢᵢ / (Σⱼ qᵢⱼ + Σⱼ qⱼᵢ − qᵢᵢ), where n + 1 denotes the number of image pixel categories, qᵢᵢ denotes the total number of pixels that are real guidewire pixels i and predicted as guidewire pixels i, qᵢⱼ denotes the total number of real guidewire pixels i predicted as background pixels j, and qⱼᵢ denotes the total number of real background pixels j predicted as guidewire pixels i.
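The MIoU definition above can be sketched from a confusion matrix in which entry q_ij counts pixels of real class i predicted as class j; the function name is ours.

```python
import numpy as np

def mean_iou(conf: np.ndarray) -> float:
    """MIoU from an (n+1)x(n+1) confusion matrix:
    mean over classes of q_ii / (row_sum_i + col_sum_i - q_ii)."""
    inter = np.diag(conf).astype(float)
    union = conf.sum(axis=1) + conf.sum(axis=0) - inter
    return float((inter / union).mean())

# Hypothetical two-class (background/guidewire) example:
conf = np.array([[50, 10],
                 [5, 35]])
print(round(mean_iou(conf), 4))  # mean of 50/65 and 35/50
```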
The formulas for precision, recall, F1-score_i, and macro-F1 are, respectively, precision = TP/(TP + FP) (12), recall = TP/(TP + FN) (13), F1-score_i = 2 × precision × recall/(precision + recall) (14), and macro-F1 = (1/n) Σᵢ F1-score_i (15). In Equation (12)-(15), true positive (TP) denotes the number of guidewire pixels that were correctly detected, false positive (FP) denotes the number of background pixels that were detected as guidewire pixels, and false negative (FN) denotes the number of guidewire pixels that were incorrectly classified. The F1-score captures the sensitivity of medical image segmentation to missing and superfluous foreground, that is, model precision and recall are weighted equally. Moreover, to assess the processing speed of the model, the forward feedback processing speed (FFPS) for images is calculated using Equation (16).
FFPS = N / Σₘ timeₘ (16), where N is the total number of test samples, m is the index of the m-th test sample, and timeₘ is the forward feedback processing time of the model for each sample. The unit of FFPS is frames per second (fps), and it is employed to evaluate the forward-propagation processing time of the network model. Furthermore, the mean pixel error (MPE) is used to evaluate the performance of the detection method. The MPE is based on a distance (in pixels) metric, calculated as the average distance error between the predicted pixel point and the ground-truth pixel point, as MPE = (1/N) Σᵢ ||Pᵢ − Gᵢ||, where N denotes the total number of test samples, ||·|| denotes the Euclidean distance between two points, Pᵢ denotes the predicted endpoint pixel point of the i-th sample, and Gᵢ denotes the ground-truth endpoint pixel point of the i-th sample. The training results are shown in Figure 10. The improved U-Net model was trained for 50 iterations. Figure 10a shows that the training accuracy started at 60% in the first iteration and reached a maximum of 83.45% at the 40th iteration on Dataset A.
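The pixel-level metrics and the MPE defined above can be sketched as follows; the function names are illustrative.

```python
import numpy as np

def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, recall, and F1-score from pixel counts, per Equation (12)-(14)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def mean_pixel_error(pred, gt) -> float:
    """MPE: average Euclidean distance between predicted and ground-truth endpoints."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=1).mean())

# Hypothetical counts and endpoint pairs, not values from the paper:
print(precision_recall_f1(8, 2, 2))                          # (0.8, 0.8, 0.8)
print(mean_pixel_error([(0, 0), (3, 4)], [(0, 0), (0, 0)]))  # 2.5
```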
For Dataset B, the model exhibited similar performance: from the 35th iteration, the accuracy of the improved U-Net model remained at its maximum until the end of training. Additionally, Figure 10 shows that the training loss reaches 0.4 within the first iteration and is reduced to 0.2 by the 50th iteration, indicating good convergence. The experimental results demonstrate that the improved U-Net segmentation model achieves guidewire prediction accuracies of 89.46% and 92.43% on validation Datasets A and B, respectively. The validation loss decreases rapidly within the first 10 epochs of training for Dataset A and within 15 epochs for Dataset B, indicating that the network learned most in those epochs. Throughout training and validation, the model's performance remained stable, and the behavior of the training and validation losses shows no sign of overfitting.
The performance of the improved U-Net model was verified on test Datasets A and B, and the confusion matrix suggests that the proposed model segmented the guidewire pixels appropriately, with Macro-F1 scores of 94.57% on the rabbit dataset and 95.48% on the porcine dataset. The details of the confusion matrix show that the model has high recall and precision for the background and guidewire pixels (99.94% and 99.95%, and 90.53% and 87.87%, respectively) on the in vivo rabbit dataset. For the in vivo porcine dataset, the background-pixel prediction achieves 99.89% recall and 99.91% precision, and the guidewire-pixel prediction achieves 92.07% recall and 90.07% precision.
Moreover, this study employed three models, U-Net, U-Net++, and DeepLabV3+, and applied them to the datasets to assess the improved U-Net's performance. MIoU, FFPS, and model parameters were computed during model training with the two trial datasets. The results of our method, compared with other existing segmentation methods, are summarized in Table 1. The improved U-Net demonstrated an MIoU of 99.90% for background pixels and 80.47% for guidewire pixels, with a total mean of 90.19% on Dataset A. On Dataset B, the improved U-Net correctly segmented 83.58% of the guidewire and 99.80% of the background, achieving an overall average segmentation accuracy of 91.69%. Compared to U-Net, the improved U-Net showed better performance, with increments of 13.95% MIoU on Dataset A and 11.76% MIoU on Dataset B. The U-Net++ model, a variation of U-Net, demonstrated superior performance compared to U-Net; the improved U-Net showed further increments of 2.8% MIoU on Dataset A and 4.45% MIoU on Dataset B over U-Net++. Moreover, the improved U-Net model had lower parameter requirements than the classical U-Net and U-Net++ models. Furthermore, DeepLabV3+ exhibited lower memory requirements with ≈6.44 million parameters, while the improved U-Net architecture used 8.85 million parameters; nevertheless, the improved U-Net model outperformed DeepLabV3+ on the MIoU metric.

Detection Performance Comparison with Related Heatmap Methods
After completing guidewire segmentation, the guidewire feature maps were passed to the second-stage processing to detect the endpoint. Based on the eight-neighborhood characteristics of the pixel distribution, skeletonization extraction was applied to the feature maps to obtain the topology structure of the guidewire (in pixels). Thereafter, abnormal branches were removed and fracture bands were repaired in the guidewire topology structure to acquire the true guidewire endpoints. The performance of the eight-neighborhood-based detection method is assessed and compared with those of existing methods using the MPE metric obtained on Datasets A and B. Four typical methods were implemented for evaluation and comparison, and their performances are compared with that of the proposed method in Table 2.
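The endpoint and bifurcation tests on the skeleton can be illustrated with a simple neighbor-counting sketch (a minimal interpretation of the eight-neighborhood rule, not the authors' implementation): on a one-pixel-wide skeleton, a foreground pixel with exactly one foreground neighbor is an endpoint, while three or more neighbors indicate a bifurcation region.

```python
def neighbors8(img, r, c):
    """Count foreground pixels among the eight neighbors of (r, c)."""
    h, w = len(img), len(img[0])
    count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and img[rr][cc]:
                count += 1
    return count

def classify_skeleton(img):
    """Split skeleton pixels into endpoints (1 neighbor) and
    bifurcation-region pixels (>= 3 neighbors)."""
    endpoints, bifurcations = [], []
    for r, row in enumerate(img):
        for c, v in enumerate(row):
            if not v:
                continue
            n = neighbors8(img, r, c)
            if n == 1:
                endpoints.append((r, c))
            elif n >= 3:
                bifurcations.append((r, c))
    return endpoints, bifurcations

# A small skeleton: a horizontal line with a spur branching downward
skel = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
ends, forks = classify_skeleton(skel)
print(ends)   # [(1, 0), (1, 4), (3, 2)]
```

Note that under 8-connectivity, pixels diagonally touching a junction also exceed the bifurcation threshold, which may be one reason the method removes whole bifurcation regions rather than single points.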
The results suggest that our method achieved the best performance on Dataset A, with an MPE of 2.02 (±0.04) pixels. The performance of the proposed method on Dataset B is compared with those of the four other typical methods in Table 2. The results demonstrate that the proposed method outperformed the other methods, achieving an MPE of 2.13 pixels on the guidewire endpoint detection task. Further, endpoint-detection MPEs of 2.86 (±0.09) for Hourglass, 3.46 (±0.13) for PoseResNet, 3.06 (±0.08) for HRNet, and 3.13 (±0.18) for HigherHRNet were verified on the porcine dataset.
To represent the model's performance in guidewire endpoint detection more intuitively, the detection error was converted from pixel-level measurements to millimeter errors, taking into account the actual physical size information of the image. This transformation is depicted in Figure 12. The proposed method exhibited the lowest average error, with a 0.53 mm error in guidewire endpoint tracking for the rabbit experiment and a 0.56 mm error in guidewire endpoint detection for the porcine experiment, outperforming the four typical heatmap methods. Visual results of guidewire endpoint detection for the five models are illustrated in Figure 13.
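The pixel-to-millimeter conversion is a simple scaling by the in-plane pixel spacing; the spacing below is back-computed from the reported numbers for illustration only and is not stated in the paper:

```python
def px_to_mm(pixel_error, pixel_spacing_mm):
    """Convert a pixel-distance error to millimeters given the
    in-plane pixel spacing of the fluoroscopy image."""
    return pixel_error * pixel_spacing_mm

# Implied spacing from the reported numbers (2.02 px -> 0.53 mm),
# roughly 0.26 mm per pixel (illustrative assumption):
spacing = 0.53 / 2.02
print(round(px_to_mm(2.02, spacing), 2))  # 0.53
```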
A t-test was used to determine whether the MPE differences between Hourglass, PoseResNet, HRNet, HigherHRNet, and our method are significant on Datasets A and B. As summarized in Table 2, the p-values are less than 0.05, suggesting that our method is significantly better than the other four typical heatmap methods. The results indicate that the eight-neighborhood-based method achieves significantly better detection results than the other learning methods. The other guidewire endpoint detection methods have larger mean pixel errors because their detection pipelines are not closely tailored to intravascular interventional surgical instruments, so their detection results are not tightly coupled to the guidewire endpoint itself.

Detection Performance Comparison with Related Heatmap Regression Methods
To evaluate the performance of the proposed method, we also applied six existing heatmap regression methods from previous studies and implemented them on Datasets A and B. The details of the implementation and validation performed for the existing methods are shown in Table 3. The results indicate that the mean pixel errors of the six regression methods are significantly higher than those of our method. Although the lightweight MobileNetv2 model can reduce the number of parameters and computations, its guidewire endpoint detection performance is the worst on Datasets A and B, reaching MPEs of 13.07 ± 1.43 and 11.09 ± 0.74, respectively, while DenseNet121, which has fewer parameters, obtained MPEs of 7.63 ± 1.94 and 8.24 ± 1.42 on Datasets A and B, respectively. Hourglass and ResNet50 achieved similar detection results. Within the ResNet family, neither the 101-layer nor the 152-layer ResNet outperformed the 50-layer ResNet on Datasets A and B. The performance of ResNet101 on Datasets A and B is better than that of ResNet152, indicating that guidewire endpoint detection does not benefit from deeper layers. Moreover, the performance of ResNet50 is better than that of ResNet101, suggesting that the guidewire endpoint detection task is relatively simple and does not require deeper layers to extract more information. The output of guidewire endpoint detection is significantly improved under these two-stage detection frameworks.
Moreover, to account for the physical scale of the image, the model's pixel-level detection errors were transformed into millimeter errors based on the actual physical size information of the image, as depicted in Figure 14. The proposed method exhibited the lowest millimeter error compared with the other six heatmap regression methods.
Visualization of the guidewire endpoint detection results is presented in Figure 15. A t-test was applied to show the differences between the eight-neighborhood-based module and the other learning methods. Table 3 indicates that the differences among the seven guidewire endpoint detection methods are significant. We attribute this improvement to the idea of skeletonization followed by repair applied in the eight-neighborhood region of each pixel point. Our eight-neighborhood detection method is designed to obtain regional information maps of each pixel point, which can eliminate several bifurcation areas or fracture bands in the input. Therefore, we can obtain more precise results using the pixel-adjacent relationship among the guidewire pixel points of the segmentation output feature maps.
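One plausible way to sketch the fracture-repair step (an illustrative interpretation, not the authors' exact algorithm) is to greedily pair skeleton endpoints that lie within a small gap threshold, treat each pair as a break to bridge, and keep the unpaired endpoints as true guidewire endpoint candidates:

```python
import math

def repair_fractures(endpoints, max_gap=10.0):
    """Greedily pair endpoints closer than max_gap pixels; each pair is a
    fracture to bridge, and its two endpoints are removed from the list
    of true guidewire endpoint candidates."""
    endpoints = list(endpoints)
    bridges, true_ends = [], []
    used = set()
    for i, p in enumerate(endpoints):
        if i in used:
            continue
        best_j, best_d = None, max_gap   # nearest unused endpoint within max_gap
        for j in range(i + 1, len(endpoints)):
            if j in used:
                continue
            d = math.dist(p, endpoints[j])
            if d < best_d:
                best_j, best_d = j, d
        if best_j is None:
            true_ends.append(p)
        else:
            used.update({i, best_j})
            bridges.append((p, endpoints[best_j]))
    return bridges, true_ends

# Four endpoints on a broken line: the middle two are a 3-pixel gap
bridges, true_ends = repair_fractures([(0, 0), (0, 30), (0, 33), (0, 60)])
print(bridges)    # [((0, 30), (0, 33))]
print(true_ends)  # [(0, 0), (0, 60)]
```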

Bending Angle of Guidewire
The bending angle of the guidewire is a crucial parameter that indicates the current posture of the guidewire as it moves in the vascular pathway. Accurately detecting and measuring the bending angle of the guidewire can provide valuable feedback to the surgeon, guiding them in adjusting their manipulation strategy to reduce the risk of vascular rupture during surgical procedures. Therefore, after completing guidewire endpoint detection, the proposed method is applied to identify the region of maximum guidewire bending and to calculate the bending angle value, providing a quantitative measure of the degree of bending at that point. The performance of the guidewire bending detection method is verified using Datasets A and B. Some bending region detection results and angle values of the pixel-adjacent-relation-based method are shown in Figure 16. The average accuracy of the maximum bending region detection method reaches 91.13 ± 1.12% on Dataset A and 93.18% on Dataset B, indicating that this method is sensitive and effective for maximum bending region detection.
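A minimal sketch of bending-angle estimation on an ordered skeleton polyline (illustrative; the `step` spacing and scanning scheme are assumptions, not the paper's exact procedure):

```python
import math

def bend_angle(p_prev, p, p_next):
    """Angle (degrees) at skeleton point p between the segments to its two
    neighbors: 180 deg means straight, smaller means a sharper bend."""
    a1 = math.atan2(p_prev[1] - p[1], p_prev[0] - p[0])
    a2 = math.atan2(p_next[1] - p[1], p_next[0] - p[0])
    ang = abs(math.degrees(a1 - a2)) % 360
    return min(ang, 360 - ang)

def max_bend(points, step=5):
    """Scan an ordered skeleton polyline and return (index, angle) of the
    sharpest bend, comparing points `step` samples apart."""
    best = (None, 180.0)
    for i in range(step, len(points) - step):
        a = bend_angle(points[i - step], points[i], points[i + step])
        if a < best[1]:
            best = (i, a)
    return best

# An L-shaped polyline: along x, then a right-angle turn along y
pts = [(i, 0) for i in range(10)] + [(9, j) for j in range(1, 10)]
print(max_bend(pts))  # sharpest bend at the corner (index 9), 90 degrees
```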

Discussion
Although the eight-neighborhood detection method depends on the relationship between adjacent pixels, and this relationship in turn relies on the performance of the semantic segmentation, the proposed method performs well under the following premise: the semantic segmentation is sufficiently good, with only occasional FNs or FPs. Compared with U-Net, U-Net++, and DeepLabV3+, our improved U-Net model showed better guidewire segmentation accuracy. If the segmentation performance is poor, the detection method cannot improve the output detection results.
We attribute the endpoint detection improvement to two ideas introduced by the eight-neighborhood method. The first is tiny target segmentation, which extracts the features of interest and reduces interfering information. The second is skeletonization followed by repair in the detection method. Skeletonization extraction can be regarded as the result of close pixel-adjacency relationships, which helps suppress several useless pixel areas in the input. As most guidewire pixels are suppressed by skeletonization processing and repair, abnormal pixels in endpoint detection can also be addressed. Therefore, more precise detection results can be obtained using skeletonization and repair processing based on the relationship between pixels in the neighborhood regions. Further, our method achieved the best detection results compared with both the four heatmap methods and the six heatmap regression methods. The MPE of our method is the lowest, at ≈2.02 ± 0.04 pixels (0.53 ± 0.01 mm) on Dataset A and 2.13 ± 0.37 pixels (0.56 ± 0.09 mm) on Dataset B.
Compared with the Hourglass network, our method exhibited outstanding performance in guidewire endpoint detection. The Hourglass network, a heatmap-based method, is not sensitive enough for the tiny guidewire endpoint detection task, despite being well suited to human pose estimation and performing better than the PoseResNet, HRNet, and HigherHRNet methods in endpoint detection. This is because the repeated bottom-up and top-down convolution operations in DCNNs significantly decrease the initial image resolution, which can degrade guidewire endpoint detection. The deep PoseResNet method is used for heatmap extraction in human pose estimation tasks and focuses on addressing network model degradation. Our results demonstrate that the performance gap between our proposed method and PoseResNet is the largest among the comparisons with the Hourglass, HRNet, and HigherHRNet methods. This suggests that even though PoseResNet is sensitive to human poses, it is not suitable for detecting small objects, particularly tiny guidewire endpoints. We also observed that HRNet performed better than the other ResNet-based models in guidewire endpoint detection, which indicates that high-resolution representations are essential for position-sensitive guidewire endpoint localization. HRNet maintains high-resolution representations throughout the process, unlike the other ResNet-based (i.e., PoseResNet, ResNet50, ResNet101, and ResNet152) and VGGNet frameworks, which first encode the input image as a low-resolution representation through a subnetwork formed by connecting high-to-low-resolution convolutions and subsequently recover the high-resolution representation from the encoded low-resolution one. In contrast, HigherHRNet combines high-resolution feature pyramids to learn scale-aware representations for predicting the correct position of a small object. We used it for endpoint detection to estimate feature maps from the combined high-resolution representation. The results showed that its performance is similar to or worse than that of HRNet. Therefore, the four typical heatmap methods and six heatmap regression methods are reasonable for human pose estimation but are not suitable for tiny guidewire endpoint detection.
Our proposed method also exhibited better performance than the six typical heatmap regression methods. Within the ResNet family, the detection performance of ResNet50 is superior to that of the deeper ResNet101 and ResNet152 variants, reflecting that the effective resolution of the guidewire is significantly low. ResNet50, with its smaller architecture, can yield better results on a low-resolution image: the residual block contains an identity branch that preserves existing information, so both the existing and newly learned information contribute to overall performance. Moreover, ResNet50 detects the guidewire endpoint more accurately than DenseNet121 and MobileNetv2 in X-ray images, suggesting that the residual block provides a distinctly different solution and is suitable for tiny target detection. MobileNetv2 performs worse than our proposed method and the ResNet50, ResNet101, ResNet152, Hourglass, and DenseNet121 heatmap regression methods in guidewire endpoint detection. Even though MobileNetv2's inverted residual and linear bottleneck structures reduce memory usage and convolution cost, it is not sensitive to tiny target features. Because MobileNetv2 deepens the network, this result again suggests that the guidewire detection task does not require deeper layers, which impair the model's ability to learn the feature maps.
Moreover, the heatmap-based methods outperform the heatmap regression methods in the guidewire endpoint detection task on both Datasets A and B, which suggests that heatmap-based methods handle image noise and errors better. Because the resolution becomes smaller as the down-sampling rate increases, the classification task becomes easier while the regression task becomes more difficult. Heatmap-based methods are more tolerant of noise and errors, whereas heatmap regression methods are sensitive to input noise and errors, resulting in inaccurate prediction of key-point locations. Our method, based on the pixel-adjacent relation idea, differs from both families and achieves the best performance. Models often perform well during training and on standardized public datasets; however, they may not perform well on real-world data. Therefore, we tested our proposed method and the other typical detection methods on our self-acquired X-ray images from the robot-assisted vascular interventional procedure to assess the performance of the proposed method fairly and without bias. Compared across various deep neural networks, our proposed technique demonstrated competitive performance in X-ray image analysis without the need for a highly specialized deep learning machine or a dataset of millions of example images. The method retained high accuracy in detecting guidewire endpoints, which illustrates the power of the two-stage key-point detection method based on the pixel-adjacent relation for effective guidewire endpoint detection on 2D angiograms, even with an extremely limited training image dataset (Figure 17).
Comparing guidewire endpoint detection, the tenfold cross-validation results on Dataset A from the in vivo rabbit model and Dataset B from the in vivo porcine model reveal that our proposed method performs similarly on both datasets because the training samples in both are X-ray images. Further, although the two datasets were obtained from different equipment, the types of pixel points that need to be localized are similar. This indicates that our proposed method generalizes well to tiny object endpoint detection and can process a new dataset that did not appear in training. The model can also be applied to other surgical instrument key-point detection tasks.
Finally, bending angles are extracted as a reference for operators to adjust the control strategy and reduce the risk of vascular rupture. The excellent performance of the maximum bending region detection method demonstrated on multiple datasets further supports its potential clinical utility.[39] Our bending detection method is suitable for guidewire bending pose detection. In the future, the bending angle and maximum distance parameter will be displayed on a human-computer interaction interface at the operating terminal to guide operators toward a suitable manipulation strategy and to improve the safety and stability of robot-assisted intervention.[40] Several limitations of this study should be considered. First, this study demonstrated superior performance only for the endpoint detection of a single surgical instrument. Further prospective investigation into the use of this method for multiple surgical instruments would be valuable, and further work is required to adapt and generalize the method to the challenges of multi-instrument endpoint detection. Moreover, this study was demonstrated on Dataset A obtained from an in vivo rabbit model and Dataset B obtained from an in vivo porcine model, which cannot fully reflect the complexity and variability of surgical procedures, such as interference from other body tissues. Future studies may need to acquire more datasets from a broad range of surgical contexts and vessel types to assess the generalizability of the proposed method. Finally, the maximum bending region detection method is not sensitive to the bending region of a guidewire with a simple and smooth pose. The results demonstrated that the bending region of a guidewire with a more complex posture can be easily detected, whereas the bending region of a simple guidewire posture is difficult to recognize. Because the distance from each pixel point to the line between the start point and endpoint is short for a guidewire with a simple posture (smooth or with a small bending angle), the tiny distance change is hard to detect, which affects the identification of the highest point.

Conclusion
An eight-neighborhood-based method was proposed for guidewire endpoint detection. Because the guidewire is tiny and soft in X-ray images, an improved U-Net model was applied to segment all guidewire instances, and its good performance on tiny guidewire segmentation was essential. After semantic segmentation, an eight-neighborhood-based method was designed to detect the endpoints of all guidewires, comprising skeletonization extraction, removal of bifurcation points, and repair of fracture points. Compared with four typical heatmap methods and six common heatmap regression methods, our pixel-adjacent-relationship-based method achieved the best detection results on Datasets A and B, and it can be applied to other tiny target endpoint detection tasks. The maximum bending region detection based on the pixel-adjacent relationship and the associated angle values also performed well; these can act as valuable feedback to surgeons in potential human-robot interaction interfaces, enabling them to adjust their control strategy for safe and efficient robot-assisted catheterization.

Figure 1 .
Figure 1.Overall framework for guidewire endpoint detection.The U-Net-based segmentation and pixel-adjacent relation-based endpoint detection methods in red boxes are newly proposed.The green and red points in the output image are the final detection results.

Figure 2 .
Figure 2. Image acquisition framework during robot-assisted vascular interventional procedure.a) Master-slave vascular interventional robotic system; b) acquiring Dataset A from the in vivo experiment in the rabbit model; and c) acquiring Dataset B from the in vivo experiment in the porcine model.

Figure 4 .
Figure 4. Detection framework of guidewire endpoint based on pixel-adjacent relation method.

Figure 6 .
Figure 6. Detection and removal of bifurcation points.

Figure 7 .
Figure 7. Repair process of fracture points.

Figure 8 .
Figure 8. Detailed information of endpoint and internal points based on the pixel-adjacent relationship.

2.5. Model Evaluation and Detection

2.5.1. Segmentation Performance

Semantic segmentation is applied for guidewire extraction to improve its suitability for guidewire endpoint detection. The images of the tool trajectory in the robot-assisted interventions were passed to the improved U-Net network model to evaluate the performance of the segmentation method. After guidewire segmentation, the results from the proposed method were validated on the training sets from Datasets A and B. Segmentation results are obtained at frame rates of 10 fps in Dataset A and 15 fps in Dataset B, respectively. The performances of the improved U-Net model in terms of training accuracy, training loss, and validation accuracy and loss are shown in Figure 10.

Figure 10 .
Figure 10.Performance analysis with training and validation plots of proposed improved U-Net model for 100 epochs, and the confusion matrix of the model performance in the test set from Datasets A and B, in vivo a) rabbit and b) porcine models.

The improved U-Net model also outperformed the other models on the Macro-F1 evaluation metric, with a feedforward image processing speed of 45.79 fps on Dataset A and 31.40 fps on Dataset B. The segmentation results based on the improved U-Net and the three other classical models are displayed in Figure 11. The guidewire is a small target and accounts for few pixels compared with the background; therefore, segmenting the tiny guidewire pixels is difficult. The high accuracy achieved indicates that the improved U-Net model, which augments the U-Net network with batch normalization and an improved decoding structure, accurately segments the guidewire in X-ray images.

Figure 11 .
Figure 11. Guidewire segmentation results using the test set from the in vivo rabbit model: a) U-Net, b) U-Net++, c) DeepLabV3+, and d) the proposed improved U-Net; and from the in vivo porcine model: a′) U-Net, b′) U-Net++, c′) DeepLabV3+, and d′) the proposed improved U-Net.

Figure 12 .
Figure 12. Bar diagram illustrating the mean distance error results of four typical heatmap endpoint detection methods and our method: a) in vivo rabbit model; b) in vivo porcine model.

Figure 13 .
Figure 13. Guidewire endpoint detection results of our method and four heatmap methods: a) our method, b) Hourglass, c) PoseResNet, d) HRNet, and e) HigherHRNet (the green point denotes the true guidewire endpoint and the red point represents the detection result), for the in vivo a′) rabbit and b′) porcine models.

Figure 14 .
Figure 14. Bar diagram illustrating the mean distance error results of six typical heatmap regression endpoint detection methods and our method: a) in vivo rabbit model; b) in vivo porcine model.

Figure 16 .
Figure 16. Performance evaluation of guidewire bending angle detection using the eight-neighborhood method. The results include the detection of the maximum bending fragment and the corresponding bending angle of the guidewire. The 'angle' parameter represents the degree of bending in both a) in vivo rabbit and b) in vivo porcine models.

Table 1 .
Performance evaluation of different models for guidewire segmentation. Our proposed method achieves superior results compared to other methods, highlighted in bold.

Table 2 .
Comparison of four typical heatmap detection methods versus our method.The best results are marked in bold.

Table 3 .
Comparison of six typical endpoint detection methods versus our method.The best results are marked in bold.