Robotic Manipulator‐Assisted Omnidirectional Augmented Reality for Endoluminal Intervention Telepresence

Robotic telemedicine can provide timely treatment to critical patients in geographically remote locations. However, owing to the lack of depth information, occlusion of instruments, and view direction limitations, visual feedback from the patient to the clinician is typically unintuitive, which affects surgical safety. Herein, an omnidirectional augmented reality (AR)-assisted robotic telepresence for interventional medicine is developed. A monocular camera is used as an AR device with a manipulator, where virtual key anatomies and instruments are superimposed on video images using multiobject hand-eye calibration, iterative closest point, and image superposition algorithms. The view direction of the camera can be changed via the manipulator, which allows the key anatomies and instruments to be viewed from different directions. Structure-from-motion and multi-view stereo algorithms are used to reconstruct the scene on the patient's side, thus providing a virtual reality (VR) interactive interface for the manipulator's safe teleoperation. Two different phantom experiments are conducted to validate the effectiveness of the proposed method. Finally, an ex vivo experiment involving a porcine lung is performed, in which the operator can observe the flexible interventional robot. The proposed method integrates omnidirectional AR and VR with robotic telepresence, which can provide intuitive visual feedback to clinicians.


Introduction
Telemedicine is a modern healthcare technology based on telecommunications that allows patient-doctor interactions without requiring physical contact. [1] It provides surgical care opportunities for urgent patients in geographically remote locations and reduces the risk for clinicians. [2,3] Moreover, telemedicine allows knowledge sharing between different clinicians online, even enabling the remote training of younger clinicians. Because surgical robots offer unique advantages, robot-assisted telemedicine has gained increasing attention in recent years, [4][5][6][7][8] such as remote telesurgery assisted by the RAVEN surgical robot system. [8] In robot-assisted telemedicine, the user is endowed with full control of the remote robot through cooperation between a human operator and the remotely controlled robot. [4,9] Telepresence is necessary for robotic telemedicine; it provides various types of feedback information from the patient to the clinician via different technologies, such as vision, force, and sound. More than 20 years ago, robot-assisted remote surgery was validated for the first time, and its feasibility and potential applications in humans were studied using the Zeus robot. [10] In the patient subsystem, an additional robotic arm controlled the endoscopic camera, and the camera image provided the primary visual feedback. In addition, the RAVEN surgical robot system, in which visual feedback is also provided by an endoscopic camera, has been used for telesurgery. [8] However, with the existing approaches, the visual feedback from the patient to the clinician is unintuitive because of the lack of depth information, occlusion, and view direction limitations. [8,10] These problems increase the risk of potential errors and impair the hand-eye coordination of clinicians.
Extensive efforts have been expended to improve visual feedback in robotic telemedicine. Acemoglu et al. [11] integrated a three-dimensional (3D) exoscope into a fifth-generation (5G) robotic telesurgery system for transoral laser microsurgery, which allowed surgeons to observe the surgical site using a 3D head-mounted display (HMD) to increase depth perception. However, the viewing direction of the 3D exoscope cannot be easily changed, and its view is easily obstructed by the instruments. Moreover, an HMD imposes an additional load on surgeons. Zhao et al. [12] proposed a floating autostereoscopic 3D display approach for telesurgical visualization that allows multiple clinicians to visualize floating holographic images from different viewing directions and enhances operative cooperation and efficiency. However, the view direction of a 3D model projected onto a two-dimensional (2D) image is limited, and a 3D display that includes only virtual objects can hinder a clinician's interaction with the actual scene. In general, the techniques described above cannot provide omnidirectional visual feedback and visualize key anatomies and instruments simultaneously.
Augmented reality (AR) provides an intuitive method for precise robotic teleoperation and is becoming increasingly important in robot-assisted surgeries. [13,14] AR enhances user perception by superimposing computer-generated virtual objects over a real-world scene, [15] which can not only augment the vision of lesions but also virtually visualize the instruments. [16] In AR-assisted robotic endoluminal surgery, surgeons can rapidly identify key anatomical structures, and unnecessary operations can be reduced using the preoperatively planned trajectory shown on the real-world scene. [17] For example, to increase the surgeon's comfort and safety during right colectomy, Volonté et al. [18] integrated 3D volume-rendered images, displayed on the surgeon's console, to illustrate the relationships between tumors, blood vessels, and organs.
The unique advances in AR technology render it promising for applications in robot-assisted telesurgery. Richter et al. [19] developed a stereoscopic AR predictive display to address the challenges of accurately tracking slave tools for teleoperated surgical robots, in which AR technology was used to display the predicted motions. However, they only overlaid the predicted display onto an endoscopic video. Huang et al. [20] developed an AR-based autostereoscopic surgical visualization system for telesurgery, in which a 3D model of the preoperative reconstruction and the surface point cloud of the intraoperative reconstruction were calibrated to visualize the target area and instrument. However, the viewing direction of the RGB-D camera used to reconstruct the surface cannot be easily changed, and the multi-view does not include the actual scene captured by the camera. Moreover, this system is unsuitable for flexible surgical robots used in endoluminal interventions. In our previous study, [21] we developed an AR-based robotic telepresence in which an optical see-through HMD was mounted at the distal end of a manipulator, thus providing views from different directions. Clinicians could view the key anatomies and instruments located inside the patient's body. However, an additional camera needed to be fixed behind one of the lenses of the HMD to monitor its view, which increased the complexity of the system. Moreover, a commercial HMD increases the cost of the system, and the integration of a commercial device hinders secondary development.
Hence, this paper develops a robotic manipulator-assisted omnidirectional AR (OmniAR) for endoluminal interventional medicine, as shown in Figure 1. AR, virtual reality (VR), and endoscopic views are produced on the patient's side for telepresence, and the motion commands from the operator on the clinician's side are transmitted to the patient's side for robotic operation. The virtual targeted anatomies and instruments are superimposed on video images, and the viewing direction of the camera can be changed via the manipulator to realize the OmniAR display, allowing the anatomies and instruments located inside the patient's body to be viewed from different directions for intuitive robotic teleoperation.
Figure 1. Robotic manipulator-assisted omnidirectional AR for endoluminal intervention telepresence. A monocular camera is mounted on the distal end of the visualization manipulator. The visual feedback from the patient to the clinician includes AR, VR, and endoscopic views. In the AR view, the key anatomies of the patient and the instrument can be viewed, and the visualization manipulator can be moved to different positions to obtain different viewing directions. The direction of the VR view can be changed via interaction with the display software to confirm the safe distance between the manipulators and surrounding objects for teleoperation, thus ensuring the safety of the manipulators. The AR view is produced by rendering the image of the monocular camera, and the endoscopic view is the image of the camera fixed at the tip of the instrument. The operation manipulator, visualization manipulator, and instrument are teleoperated by the operator on the clinician's side. Moreover, the visualization manipulator can also move automatically according to the preoperatively planned trajectory.

The main contributions of this study are summarized as follows: 1) A robotic telepresence framework with multiobject calibration, the iterative closest point (ICP) algorithm, and image processing approaches is proposed to realize omnidirectional AR using a monocular camera and a robotic manipulator, which enables multi-view intuitive telepresence for clinicians; 2) Structure-from-motion (SfM) and multi-view stereo (MVS) algorithms are used to preoperatively reconstruct the actual scene on the patient's side from a monocular camera together with the positioning of the manipulator, which provides a 3D virtual display such that the operator on the clinician's side can teleoperate the visualization manipulator safely and intuitively; 3) Two robot-assisted endoluminal interventions are designed using two different phantoms to validate the effectiveness of robotic telepresence, one of which is also utilized in the user studies. Moreover, human motion imitation-based teleoperation is used to present the OmniAR display; and 4) A virtual marker-based estimation method is developed to calculate the superimposed error between the virtual and real objects, and an ex vivo experiment involving a porcine lung is conducted to verify the feasibility of the proposed method.
The remainder of this paper is organized as follows: Section 2 presents the development of the framework for robotic telepresence, including calibration, registration, shape estimation, and 3D reconstruction. Section 3 describes the realization of OmniAR and introduces the estimation method for the superimposed accuracy. Section 4 presents the results of the series of experiments conducted to validate the effectiveness of the proposed method; the techniques and challenges of the proposed robotic telepresence are also discussed. Finally, Section 5 concludes the paper and discusses future work.

Robotic Telepresence Framework
This section presents the calibrations, registration, and shape reconstruction of the flexible instrument, as well as the preoperative 3D reconstruction of an actual scene.

Framework Description
The reference frame of each component must be transformed into the monocular camera frame to overlay the virtual model and instrument onto the real world. This primarily involves a four-step calibration and registration: calibration between the visualization manipulator and the EM tracking system, calibration between the visualization manipulator and the rigid instrument, hand-eye calibration, and registration of the EM tracking system and the anatomy. All the calibration methods are based on the AX = XB formulation, and the registration method is based on the ICP algorithm. [22] Figure 2 shows the components of the robotic telepresence framework and their relative transformations. All calibrations and registrations are executed preoperatively, and shape reconstruction is performed in real time.

Calibration between Manipulator and EM Tracking System
The calibration between the visualization manipulator and the EM tracking system is transformed into a hand-eye calibration problem, which primarily solves AX = XB. First, a six degree-of-freedom (DOF) EM sensor (E1) is attached to a rigid flange fixed at the end of the visualization manipulator. Subsequently, E1 is shifted in the magnetic field of the EM tracking system via teleoperation of the visualization manipulator, and the poses of the manipulator and E1 are recorded. Finally, these poses are used to construct the following equations:

$$T^{\mathrm{E1}}_{\mathrm{End1}}\, T^{\mathrm{End1}}_{\mathrm{RB1},i} = T^{\mathrm{E1}}_{\mathrm{EB},i}\, T^{\mathrm{EB}}_{\mathrm{RB1}}, \qquad i = 1,\, 2 \tag{1}$$

where $T^{\mathrm{End1}}_{\mathrm{RB1}}$ is the transformation matrix from {RB1} to {End1}, $T^{\mathrm{E1}}_{\mathrm{End1}}$ is the transformation matrix from {End1} to {E1}, $T^{\mathrm{EB}}_{\mathrm{RB1}}$ is the transformation matrix from {RB1} to {EB}, and $T^{\mathrm{E1}}_{\mathrm{EB}}$ is the transformation matrix from {EB} to {E1}. In $T^{\mathrm{End1}}_{\mathrm{RB1},i}$ and $T^{\mathrm{E1}}_{\mathrm{EB},i}$, $i = 1,\,2$, where $i$ represents a different set of the recorded data. Thus, Equation (1) can be expressed as

$$T^{\mathrm{E1}}_{\mathrm{EB},2}\bigl(T^{\mathrm{E1}}_{\mathrm{EB},1}\bigr)^{-1}\, X = X\, T^{\mathrm{End1}}_{\mathrm{RB1},2}\bigl(T^{\mathrm{End1}}_{\mathrm{RB1},1}\bigr)^{-1}, \qquad X = T^{\mathrm{E1}}_{\mathrm{End1}}$$

which is the standard AX = XB form; once X is solved, $T^{\mathrm{EB}}_{\mathrm{RB1}}$ follows from Equation (1).
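As a concrete illustration, the following is a minimal sketch of solving this AX = XB problem with OpenCV's hand-eye solver (Tsai's method). The function name, the use of OpenCV rather than the authors' own solver, and the pose bookkeeping are assumptions for illustration only.

```python
import cv2
import numpy as np

def solve_hand_eye(T_end1_in_rb1, T_e1_in_eb):
    """Hypothetical sketch: estimate the rigid transform relating {E1} and {End1}
    from paired pose recordings.  T_end1_in_rb1: flange poses from the manipulator
    kinematics; T_e1_in_eb: E1 poses reported by the EM tracking system (4x4 each)."""
    R_g2b = [T[:3, :3] for T in T_end1_in_rb1]
    t_g2b = [T[:3, 3] for T in T_end1_in_rb1]
    # Let E1 play the "camera" role and the EM base {EB} the "target" role.
    T_eb_in_e1 = [np.linalg.inv(T) for T in T_e1_in_eb]
    R_t2c = [T[:3, :3] for T in T_eb_in_e1]
    t_t2c = [T[:3, 3] for T in T_eb_in_e1]
    R_x, t_x = cv2.calibrateHandEye(R_g2b, t_g2b, R_t2c, t_t2c,
                                    method=cv2.CALIB_HAND_EYE_TSAI)
    X = np.eye(4)
    X[:3, :3], X[:3, 3] = R_x, t_x.ravel()
    return X  # E1 pose expressed in {End1} (OpenCV's "camera-to-gripper" convention)
```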

Calibration between Manipulator and Rigid Instrument
The transformation matrix between {In} and {RB2} is calculated from the kinematics of the operation manipulator. Therefore, the calibration between the visualization manipulator and the rigid instrument is equivalent to calibrating the two manipulators. This calibration can also be transformed into a hand-eye calibration problem. First, a monocular camera (C) and a marker (M) are fixed at the ends of the visualization and operation manipulators, respectively. Subsequently, the camera and marker are shifted to different locations via teleoperation of the two manipulators, and the marker remains in the camera's field of view. The pose of the marker is estimated by the camera using the perspective-n-point (PnP) algorithm. [24] Finally, the poses of the two manipulators and the marker can be used to construct the following equations:

$$T^{M}_{C,i}\, T^{C}_{\mathrm{End1}}\, T^{\mathrm{End1}}_{\mathrm{RB1},i} = T^{M}_{\mathrm{End2}}\, T^{\mathrm{End2}}_{\mathrm{RB2},i}\, T^{\mathrm{RB2}}_{\mathrm{RB1}}, \qquad i = 1,\, 2 \tag{3}$$

where $T^{C}_{\mathrm{End1}}$ is the transformation matrix from {End1} to {C}, which is obtained from the hand-eye calibration in advance; $T^{M}_{C}$ is the transformation matrix from {C} to {M}; $T^{\mathrm{RB2}}_{\mathrm{RB1}}$ is the transformation matrix from {RB1} to {RB2}; $T^{M}_{\mathrm{End2}}$ is the transformation matrix from {End2} to {M}; and $T^{\mathrm{End2}}_{\mathrm{RB2}}$ is the transformation matrix from {RB2} to {End2}, obtained from the kinematics of the operation manipulator. Thus, Equation (3) can be expressed as

$$\bigl(T^{M}_{C,2}\, T^{C}_{\mathrm{End1}}\, T^{\mathrm{End1}}_{\mathrm{RB1},2}\bigr)\bigl(T^{M}_{C,1}\, T^{C}_{\mathrm{End1}}\, T^{\mathrm{End1}}_{\mathrm{RB1},1}\bigr)^{-1}\, X = X\, T^{\mathrm{End2}}_{\mathrm{RB2},2}\bigl(T^{\mathrm{End2}}_{\mathrm{RB2},1}\bigr)^{-1}, \qquad X = T^{M}_{\mathrm{End2}}$$

Subsequently, X can be solved by using Tsai's method, and $T^{\mathrm{RB2}}_{\mathrm{RB1}}$ then follows from Equation (3).
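The marker pose $T^{M}_{C}$ used in Equation (3) comes from the PnP algorithm; below is a minimal sketch of that step with OpenCV. The marker side length, corner ordering, and intrinsic parameters are placeholders, not values reported in the paper.

```python
import cv2
import numpy as np

def marker_pose_in_camera(image_pts, K, dist, marker_len=0.04):
    """Sketch of the PnP step: four detected marker corners (pixels, ordered
    top-left, top-right, bottom-right, bottom-left) -> pose of {M} in {C}.
    marker_len is the marker side length in metres (assumed value)."""
    half = marker_len / 2.0
    object_pts = np.array([[-half,  half, 0.0],
                           [ half,  half, 0.0],
                           [ half, -half, 0.0],
                           [-half, -half, 0.0]], dtype=np.float32)
    ok, rvec, tvec = cv2.solvePnP(object_pts,
                                  np.asarray(image_pts, dtype=np.float32),
                                  K, dist, flags=cv2.SOLVEPNP_IPPE_SQUARE)
    R, _ = cv2.Rodrigues(rvec)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T  # pose of the marker frame {M} expressed in the camera frame {C}
```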

Registration of EM Tracking System and Anatomy
The EM tracking system and the anatomy need to be registered to obtain the 3D pose of the anatomy with respect to {C}. The detailed registration process is as follows: 1) A CT scan of the anatomy is performed; 2) Multiple CT slices are used to reconstruct the 3D virtual model of the anatomy, and the model is used as the target point cloud, P; 3) The point cloud of the anatomy is collected using an EM sensor as the source point cloud, G; and 4) The point clouds, P and G, are matched using the ICP algorithm to obtain the transformation, $T^{A}_{\mathrm{EB}}$, between the EM tracking system and the anatomy.
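A minimal sketch of this registration step using the open-source Open3D library is shown below; the file names, the correspondence threshold, and the use of Open3D itself (rather than the authors' implementation) are assumptions.

```python
import numpy as np
import open3d as o3d

# Target P: point cloud sampled from the CT-derived anatomy model (frame {A}).
# Source G: points collected on the anatomy surface with the EM sensor (frame {EB}).
target = o3d.io.read_point_cloud("anatomy_model_from_ct.ply")   # hypothetical file
source = o3d.io.read_point_cloud("em_collected_points.ply")     # hypothetical file

T_init = np.eye(4)   # rough manual alignment, refined by ICP (cf. Section 4)
result = o3d.pipelines.registration.registration_icp(
    source, target,
    max_correspondence_distance=0.01,   # metres, assumed threshold
    init=T_init,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())

T_A_EB = result.transformation   # transformation from {EB} to {A}
print("ICP fitness:", result.fitness, "\n", T_A_EB)
```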

Shape Estimation of Flexible Instrument
The shape of the active bending segment of the flexible instrument needs to be reconstructed for the AR view. Two six-DOF EM sensors, E2 and E3, are assembled at the base and tip of the active bending segment, respectively, as shown in Figure 2. The geometry for shape estimation is shown in Figure 3. $S_2$ and $S_3$ are the tips of E2 and E3, respectively; $p_2$ and $p_3$ are the position vectors of $S_2$ and $S_3$, respectively; and $r_2$ and $r_3$ are the direction vectors of E2 and E3, respectively. Lines $l_2$ and $l_3$ coincide with $r_2$ and $r_3$, respectively, and can be written as follows:

$$l_{i} = \bigl\{\, p_{i} + t_{i}\, r_{i} \mid t_{i} \in \mathbb{R} \,\bigr\}, \qquad i = 2,\, 3$$

Here, $i = 2$ and 3, and $t_i$ denotes the distance parameter from $S_i$ to an arbitrary point on $l_i$. $l_0$ is defined as the common perpendicular to $l_2$ and $l_3$, and $M_2$ and $M_3$ are the intersection points between $l_0$ and $l_2$ and $l_3$, respectively. $M_0$ is the midpoint of $M_2M_3$, and $t_2^0$ and $t_3^0$ are the distance parameters of $M_2$ and $M_3$, respectively.
Because $l_2$ and $l_3$ may not intersect, the shape estimation is based on two assumptions. [25] First, the active bending segment has a constant curvature. Second, the bending plane of the active bending segment is located on the plane determined by $S_2$, $S_3$, and $M_0$, where $t_2^0$ and $t_3^0$ are calculated from Equation (6),

$$r_{2}\cdot\bigl(p_{2} + t_{2}^{0} r_{2} - p_{3} - t_{3}^{0} r_{3}\bigr) = 0, \qquad r_{3}\cdot\bigl(p_{2} + t_{2}^{0} r_{2} - p_{3} - t_{3}^{0} r_{3}\bigr) = 0 \tag{6}$$

and the midpoint $M_0$ can then be obtained as

$$p_{M_0} = \tfrac{1}{2}\bigl[(p_{2} + t_{2}^{0} r_{2}) + (p_{3} + t_{3}^{0} r_{3})\bigr]$$

In Figure 3, $S_0$ is the middle point of $S_2S_3$, $O$ is the arc center of the bending segment, and three mutually perpendicular unit vectors ($s$, $n$, and $m$) are defined. $s$ is the unit vector of $\overrightarrow{S_3S_2}$, $n$ is perpendicular to the bending plane of the active bending segment, and $m$ is the unit vector of $\overrightarrow{S_0O}$. The unit vectors are calculated as follows:

$$s = \frac{p_{2} - p_{3}}{\lVert p_{2} - p_{3} \rVert}, \qquad n = \frac{(p_{2} - p_{3}) \times (p_{M_0} - p_{3})}{\lVert (p_{2} - p_{3}) \times (p_{M_0} - p_{3}) \rVert}, \qquad m = n \times s$$

where the sign of $n$ is chosen such that $m$ points from $S_0$ toward $O$. Based on the geometric relationship shown in Figure 3, the central angle of the active bending segment $\overset{\frown}{S_3S_2}$ can be obtained as follows:

$$\lVert \overrightarrow{S_3S_2} \rVert = 2R\sin\tfrac{\theta}{2}, \qquad L = R\,\theta \tag{8}$$

where $R$, $\theta$, and $L$ are the radius, central angle, and length of $\overset{\frown}{S_3S_2}$, respectively. Equation (8) can be expressed as

$$\frac{\sin\tfrac{1}{2}\theta}{\tfrac{1}{2}\theta} = \frac{\lVert \overrightarrow{S_3S_2} \rVert}{L} \tag{9}$$

In Equation (9), $\lVert \overrightarrow{S_3S_2} \rVert$ is calculated in real time, and $L$ is obtained in advance based on the physical relationship between E2 and E3. To accelerate the solution of $\theta$, values $\theta_j$ ($j = 1, 2, 3, \dots$) from 0 to $2\pi$ are selected at a constant interval, and the corresponding value of $\sin(\tfrac{1}{2}\theta_j)/(\tfrac{1}{2}\theta_j)$ is calculated. Finally, the error is calculated as $E = \bigl|\sin(\tfrac{1}{2}\theta_j)/(\tfrac{1}{2}\theta_j) - \lVert \overrightarrow{S_3S_2} \rVert / L\bigr|$, where the $\theta_j$ giving the minimum value of $E$ is the desired solution for $\theta$. After $\theta$ is obtained, $R = L/\theta$, and the position vector of $O$ can be calculated as

$$p_{O} = p_{S_0} + R\cos\tfrac{\theta}{2}\; m$$

A local coordinate system {B} is established at $O$, and the unit vectors in the x-, y-, and z-directions are $x_B$, $y_B$, and $z_B$, respectively, where $x_B$ is the unit direction vector of $\overrightarrow{OS_3}$, $z_B = n$, and $y_B = z_B \times x_B$. An arbitrary point on $\overset{\frown}{S_3S_2}$ can be regarded as $S_3$ rotated around $z_B$ by a certain angle $\gamma$ in the plane $S_2S_3O$, where $\gamma \in [0, \theta]$. If $\theta > \pi$, then $\gamma \in [-\theta, 0]$. An arbitrary point on $\overset{\frown}{S_3S_2}$ can be mathematically expressed as follows:

$$p(w) = p_{O} + R\cos\gamma\; x_B + R\sin\gamma\; y_B, \qquad \gamma = w/R \tag{11}$$

where $p(w)$ is the position vector of $\overset{\frown}{S_3S_2}$ at $w$ with respect to {EB}, and $w$ is the arc-length parameter of $\overset{\frown}{S_3S_2}$.
Because of modeling and calculation errors, $S_2$ and $S_3$ may not lie exactly on an arc whose radius and center are $R$ and $p_O$, respectively. Therefore, three points are selected from $\overset{\frown}{S_3S_2}$ at equal intervals based on Equation (11). Finally, a cubic spline through these three points, $S_2$, and $S_3$ is used to obtain the shape $p^{0}(w)$ of $\overset{\frown}{S_3S_2}$.
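To make the above procedure concrete, the following is a minimal numpy sketch of the arc estimation (the final cubic-spline fit through the sampled points, $S_2$, and $S_3$ is omitted). The helper names, the grid-search interval, and the sign-selection trick at the end are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def estimate_arc(p2, p3, r2, r3, L, step=0.05, n_pts=20):
    """Constant-curvature arc between sensor tips S2 and S3.
    p2, p3: positions of S2, S3; r2, r3: unit direction vectors of E2, E3 (all in {EB});
    L: known arc length between the sensors.  Assumes l2 and l3 are not parallel."""
    # Common perpendicular of l2 and l3 -> midpoint M0 (Eq. (6)).
    d = p2 - p3
    b, e, f = np.dot(r2, r3), np.dot(r2, d), np.dot(r3, d)
    t3 = (b * e - f) / (b * b - 1.0)
    t2 = b * t3 - e
    M0 = 0.5 * ((p2 + t2 * r2) + (p3 + t3 * r3))
    # Central angle theta from |S3S2| = 2R sin(theta/2) and L = R*theta (Eqs. (8)-(9)).
    chord = np.linalg.norm(p2 - p3)
    thetas = np.arange(step, 2.0 * np.pi, step)
    theta = thetas[np.argmin(np.abs(np.sin(thetas / 2) / (thetas / 2) - chord / L))]
    R = L / theta
    # Arc centre O in the bending plane (S2, S3, M0).
    S0 = 0.5 * (p2 + p3)
    s = unit(p2 - p3)
    m = S0 - M0
    m = unit(m - np.dot(m, s) * s)
    O = S0 + R * np.cos(theta / 2.0) * m
    # Local frame {B}: x_B towards S3, z_B normal to the bending plane.
    x_B = unit(p3 - O)
    z_B = unit(np.cross(s, m))
    y_B = np.cross(z_B, x_B)
    # Sweep gamma and pick the rotation sense whose end point lands on S2.
    gammas = np.linspace(0.0, theta, n_pts)
    arc_pos = O + R * (np.outer(np.cos(gammas), x_B) + np.outer(np.sin(gammas), y_B))
    arc_neg = O + R * (np.outer(np.cos(gammas), x_B) - np.outer(np.sin(gammas), y_B))
    arc = arc_pos if np.linalg.norm(arc_pos[-1] - p2) < np.linalg.norm(arc_neg[-1] - p2) else arc_neg
    return arc, theta, R
```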

3D Reconstruction of Actual Scene
To intuitively teleoperate the manipulator and determine whether it collides with surrounding objects, we reconstruct the actual scene on the patient's side preoperatively and provide a VR view to the operators on the clinician's side. In the VR view, the reconstructed scene, manipulators, and rigid instrument are shown, and the poses of the manipulators and instrument are updated in real time. Operators can interact with the VR display interface to change the viewing direction, thereby improving depth perception and avoiding view occlusion for safe teleoperation of the manipulator. The 3D reconstruction includes two steps: sparse and dense reconstruction. We use the incremental SfM algorithm [26] for sparse reconstruction and the MVS algorithm [27] for dense reconstruction, based on the open-source library COLMAP. [26,27] Another library, OpenMVS, [28][29][30] is used to recover the full surface of the reconstructed scene through mesh reconstruction, refinement, and texturing. The reconstruction process for an actual scene is illustrated in Figure 4.
During the sparse reconstruction, the camera poses estimated by the SfM algorithm are expressed as $P = \{P_k \mid k = 1, 2, \dots, N\}$. Because the scale of the estimated camera poses is inconsistent with that of the actual poses and the positioning accuracy of the manipulator is high, the corresponding poses of the visualization manipulator, $\bar{P} = \{\bar{P}_k \mid k = 1, 2, \dots, N\}$, are combined with $P$. The true-scale information of the camera pose is recovered via a similarity transformation [31] as follows:

$$\bar{p}_k \approx H^{r}_{c}(p_k) = s^{r}_{c}\, R^{r}_{c}\, p_k + t^{r}_{c}$$

where $H^{r}_{c}$ is the similarity transformation matrix; $t^{r}_{c}$ and $R^{r}_{c}$ are the translation vector and rotation matrix of $H^{r}_{c}$, respectively; $s^{r}_{c}$ is the scale of the isotropic scaling transformation; and $p_k$ and $\bar{p}_k$ are the camera positions of $P_k$ and $\bar{P}_k$, respectively. Umeyama's method [31] is used to obtain $H^{r}_{c}$ as follows:

$$\bigl(s^{r}_{c},\, R^{r}_{c},\, t^{r}_{c}\bigr) = \underset{s,\,R,\,t}{\arg\min}\; \frac{1}{N}\sum_{k=1}^{N} \bigl\lVert \bar{p}_k - (s\,R\,p_k + t) \bigr\rVert^{2}$$
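Umeyama's closed-form solution to the above least-squares problem can be sketched in a few lines of numpy, as below; the function name and the use of camera centres (rather than full poses) are assumptions for illustration.

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Closed-form similarity estimate: finds s, R, t minimising
    sum ||dst_k - (s R src_k + t)||^2.
    src: Nx3 camera centres from SfM; dst: Nx3 camera centres from the manipulator."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(0), dst.mean(0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                       # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:     # reflection correction
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_s = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_s             # isotropic scale
    t = mu_d - s * R @ mu_s
    H = np.eye(4)
    H[:3, :3], H[:3, 3] = s * R, t
    return H                                         # similarity transform H_c^r
```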

Omnidirectional AR and Superimposed Accuracy Estimation
To realize OmniAR, the camera must be moved to different positions and orientations to observe the objects from different directions. Because the visualization manipulator has high positioning accuracy, its kinematics can effectively compensate for the camera motion.

Anatomy-Augmented Display
After all calibrations are completed, the transformation matrix from {A} to {C} is obtained as follows:

$$T^{C}_{A} = T^{C}_{\mathrm{End1}}\, T^{\mathrm{End1}}_{\mathrm{RB1}}\, \bigl(T^{\mathrm{EB}}_{\mathrm{RB1}}\bigr)^{-1}\, \bigl(T^{A}_{\mathrm{EB}}\bigr)^{-1} \tag{14}$$

where $T^{C}_{A}$ is the pose of the anatomy with respect to {C}, which is updated by the pose of the visualization manipulator in real time. The following four steps are performed to overlay the virtual anatomy onto the real world: 1) A virtual 3D space that includes a virtual camera and the 3D anatomical model is created using the open-source library Visualization Toolkit (VTK); 2) The intrinsic parameters of the virtual camera are set to be identical to those of the actual monocular camera, and the reference frame of the virtual camera coincides with that of the virtual space; 3) The pose of the 3D anatomy model in the virtual space is updated using $T^{C}_{A}$, and the scene of the virtual space is captured using the virtual camera in real time; and 4) The images captured by the virtual camera and the real monocular camera are fused for the anatomy-augmented display. The pseudocode for image fusion is presented in Table 1.
In Table 1, cvtColor, threshold, bitwise_not, bitwise_and, and add are functions from the open-source library OpenCV. The inputs virtualImg and realImg are the images of the virtual and actual cameras, respectively.
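A minimal sketch of this masking-based fusion, written with the OpenCV functions named above, is given below; the threshold value and the assumption that the virtual render has a black background are illustrative choices, not values from Table 1.

```python
import cv2

def fuse_virtual_and_real(virtual_img, real_img, thresh=10):
    """Overlay the rendered virtual image onto the real camera image.
    Assumes both images have the same size and the virtual background is black."""
    gray = cv2.cvtColor(virtual_img, cv2.COLOR_BGR2GRAY)
    # Mask of pixels covered by the rendered virtual object.
    _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    mask_inv = cv2.bitwise_not(mask)
    # Keep the real scene outside the object and the render inside it, then combine.
    background = cv2.bitwise_and(real_img, real_img, mask=mask_inv)
    foreground = cv2.bitwise_and(virtual_img, virtual_img, mask=mask)
    return cv2.add(background, foreground)
```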

Instrument-Augmented Display
This study uses two different setups for robot-assisted endoluminal intervention, with flexible and rigid instruments. For the rigid instrument, the transformation matrix from {In} to {C} is obtained as

$$T^{C}_{\mathrm{In}} = T^{C}_{\mathrm{End1}}\, T^{\mathrm{End1}}_{\mathrm{RB1}}\, \bigl(T^{\mathrm{RB2}}_{\mathrm{RB1}}\bigr)^{-1}\, T^{\mathrm{RB2}}_{\mathrm{In}}$$

where $T^{C}_{\mathrm{In}}$ is the pose of the rigid instrument with respect to {C} and is updated by the poses of the two manipulators in real time, and $T^{\mathrm{RB2}}_{\mathrm{In}}$ is the transformation matrix from {In} to {RB2} obtained from the kinematics of the operation manipulator. After the shape estimation is completed for the flexible instrument, the shape with respect to {C} is obtained as follows:

$$p_{C}(w) = T^{C}_{\mathrm{End1}}\, T^{\mathrm{End1}}_{\mathrm{RB1}}\, \bigl(T^{\mathrm{EB}}_{\mathrm{RB1}}\bigr)^{-1}\, p^{0}(w)$$

where $p_{C}(w)$ is the position vector of $\overset{\frown}{S_3S_2}$ at $w$ with respect to {C} (expressed in homogeneous coordinates), and it is updated by the pose of the visualization manipulator and $p^{0}(w)$ in real time.
If the virtual anatomy and virtual instrument are in the same virtual space, they may obstruct each other when capturing an image with the virtual camera.Therefore, another virtual 3D space is created for the instrument.The four steps for overlaying the virtual instrument onto the real world are similar to those described in Subsection 3.1.The pseudocode of the image fusion is presented in Table 2.The inputs of virtualImg and overlayImg are the images of the virtual instrument and output of Table 1, respectively.The functions listed in Table 2 are the same as those listed in Table 1.

Superimposed Accuracy Estimation Method
The sources of the superimposed error primarily include calibration, registration, and image fusion. Because the virtual and real content are overlaid in a 2D image, estimating the superimposed accuracy in 3D space is difficult. Therefore, we develop a virtual marker-based method to calculate the superimposed error.
A cube in which each face comprises one marker is used as the actual object, and a virtual cube model in which each face comprises one marker image is used as the virtual object. The registration process between the actual cube and the EM tracking system is the same as that described in Subsection 2.4. Subsequently, the pose of the cube is transformed from {EB} to {C} using Equation (14) to obtain $T^{C}_{A}$. Finally, the cube-augmented display is produced in the same way as described in Subsection 3.1. When the camera moves to different positions, images are captured before and after the virtual cube is overlaid onto the real world. Because both the virtual and actual cubes have markers on each surface, the 3D poses of the virtual and actual cubes are calculated via pose estimation using the PnP algorithm. Subsequently, the positions of each corner of the virtual and actual cubes are calculated using the estimated poses.
The overlay error is calculated as follows:

$$E_{\mathrm{overlay}} = \frac{1}{n\,m}\sum_{i=1}^{n}\sum_{j=1}^{m}\bigl\lVert P^{v}_{ij} - P^{r}_{ij} \bigr\rVert$$

where $E_{\mathrm{overlay}}$ is the mean error of the overlay in 3D space; $n$ is the number of datasets, and each dataset includes eight corners of the virtual cube and eight corners of the actual cube; $m = 8$ is the number of corners of each cube; $P^{v}_{ij}$ is the position vector in 3D space of the $j$-th corner of the virtual cube in the $i$-th dataset; and $P^{r}_{ij}$ is the position vector in 3D space of the $j$-th corner of the actual cube in the $i$-th dataset. The pose of the cube in the virtual 3D space is obtained using an EM sensor and transformed from {EB} to {C}. Moreover, images of the real and virtual cubes are captured from different viewing directions of the monocular camera. Therefore, the method for estimating the superimposed accuracy fully considers the calibration, registration, and image combination errors.
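Given corner positions recovered from the PnP pose estimates, the overlay-error formula above reduces to a short numpy computation; the array layout below is an assumption for illustration.

```python
import numpy as np

def overlay_error(virtual_corners, real_corners):
    """Mean 3D distance between corresponding cube corners.
    virtual_corners, real_corners: arrays of shape (n, 8, 3), where n is the
    number of datasets and each row holds the eight corner positions obtained
    from the PnP pose estimate of the virtual or the actual cube."""
    v = np.asarray(virtual_corners, dtype=float)
    r = np.asarray(real_corners, dtype=float)
    return float(np.mean(np.linalg.norm(v - r, axis=-1)))
```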

Experiments and Results
This section introduces the experimental platforms, including the laparoscopic and airway endoluminal intervention platforms, and presents the superimposed accuracy results. Subsequently, human motion imitation-based teleoperation was used to present the OmniAR display. Two different robot-assisted intervention experiments with OmniAR and VR displays were conducted, and the laparoscopic experiment was also used to carry out a user study. Moreover, the techniques and challenges of the proposed robotic telepresence system are discussed.

Experimental Platforms and Overlay Accuracy Estimation
The laparoscopic intervention experimental platform is shown in Figure 5a. A monocular camera (HD-X12MP-AF, Shenzhen Heda Technology Co. Ltd., China) was fixed to the end of the visualization manipulator (Dobot CR5, Shenzhen Yuejiang Technology Co. Ltd., China) using a 3D-printed component, and a rigid instrument was fixed to the end of the operation manipulator (LBR Med 14 R820, KUKA AG, Germany) using a 3D-printed component. The instrument was fabricated using a standard carbon-fiber tube and two 3D-printed components, which included two internal channels. An endoscopic camera, including an LED light source and an OV9734 CMOS (OmniVision Technologies, Inc., USA), was assembled into one of the channels of the instrument, whereas the other channel was occupied by commercial clips (ROCC-D-26-195, Mico-Tech (Nanjing) Co. Ltd., China). An abdominal model was used as the interaction phantom. The experimental platform for airway endoluminal intervention was similar to that for laparoscopic intervention, except for the instrument used, as shown in Figure 5b. The flexible instrument included a driven unit, a commercial flexible endoscope shaft, and a commercial backbone (OBS087180, Shanghai Yanshun Scope PARTS & Accessories Co. Ltd., China). An endoscopic camera (CS1006-PM, Xiamen Micro Vison medical Technology Co. Ltd., China) was assembled at the tip of the backbone, and two six-DOF EM sensors (Aurora, Northern Digital Inc., Canada) were fixed at the base and tip of the backbone. When the backbone was not bent, the two EM sensors were well aligned after assembly was completed. Figure 5c shows the data flow for the robotic telepresence. The operator on the clinician's side monitored the three views of PC2 and PC3 on the patient's side using TeamViewer software (TeamViewer GmbH, Göppingen, Germany), as in our previous study. [21] The update rates of the VR and AR views were 22 Hz, and the update rate of the endoscopic view was 60 Hz. Because the computation time of the shape estimation of the flexible instrument is much shorter than that of the image processing for the virtual and real superposition, the update rate of the shape estimation depends on that of the AR view and is 22 Hz. The resolution of the input and output images in Tables 1 and 2 is 720 × 960 pixels.
During the registration of the EM tracking system and the anatomy, a rough match between both point clouds was manually executed. Subsequently, the ICP algorithm was used to match both point clouds precisely. The cube without AR and the overlay of the virtual and actual cubes are shown in Figure 5d,e, respectively. The maximum and mean overlay errors were 9.5 and 6.5 mm, respectively, and the error resulted from each step of the OmniAR realization process.

Human Motion Imitation-Based Teleoperation
A teleoperation experiment was conducted to validate the effectiveness of the OmniAR display, as shown in Figure 6a (Movie S1, Supporting Information). The method used for human motion capture and mapping to the manipulator was the same as that used in our previous study. [21] The manipulator used in this study has six DOFs, and mapping is required only between the end of the human arm and the end of the manipulator.
On the patient's side, an airway phantom was used as the anatomy, and the virtual airway model was superimposed onto the actual airway on the monocular camera image. On the clinician's side, the operator adjusted the arm pose to view the phantom from different directions. Figure 6b shows the phantom without AR, and Figure 6c,d shows different viewing directions of the phantom with AR.

Robot-Assisted Laparoscopic Intervention and User Study
The operation manipulator provided two-DOF rotation and one-DOF translation for the rigid instrument. The two-DOF rotation was implemented as a virtual remote center of motion, and the remote center was located at the insertion port of the abdominal model. The translational motion was along the axial direction of the instrument. The motion of the instrument was realized via teleoperation using a commercial Xbox controller (Xbox Series X & S Controller, Microsoft, USA), and the motion commands were sent from the clinician's side to the patient's side via a TCP/IP connection.
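For illustration, the following is a minimal numpy sketch of the pose of a straight instrument constrained to a virtual remote center of motion; the angle parameterization, frame conventions, and function name are assumptions and do not reproduce the controller actually used in the experiments.

```python
import numpy as np

def rcm_tool_pose(p_rcm, pitch, yaw, insertion):
    """Pose of a rigid instrument whose shaft always passes through the fixed
    remote-center point p_rcm.  pitch/yaw are the two rotational DOFs (rad) and
    `insertion` is the translation along the shaft axis (m).  Hypothetical sketch."""
    # Shaft direction (z-axis of the tool frame) from the two rotational DOFs.
    d = np.array([np.cos(pitch) * np.sin(yaw),
                  -np.sin(pitch),
                  np.cos(pitch) * np.cos(yaw)])
    tip = np.asarray(p_rcm, dtype=float) + insertion * d
    # Complete a right-handed frame (assumes the shaft is not parallel to world y).
    x = np.cross([0.0, 1.0, 0.0], d)
    x /= np.linalg.norm(x)
    y = np.cross(d, x)
    T = np.eye(4)
    T[:3, 0], T[:3, 1], T[:3, 2], T[:3, 3] = x, y, d, tip
    return T  # homogeneous tool pose; the shaft line always contains p_rcm
```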
The actual scene on the patient's side was reconstructed preoperatively using the monocular camera. The reconstructed 3D model, the two manipulators, and the instrument were imported into the ROS visualization (RViz) environment for the 3D VR display. The laparoscopic intervention setup is shown in Figure 7a. The states of the two manipulators and the instrument were updated in real time in the VR view. The operator can interact with RViz to change the direction of the display view, thereby allowing the operator to confirm a safe distance between the visualization manipulator and its surrounding objects. The task of the laparoscopic intervention was to teleoperate the instrument from the insertion port to the target position and slip the rubber ring onto the specified cylinder based on the AR, VR, and endoscopic views. The instrument was teleoperated by the operator on the clinician's side, and the viewing directions of the AR and VR views could be changed by the operator or an assistant based on the operator's requirements. Figure 7b,c shows the different viewing directions for AR and VR, respectively (Movie S2, Supporting Information).
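The paper does not detail how the reconstructed mesh is loaded into RViz; one common way in ROS 1 is to publish it as a mesh-resource marker, as in the hypothetical sketch below (node name, topic, fixed frame, and mesh path are assumptions).

```python
import rospy
from visualization_msgs.msg import Marker

rospy.init_node("scene_mesh_publisher")
pub = rospy.Publisher("reconstructed_scene", Marker, queue_size=1, latch=True)

marker = Marker()
marker.header.frame_id = "world"                 # assumed fixed frame of the RViz session
marker.header.stamp = rospy.Time.now()
marker.type = Marker.MESH_RESOURCE
marker.action = Marker.ADD
marker.mesh_resource = "package://scene_description/meshes/scene.dae"  # hypothetical path
marker.mesh_use_embedded_materials = True        # keep the texture produced by OpenMVS
marker.scale.x = marker.scale.y = marker.scale.z = 1.0
marker.pose.orientation.w = 1.0                  # identity pose; true scale already recovered
pub.publish(marker)
rospy.spin()
```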
User studies were conducted using the laparoscopic intervention experiment to validate the effectiveness of the proposed method. The task of the user study was to teleoperate the instrument from the insertion port to the target position and slip the rubber ring onto a specified cylinder. The user study involved five male and three female participants (mean age: 26.3 ± 4.1). The number of participants was determined based on ref. [32], and all participants had an engineering background. The proposed method aims to improve the spatial awareness and hand-eye coordination of the operators, not to provide medical information. Therefore, participants were not required to have a medical background. Each participant was instructed to perform the following three tasks: 1) Task I: Teleoperate the instrument with only the endoscopic view; 2) Task II: Teleoperate the instrument with the endoscopic and AR views, but with the direction of the AR view fixed; and 3) Task III: Teleoperate the instrument with the endoscopic, omnidirectional AR, and VR views.
Before the tasks were executed, the participants were allowed to familiarize themselves with the three tasks. During the user studies, all participants were instructed to perform the entire task twice, and the mean completion time and path length were used as objective metrics to evaluate user performance. The performance results of all participants are shown in Figure 7d-f. The mean times for Tasks I, II, and III were 83.1, 70.2, and 60.8 s, respectively, and their mean path lengths were 0.538, 0.502, and 0.471 m, respectively.

Robot-Assisted Airway Endoluminal Intervention
As shown in Figure 5b, the driven unit can provide two-DOF bending and one-DOF translation for the flexible instrument. The driven unit was fixed at the end of the operation manipulator. After the operation manipulator moved to a suitable position preoperatively, it no longer moved intraoperatively. The flexible instrument was moved inside a human airway phantom via teleoperation. The endoscopic, AR, and VR views were provided as visual feedback, and the viewing directions of the AR and VR views could be changed by the operator or an assistant based on the operator's requirements. The setup of the airway endoluminal intervention experiment with OmniAR is shown in Figure 8a. Figure 8b,c shows the shape monitoring of the flexible instrument with OmniAR (Movie S3, Supporting Information), and the intervention process is illustrated in Figure 8d (Movie S3, Supporting Information).
To further illustrate the feasibility of the proposed method, an ex vivo experiment involving a porcine lung was performed. Because deformation of the anatomy was not considered in this study, the virtual porcine lung model was not superimposed onto the real world, and the 3D surface of the lung was not reconstructed for the VR view. The OmniAR display of only the flexible instrument and the porcine lung intervention process are shown in Figure 8e (Movie S4, Supporting Information).

Discussion
This article proposes a robotic telepresence based on OmniAR for remote interventional medicine. A method was developed to estimate the superimposed accuracy, the errors of which primarily resulted from calibration, registration, and image combination. However, error is inevitable in marker-based pose estimation, which affects the accuracy of the estimation method itself. Because a certain distance must be maintained between the camera and the objects to keep the anatomy and instrument within the camera's field of view, the requirement on 3D reconstruction accuracy is low when the reconstructed scene is used only to determine the safe distance between the visualization manipulator and its surrounding objects. Moreover, the 3D reconstruction efficiency should be increased as much as possible to reduce the time required for preoperative preparation. Therefore, we captured only 20 images from the monocular camera to reconstruct the actual scene, allowing the reconstruction to be completed in less than 5 min. Dynamic objects, such as human bodies, were not considered in the 3D reconstruction. Therefore, if interns or assistants on the patient's side enter the workspace of our system, it is difficult to determine the safe distance between these moving people and our robot and to change the viewing direction of the AR accordingly. To address this limitation, the 3D reconstruction algorithm used in this paper could be replaced with algorithms that reconstruct the actual scene in real time.
The realization of OmniAR and the 3D reconstruction of the actual scene depend on the pose compensation of the CR5. The position repeatability of the CR5 is ±0.02 mm, which ensures the reliability of the proposed method. The results of the user study show that the mean time and path length of Task III were less than those of Tasks I and II, which implies that our proposed method can improve the spatial awareness and hand-eye coordination of operators in robot-assisted endoluminal interventions. Delays in Internet connections for telemedicine, including 5G connections, [11] have been widely reported. Moreover, delays in Internet connections were not the focus of our study. Therefore, we did not evaluate delays in the visual feedback. The update rate of the AR view was 22 Hz, and the time consumed was primarily dominated by the image capture of the monocular camera, the image capture of the virtual space, and the image processing of the virtual and real superposition. If the size of the monocular camera images is reduced or the computing power of PC3 is increased, the update rate increases. With a decrease in the constant interval, the accuracy of the radius (R) increases, but so does the time consumed. Because the shape estimation is based on two assumptions, the shape construction model is not absolutely accurate. Therefore, an error in R was allowed during the solution process. To trade off the consumed time against accuracy, the constant interval used to select θ was set to 0.05.
Human motion imitation-based teleoperation was performed to validate the feasibility of the superimposed display from different viewpoints, and two different endoluminal interventions were performed to illustrate the effectiveness of the proposed method. Because the overlay error reflects every step in the realization process of the proposed method, including calibration, registration, and image combination, the error of each individual step does not need to be reported separately. Since the deformation of the anatomy was not considered in this study, the virtual model of the porcine lung in the ex vivo experiment was not overlaid on the actual porcine lung.

Conclusion
In this study, we successfully developed a robotic manipulator-assisted omnidirectional AR for endoluminal intervention telepresence. A framework with calibration and registration steps was implemented, and the shape of the flexible instrument was estimated based on certain assumptions. The SfM and MVS algorithms were used to reconstruct the actual scene on the patient's side using a few images, and a virtual marker-based method was proposed to calculate the superimposed error. Visual feedback from the patient to the clinician comprised three views. The viewing directions of the AR and VR views were changed to provide the best observational view for operators and to view the key anatomies and instruments located inside the patient's body. Human motion imitation-based teleoperation was performed to display OmniAR, and two different phantom experiments were conducted to validate the effectiveness of the proposed method. An interventional experiment was conducted for the user study. Finally, an ex vivo experiment involving a porcine lung was successfully performed to further illustrate the feasibility of our method. In the future, the current work can be extended to consider the effect of anatomical deformation, [33,34] and the force between the interventional robots and tissues can be displayed by combining our method with existing approaches. [35,36]

Figure 2 .
Figure 2. Components of the proposed robotic telepresence and their relative transformations. The colored arrows indicate the corresponding transformations obtained via physical relationship (black), kinematics (green), pose estimation (blue), ICP (red), and PnP (orange). During the calibration between the visualization manipulator and the EM tracking system, E1 is fixed at the end of the visualization manipulator with a rigid flange. After this calibration is finished, E1 is removed, and the camera is fixed at the end of the visualization manipulator. Similarly, the marker is fixed at the end of the operation manipulator during the calibration of the two manipulators, and then the rigid or flexible instrument is fixed to the operation manipulator. All calibrations and registrations are executed preoperatively, and the shape reconstruction is run in real time.

Figure 3 .
Figure 3. Geometric shape of the active bending segment integrated with two 6-DOF EM sensors.

Figure 4 .
Figure 4. Scene reconstruction process. The input is several monocular camera images, and the output is the 3D reconstructed scene. The scene reconstruction includes sparse and dense reconstructions, and SfM and MVS are used for sparse and dense reconstructions, respectively. For sparse reconstruction, SfM includes correspondence search and incremental reconstruction, and the manipulator's pose is used to recover the true-scale information of the camera pose through a similarity transformation.

Figure 5 .
Figure 5. Intervention experimental platform. a) Laparoscopic intervention experimental platform. The monocular camera was connected to a Windows PC with USB, and the two manipulators were connected to an Ubuntu PC with Ethernet. b) Endoluminal intervention experimental platform. The EM tracking system, endoscopic camera, and drive unit were connected to a Windows PC with USB. c) Data flow of robotic telepresence. d) An 80 mm cube without virtual overlay. e) OmniAR overlay accuracy estimation. A virtual cube with markers superimposed on the real world.

Figure 6 .
Figure 6. Human motion imitation-based teleoperation for OmniAR display. a) On the patient's side, a virtual airway phantom was overlaid onto the real world. On the clinician's side, the operator adjusted the arm pose according to the visual feedback to view the phantom from different directions. b) No AR of the phantom. c,d) Airway phantom with AR from different viewpoints.

Figure 7 .
Figure 7. Laparoscopic intervention experiment with OmniAR. a) Visual feedback from the patient to the clinician. The virtual rigid instrument and target were overlaid onto the real world. b) AR views with different directions. c) VR views with different directions. d) Time consumed by all participants in performing the three tasks. e) Length of the instrument's tip motion paths. f) Mean time and length of the three tasks.

Figure 8 .
Figure 8. Endoluminal intervention experiment with OmniAR. a) Visual feedback from the patient to the clinician. In the AR view, the virtual flexible instrument and airway phantom were overlaid onto the real world. b,c) Shape monitoring of the active bending segment of the flexible instrument. d) Endoluminal intervention process using an airway phantom. e) Endoluminal intervention process using a porcine lung.

Table 1 .
Image combination of anatomy-augmented display.

Table 2 .
Image combination of instrument-augmented display.