A review on manipulation skill acquisition through teleoperation-based learning from demonstration

Manipulation skill learning and generalisation have gained increasing attention due to the wide applications of robot manipulators and the spurt of robot learning techniques. Especially, the learning from demonstration method has been exploited widely and successfully in the robotic community, and it is regarded as a promising direction to realise the manipulation skill learning and generalisation. In addition to the learning techniques, the immersive teleoperation enables the human to operate a remote robot with an intuitive interface and achieve the telepresence. Thus, it is a promising way to transfer manipulation skills from humans to robots by combining the learning methods and teleoperation, and adapting the learned skills to different tasks in new situations. This review, therefore, aims to provide an overview of immersive teleoperation for skill learning and generalisation to deal with complex manipulation tasks. To this end, the key technologies, for example, manipulation skill learning, multimodal interfacing for teleoperation and telerobotic control, are introduced. Then, an overview is given in terms of the most important applications of immersive teleoperation platform for robot skill learning. Finally, this survey discusses the remaining open challenges and promising research topics.


| INTRODUCTION
With the rapid development of robotics, robots have been widely used in various fields, e.g., industrial applications [1,2], medical surgery [3][4][5], real-life scenarios [6,7], space [8] and other fields. Specifically, the robot manipulator has been widely used to perform tasks in known and structured environments owing to its low cost, efficiency and safety. Although the manipulator has been widely used in a variety of disciplines, especially in the industrial domain, it is still difficult for it to perform physical in-contact tasks, for example, manipulating deformable materials [7,9,10], cooperating with humans in the same workplace [11,12] and working in unknown and less structured environments [13].
Figure 1 shows several typical in-contact tasks, including precise assembly [1], cleaning a surface [9], robot-assisted echography [14] and valve turning [15]. When performing such tasks, the robot must not only track desired trajectories but also interact physically with the environment. The challenges of these contact-rich tasks include obtaining an accurate contact model, dealing with the uncertainty of human behaviour and ensuring human safety. All of the scenarios mentioned above require robots to possess human-like and compliant manipulation skills.
Recently, several machine learning techniques, for example, reinforcement learning (RL) [16], imitation learning [17] and transfer learning [18], have been successfully employed in robotic skill learning. Several review papers introduce and discuss these learning methods [19]. Among them, learning from demonstration (LfD), also named programming by demonstration (PbD) or imitation learning, is an effective way to transfer manipulation skills from humans to robots [20]. According to the demonstration approach, LfD can be divided into three categories: kinesthetic teaching, teleoperation and passive observation [21]. Compared with kinesthetic teaching and passive observation, teleoperation can provide a multimodal interface for interacting with the human. Kinesthetic teaching enables the human to demonstrate by physically moving the robot through the desired motions. The demonstration quality of kinesthetic teaching depends on the dexterity and smoothness of the human user, and even with experts, data obtained through this method often require smoothing or other post-processing techniques. Besides, kinesthetic teaching cannot be applied in some extremely dangerous situations, such as nuclear plants and polluted areas, because it requires the demonstrator to be present. Furthermore, kinesthetic teaching requires the human teacher to work with robots in the same space, so the safety of the human is also a concern. Learning through teleoperation, however, can address these issues effectively. Moreover, with the fast development of immersive teleoperation, LfD through immersive teleoperation enables the human demonstrator to teach robots with more natural demonstrations. In this review, we focus on robot skill acquisition through teleoperation-based LfD.
Teleoperation has been a key driver for robotic research, and it stems from the pragmatic need to perform tasks in remote environments [22,23]. After decades of development, teleoperation technology has been widely used in various fields, for example, space exploration [24], underwater exploration [25], mobile robotic assistance [26], disaster relief [27], tele-echography [28] and surveillance, where there are risks to humans or the physical distance is unreachable. A general teleoperation system includes a human operator, master devices, a communication channel, a following robot and a perception module. It can be classified as unilateral or bilateral, based on whether the perception information is transmitted from the following robot to the human operator [29]. In such a system, the operator's performance can be improved by increasing the transparency of the teleoperation system [30].
Currently, multimodal interfaces, including virtual reality (VR)/augmented reality (AR) helmets [31], joysticks [32], contact force sensors [33] and bio-signal sensors [34,35], have been developed and integrated into teleoperation systems, aiming to provide immersive teleoperation and increase overall human performance. A promising direction is to combine auditory, visual and haptic information to achieve multimodal interaction between the operator and the remote robot, improving the comfort of operators and the performance of robots [30]. Although much progress has been made in teleoperation, several challenges remain in realising multimodal teleoperation, for example, the time delay caused by multimodal feedback, synchronisation control and the different configurations of the master side and the slave side.
Figure 1. Illustrative examples of different contact tasks. (a) Assembly [1]. (b) Cleaning an arbitrary surface [9]. (c) Robot-assisted echography [14]. (d) Valve turning [15].
This paper provides a comprehensive literature review of the key technologies, applications and challenges for robot manipulation skill learning and generalisation via teleoperation-based LfD. The subsequent sections are organised as follows. Preliminaries of the teleoperation system are presented in Section 2. In Section 3, the skill representation methods for LfD are introduced. Section 4 covers multimodal teleoperation for LfD. Section 5 provides several typical applications. Finally, Section 6 discusses future directions for manipulation skill acquisition through teleoperation-based LfD.

| PRELIMINARIES
Generally, a multimodal teleoperation system for robot skill learning includes the following parts: human operator with interactive interfaces, a communication module, teleoperation control and skill learning and generalisation.The overview of the teleoperation system is depicted in Figure 2, and the description of each module is explained as follows.
• Multimodal interfaces module. The multimodal interface includes various devices, for example, haptic joysticks, VR/AR helmets, haptic data gloves, electromyography (EMG) and mechanomyography (MMG) sensors, enabling human operators to teleoperate the slave robot with immersive telepresence. In addition, the interaction interface can also gather the sensor signals used for LfD.
• Communication module. This module guarantees the communication between the master side and the slave side, which has a significant impact on the control system because of time delay and data packet loss. When multimodal information needs to be transmitted, the time delay can destabilise the control system.
• Teleoperation control module. Since some control issues remain in the teleoperation system (e.g. time delay, synchronisation and the correspondence issue), advanced control frameworks and control algorithms are employed to tackle these challenges.
• Skill learning and generalisation module. The robot acquires human-like manipulation skills through LfD. The demonstration data collected in the teaching stage are used to train the skill model. The acquired skills should be adaptable to new situations within a given finite time.
As shown in Figure 3, the key technologies of manipulation skill learning via teleoperation, including skill representation, robot skill learning, immersive teleoperation and teleoperation control, are introduced.

| The introduction of LfD
LfD enables a robot to acquire skills from human demonstrations without requiring the user to have much knowledge of robotics and programming [36]. It offers a promising approach to transfer and refine tasks by observing users who are not experts in robotics or computer programming. LfD provides novice users with an intuitive way to program robots, one that humans are already used to. Compared with kinesthetic teaching and passive observation, LfD through teleoperation is distinguished by its multimodal interaction interfaces, which provide a user-friendly way to transfer skills to robots. LfD can be used to transfer high-level symbolic reasoning skills as well as low-level motion skills [37]. There are several LfD learning strategies, such as behavioural cloning and inverse optimal control, for transferring basic motion skills and extracting the underlying objectives of optimal actions, respectively. In ref. [38], the task prioritisation issue in the bimanual operation of a humanoid robot was addressed by LfD, making it possible to carry out more than one manipulation task at the same time. Qin et al. [39] proposed an LfD-based skill learning approach for a precision assembly robot to realise effective skill transfer from the teacher to the robot through several demonstrations.
In addition, RL is a different robot skill learning framework, which allows robots to explore novel skills by trial and error. Owing to the fast development of deep learning, RL is attracting a lot of attention from the research and industrial communities. Learning skills through imitation is efficient because it reduces the search space of feasible solutions [40]. Combining RL with LfD is therefore a promising approach. In ref. [41], a skill learning framework integrating LfD and RL was proposed to learn and generalise robotic skills. In ref. [42], deep RL and demonstrations were used to learn complex dexterous manipulation.
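To make the idea of behavioural cloning concrete, the following minimal sketch fits a policy directly to demonstrated state-action pairs by supervised regression. The one-dimensional linear policy, the function names and the toy data are assumptions made for illustration only; they are not taken from the cited works.

```python
def fit_linear_policy(states, actions):
    """Fit a = gain * s + bias to demonstrated (state, action) pairs by least squares."""
    n = len(states)
    mean_s = sum(states) / n
    mean_a = sum(actions) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(states, actions))
    var = sum((s - mean_s) ** 2 for s in states)
    gain = cov / var
    bias = mean_a - gain * mean_s
    return gain, bias


def policy(state, gain, bias):
    """Cloned policy: reproduces the demonstrator's action for a given state."""
    return gain * state + bias


# Hypothetical demonstration data recorded during teleoperation.
demo_states = [0.0, 1.0, 2.0, 3.0]
demo_actions = [1.0, 3.0, 5.0, 7.0]
gain, bias = fit_linear_policy(demo_states, demo_actions)
```

In practice the regressor would be a richer model (for example a neural network), but the principle of treating demonstrations as labelled training data is the same.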

| Skill representation in LfD
A key research aspect of the robot skill learning and generalisation is the skill representation such that it can be analysed and synthesised.Skill representation has a significant impact on the performance of robot skill learning and adaptation.Most generally, the skill representation approaches in LfD fall into two categories: the dynamical system method and the probability and statistical method.We will present detailed introductions of each representation approach in the following content.
In addition, the idea of movement primitive is often employed in the context of complex manipulation skill modelling due to its modularity and flexibility.The core idea of this representation is to decompose the complex behaviours into a set of movement primitives, which could be reassembled on demand to produce complex behaviours [43].Often, such representation can enable the skill learning and generalisation to adapt to different tasks in new situations and environments.For instance, a complex trajectory of manipulation is segmented into several movement primitives, and the dynamical system or the statistical approach then is exploited to model the movement primitives.

| Dynamic system approach
Studies on human motion show that human motion planning and execution form a coupled process, and that the motion trajectory is generated by the evolution of a dynamic system over time and space [44][45][46]. Inspired by these works, the dynamical system (DS) approach can be used for robotic motion planning. For example, in ref. [47], a set of non-linear autonomous dynamical systems was used to represent the manipulator motion, and its parameters were estimated by a mixture of Gaussians. In ref. [48], a stable estimator of dynamical systems (SEDS) based on Gaussian mixture models (GMMs) was proposed to learn the parameters of the DS so as to ensure global asymptotic stability at the target. This DS-based approach has been employed to model various motions, such as playing minigolf [49] and human handwriting [48]. The characteristics of the various methods are compared in Table 1.
Dynamic movement primitives (DMPs), originally proposed by Ijspeert et al. [50,51], are another framework for movement planning and online trajectory modification in LfD. Recently, they have also been used to encode other modalities, such as stiffness and force profiles. According to the type of trajectory, DMPs can be categorised into discrete DMPs and rhythmic DMPs. Taking the discrete DMPs as an example, the model can be formulated as in Equation (1),
where x and v denote the position and velocity of the system, respectively, x0 and g are the start and goal positions, τ is a temporal scaling parameter, K is a spring constant and D is a damping term. The function f depends on the phase variable s instead of time. The phase variable is determined by the canonical system and typically evolves from 1 to 0. The canonical system is given by $\tau \dot{s} = -\alpha s$, where α is a positive gain and the initial value of s equals 1. Notice that s converges exponentially to 0.
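As a minimal sketch of how a discrete DMP generates a trajectory, the code below integrates a one-DoF system with explicit Euler steps. It assumes the widely used transformation system $\tau\dot{v} = K(g - x) - Dv + (g - x_0)f(s)$, $\tau\dot{x} = v$, together with the canonical system $\tau\dot{s} = -\alpha s$; this particular form of the forcing term, the Gaussian basis functions, the critical-damping choice of D and all numerical values are assumptions for illustration and may differ from the formulation intended in Equation (1).

```python
import math


def dmp_rollout(x0, g, weights, tau=1.0, K=100.0, alpha=4.0, dt=0.001):
    """Integrate a one-DoF discrete DMP and return the generated positions."""
    D = 2.0 * math.sqrt(K)  # critical damping (assumed choice)
    n = len(weights)
    # Basis centres placed along the phase s, which decays from 1 towards 0.
    centres = [math.exp(-alpha * i / max(n - 1, 1)) for i in range(n)]
    widths = [n ** 1.5 / c / alpha for c in centres]
    x, v, s = x0, 0.0, 1.0
    path = [x]
    for _ in range(int(2.0 * tau / dt)):
        psi = [math.exp(-h * (s - c) ** 2) for h, c in zip(widths, centres)]
        # Forcing term: normalised weighted basis functions, gated by the phase s.
        f = s * sum(p * w for p, w in zip(psi, weights)) / (sum(psi) + 1e-10)
        v_dot = (K * (g - x) - D * v + (g - x0) * f) / tau
        x_dot = v / tau
        s_dot = -alpha * s / tau
        v += v_dot * dt
        x += x_dot * dt
        s += s_dot * dt
        path.append(x)
    return path


# With zero weights the DMP reduces to a spring-damper that converges to the goal.
trajectory = dmp_rollout(x0=0.0, g=1.0, weights=[0.0] * 5)
```

Because the forcing term is multiplied by s, its influence vanishes as the phase decays, which is why convergence to g does not depend on the learned weights.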
• Advantages of DMPs. Compared with traditional means of encoding trajectories, such as spline-based decomposition, encoding skills with DMPs has a variety of benefits [52]. First, this motion representation can guarantee global stability because, whatever parameters of the function approximator are chosen, the DMP is guaranteed to converge towards the target. In addition, the velocity of the motion can be adapted by changing the time constant. The motion generated by DMPs is robust to strong external perturbations and can be modified online by additional perceptual variables. Furthermore, this approach facilitates motion modelling for systems with multiple degrees of freedom (DoFs): all DoFs share one canonical system, and each maintains only a separate transformation system.
• Limitations of DMPs. However, the original DMPs also have limitations in motion planning in some situations, for example, when the goal point coincides with the start point or when the goal points are distributed on both sides of the start point. Because the trajectory dynamics are described explicitly, DMPs introduce many open parameters, including the basis functions and their weighting coefficients. Moreover, it is still difficult to represent the high-dimensional trajectories of interaction tasks for redundant robots [53]. When DMPs are employed to model manipulation skills, they need to represent the sensory signals as well as the motion trajectories. These sensory profiles are similar, but not identical, across different tasks. Thus, it is hard to model the correlation between the sensory values and the states of the robot. In addition, the original DMP cannot achieve force control of robots for contact tasks, such as assembly [54]. Therefore, since the original DMP was proposed, a variety of modified DMPs have been developed to tackle the limitations mentioned above.
DMPs with a perceptual term (DMPP) have been proposed to complete physical interaction tasks, which require robots to regulate the contact force and torque as well as the desired motion. Perception information, for example, tactile sensing and force profiles, is fundamental for these contact tasks. To take advantage of sensory perception, a feedback term was integrated into the DMP model [56]. The benefit of an additional feedback controller that tracks desired reference forces was demonstrated in grasping tasks [57]. In ref. [57], the authors further extended the original DMPs for online movement adaptation using sensory feedback. DMPs enhanced by previous sensor experience for particular tasks can predict subsequent task executions. Such DMPs are adaptive and robust to external perturbations from the environment and to various uncertainties from the sensors; hence, they can generate a rich set of trajectories for complex tasks. Moreover, the feedback term can be trained online using learning techniques to reactively modify previously acquired skills [58].
Coupling DMPs: Some researchers have extended the DMP model or added control methods to realise obstacle avoidance, interaction with external objects and bimanual operation; most of these extensions add a coupling term to the basic model. For example, Park and Khansari-Zadeh et al. introduced repulsive potential fields as coupling terms in DMPs for obstacle avoidance [64,65]. Hoffmann et al., motivated by biological data and human behaviour, modified the DMP model by adding an acceleration term to avoid collisions with moving obstacles [55]. Composite DMPs were proposed to model movement and stiffness features simultaneously in order to transfer human-like skills from humans to robots [66][67][68]. Coupled DMPs offer better interaction ability than the original DMPs.
RL-based DMPs were proposed to improve the generalisation of the original DMPs. In refs. [61,62], RL was exploited to learn a mapping from circumstances to the meta-parameters of DMPs in order to generate new primitive movements. To generate new behaviours, Kim et al. applied deep RL and a hierarchical strategy to optimise and generalise the skills produced by DMPs [41]. The RL technique is able to optimise the parameters of motion primitives efficiently and robustly. To further optimise the goal parameters, the path integral algorithm was used to optimise the shape and goal parameters simultaneously [63]. In ref. [60], the authors proposed augmented DMPs with a perceptual coupling that was learnt by RL. Compared with the original DMPs, RL-based DMPs generalise better to novel situations.
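The effect of a repulsive coupling term can be sketched as follows. The code assumes a simple static potential $U(d) = \tfrac{1}{2}\eta(1/d - 1/d_0)^2$ that is active only within a distance $d_0$ of a point obstacle, and returns its negative gradient, which would be added to the acceleration of the transformation system. The potential, the function name and the parameters `eta` and `d0` are illustrative assumptions; the cited works may use different (for example, velocity-dependent) potentials.

```python
import math


def repulsive_coupling(x, obstacle, eta=1.0, d0=0.3):
    """Negative gradient of U = 0.5 * eta * (1/d - 1/d0)^2 for d < d0, else zero."""
    diff = [xi - oi for xi, oi in zip(x, obstacle)]
    d = math.sqrt(sum(c * c for c in diff))
    if d >= d0 or d == 0.0:
        # Outside the influence region (or exactly on the obstacle): no coupling.
        return [0.0] * len(x)
    magnitude = eta * (1.0 / d - 1.0 / d0) / (d * d)
    # The term points away from the obstacle along the unit vector diff / d.
    return [magnitude * c / d for c in diff]


# The returned vector would be added to the DMP acceleration at every time step.
push = repulsive_coupling([0.1, 0.0], [0.0, 0.0])
```

The coupling vanishes smoothly at the boundary of the influence region, so the nominal DMP motion is unchanged when the obstacle is far away.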

| Statistical modelling
Because statistical approaches can cope with the noise inherent in any mechanical system, they have become increasingly popular for modelling robotic motion. The characteristics of the various statistical methods are compared in Table 2.
GMM has been employed to model the joint distribution of input variables and demonstrated trajectories [69]. In ref. [78], GMM was used to model both movement and force patterns so that the robot could learn impedance behaviours. Usually, GMM is complemented with Gaussian mixture regression (GMR) [79] to retrieve the desired trajectory. As an extension of GMM, a task-parameterised formulation is studied in ref. [70]; in essence, it models local (or relative) trajectories and the corresponding local patterns, thereby endowing GMM with better extrapolation performance. In ref. [32], GMM was utilised to encode and parameterise a smooth task trajectory to realise a task learning mechanism for telerobots. Moreover, Calinon extended GMM to Riemannian geometry to represent robot skills for robot learning and adaptive control [71].
Kernelized movement primitives (KMP): Although a number of advances have been made in modelling robot skills, dealing with unpredicted situations, e.g., unknown obstacles and external perturbations, and with high-dimensional inputs is still challenging. Huang et al. [53] proposed KMP, which allows the robot to adapt the learnt motor skills while satisfying various constraints during task execution. Specifically, KMP is capable of learning trajectories associated with high-dimensional inputs by adopting the kernel treatment. In contrast to approaches relying on basis functions, the model has fewer open parameters, which makes training more convenient.
Probabilistic movement primitives (ProMPs) are a useful skill modelling approach for learning robot skills from humans and adapting them to new tasks and environments. ProMPs are a probabilistic formulation of movement primitives that maintains a distribution over trajectories [72]. Conditioning the trajectory distribution on a desired point allows the primitive to generalise to new task points. ProMPs have many good characteristics, such as the blending of movement primitives, adaptation to various constraints by conditioning, temporal scaling and the modelling of coupling between different joints. The weights of ProMPs can be learnt from the demonstration data and generalised to new tasks through probabilistic operations. ProMPs can also deal with the physical interaction tasks of redundant robots, which often require processing various sensory data, such as force/torque [80]. This method enables robots to acquire complex motor skills and to coordinate motion with perception information. In ref. [73], the authors used an active learning approach and ProMPs to generate a library of primitive skills capable of modelling complex skills over a given space. However, Callens et al. pointed out that ProMPs can predict motion over a short time horizon but struggle to predict motion over a longer horizon [81].
Hidden Markov Model (HMM) was proposed to represent the correlation between the motion state and sensory profiles by encoding a joint probability density function over the demonstration data [76]. In ref. [74], the authors proposed an HMM-based method to generate continuous motion that involves the time information for each state. Combining HMM with a Gaussian regression technique is suitable for online recognition and continuous trajectory generation without additional time lag from pre- or post-processing of the data. In ref. [75], a framework combining HMM and GMR was proposed to generate a probabilistic model of the demonstrated data. A joint probability density function between the position and the velocity is generated by the HMM, and GMR is used to generalise the learnt skills. In addition, human demonstrations explicitly define forces and velocities and implicitly define stiffness as well as their underlying correlations with the positions, all of which are crucial for robot learning. To exploit this, an HMM-based approach combined with GMR was proposed to generate the control variables via regression [82].
Hidden Semi-Markov Model (HSMM) was used to improve the robustness of the robotic system against external perturbations in the temporal space compared with HMM [76]. In ref. [77], the HSMM-GMR model was used to model motion as well as force data for in-contact tasks. Since HSMM-GMR has been shown to be more dynamic and efficient than the vanilla HMM, it is more suitable for learning and modelling the correlations between the motion and other multimodal information from the collected data. In ref. [82], HSMM and GMM were exploited to model movement and stiffness simultaneously.
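The GMM/GMR pipeline described above can be illustrated with a minimal sketch of the regression step. It assumes that a GMM over the joint variable (t, x) has already been fitted, with each component given as a tuple (prior, mean of t, mean of x, variance of t, covariance of t and x), and it returns the conditional mean E[x | t]. The tuple layout and the numerical values are assumptions made for illustration.

```python
import math


def gmr_mean(t, components):
    """Conditional mean E[x | t] of a two-dimensional GMM over (t, x)."""
    weights, cond_means = [], []
    for prior, mu_t, mu_x, s_tt, s_tx in components:
        # Responsibility of this component for the query input t.
        density = math.exp(-0.5 * (t - mu_t) ** 2 / s_tt) / math.sqrt(2.0 * math.pi * s_tt)
        weights.append(prior * density)
        # Conditional mean of x given t within this Gaussian component.
        cond_means.append(mu_x + s_tx / s_tt * (t - mu_t))
    total = sum(weights)
    return sum(w * m for w, m in zip(weights, cond_means)) / total


# Hypothetical two-component model; a trajectory is retrieved by sweeping t.
model = [(0.5, 0.25, 0.0, 0.02, 0.01), (0.5, 0.75, 1.0, 0.02, 0.01)]
retrieved = [gmr_mean(k / 10.0, model) for k in range(11)]
```

Each component contributes a local linear regression, and the responsibilities blend these local models into one smooth reference trajectory.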
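The conditioning operation mentioned above can be sketched for a single DoF as follows. The trajectory is assumed to be y(z) = φ(z)ᵀw with Gaussian weights w ~ N(μ, Σ), and observing a desired point y* at phase z updates the weight distribution with the standard Gaussian conditioning formulas. The normalised Gaussian features, the noise level and all names are assumptions made for illustration.

```python
import math


def features(z, n_basis=5, width=0.05):
    """Normalised Gaussian basis functions evaluated at phase z in [0, 1]."""
    centres = [i / (n_basis - 1) for i in range(n_basis)]
    phi = [math.exp(-(z - c) ** 2 / (2.0 * width)) for c in centres]
    total = sum(phi)
    return [p / total for p in phi]


def condition(mu, sigma, z, y_star, noise=1e-6):
    """Condition w ~ N(mu, sigma) on observing y_star at phase z (sigma symmetric)."""
    n = len(mu)
    phi = features(z, n)
    sigma_phi = [sum(row[j] * phi[j] for j in range(n)) for row in sigma]
    denom = noise + sum(p * sp for p, sp in zip(phi, sigma_phi))
    gain = [sp / denom for sp in sigma_phi]
    err = y_star - sum(p * m for p, m in zip(phi, mu))
    new_mu = [m + k * err for m, k in zip(mu, gain)]
    new_sigma = [[sigma[i][j] - gain[i] * sigma_phi[j] for j in range(n)]
                 for i in range(n)]
    return new_mu, new_sigma


# Hypothetical prior over five weights, conditioned to pass through y = 1 at z = 0.5.
prior_mu = [0.0] * 5
prior_sigma = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
post_mu, post_sigma = condition(prior_mu, prior_sigma, z=0.5, y_star=1.0)
```

After conditioning, the mean trajectory passes through the desired point and the variance at that phase collapses to roughly the assumed observation noise.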

| MULTIMODAL TELEOPERATION
The purposes of exploiting a multimodal interface are to provide telepresence and to assist in modelling human-like manipulation skills for in-contact tasks. A typical multimodal teleoperation framework discussed in this study is shown in Figure 4. The human operator teleoperates the mobile manipulator, while multimodal perception information, for example, video, audio and force, is fed back to the human side to increase the operator's telepresence.

| The design of multimodal interface
To realise immersive teleoperation, the design of multimodal interfaces is the premise. Recently, a number of researchers have proposed various schemes to implement human-robot interaction interfaces for a variety of applications, for example, assembly, space exploration, teleoperated surgery, telerehabilitation and rescue. Generally, the multimodal interfaces mainly involve the haptic, visual and auditory modalities, among others. As visual feedback is the fundamental modality, it has been well exploited to enhance telepresence in immersive teleoperation [84]. In ref. [31], a VR-based teleoperation system is implemented to improve immersion and situation awareness for live scene exploration. With the assistance of a deep neural network, a vision-based interface realises end-to-end teleoperation of the Shadow Dexterous Hand [85,86]. VR headsets and hand tracking hardware have been used to teleoperate robots naturally in complex tasks [17]. However, visual feedback alone is insufficient for in-contact tasks that require force control, and haptic feedback is essential for in-contact teleoperation. Haptic feedback has been well studied in teleoperation. In ref. [87], the authors proposed a robotic teleoperation system with wearable haptic feedback for telemanipulation in cluttered environments. Moreover, haptic interfaces are also employed in precise telemanipulation, such as robotic surgery, micromanipulation and micro-assembly. In addition to vision and touch, auditory information is also utilised, for example to localise a sounding object [88].
Table 2. Characteristics of the statistical modelling methods.
| Category | Characteristics | Literature |
| --- | --- | --- |
| GMM | Suitable for high-dimensional input. | [32,69,70,71] |
| KMP | Suitable for high-dimensional input and multiple demonstrations. | [53] |
| ProMPs | Better adaptation, but not suitable for high-dimensional input. | [72,73] |
| HMM | Model the correlation between movement and sensory profiles. | [74,75] |
| HSMM | Encode the duration information of each HMM state and robust to perturbation. | [76,77] |
Although unimodal feedback can complete basic tasks in structured and predictable situations, combining these diverse modalities is essential for dealing with complex contact tasks, and this is attracting increasing attention from researchers. In ref. [89], an enhanced teleoperator interface incorporating multimodal augmented reality is proposed to address the dexterous manipulation of heavy materials. Although progress has been made in immersive teleoperation, several challenges remain for multimodal teleoperation, such as the effects of the time delay caused by the communication link, the high packet rate required in the real-time control loop and the synchronisation of different modalities.

| Improving telepresence of teleoperation
The multimodal interfaces in teleoperation aim to provide immersive solutions and increase overall human performance [90]. In this review, we focus on bilateral teleoperation, where multimodal feedback is transmitted to the operator to improve telepresence. Extensive comparisons [30] have shown that, regardless of task complexity, using a multimodal interface improves performance. Research in cognitive psychology also suggests that utilising multisensory stimuli enhances human perceptual learning [91]. Indeed, when we learn from others, we utilise a variety of multimodal information, including verbal and nonverbal cues, to make sense of what is being taught.
Figure 4. The structure of the multimodal teleoperation system, adapted from ref. [83].
SI ET AL. 25177567, 2021, 1, Downloaded from https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/ccs2.12005 by UNIVERSITY OF ESSEX, Wiley Online Library on [11/10/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Teleoperation systems enhanced by haptic feedback enable human demonstrators to perceive the remote environment and the robot's interaction with it. The high packet rate and stability requirements are challenging for teleoperation systems with haptic feedback when the communication module introduces time delay. Therefore, several strategies integrating communication and control have been proposed to deal with these issues. For example, in ref. [83], haptic data reduction and a stability-ensuring control strategy are used to guarantee the stability of teleoperation for practical tasks under communication delay. In ref. [92], point cloud-based model-mediated teleoperation with dynamic and perception-based model updating was proposed to achieve stable and transparent teleoperation in the presence of communication delay.
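One way to reduce the haptic packet rate is a perceptual deadband, in which a new sample is transmitted only when it differs from the last transmitted value by more than a fixed fraction of that value. The sketch below illustrates this idea; the deadband scheme itself, the threshold `k` and the sample values are assumptions for illustration and are not necessarily the reduction scheme used in ref. [83].

```python
def deadband_filter(samples, k=0.1):
    """Return (index, value) pairs that would be transmitted under a relative deadband k."""
    sent = []
    last = None
    for i, value in enumerate(samples):
        # Transmit the first sample, then only perceptually significant changes.
        if last is None or abs(value - last) > k * abs(last):
            sent.append((i, value))
            last = value
    return sent


# Hypothetical force samples (N): only three of the five would be sent.
forces = [1.0, 1.05, 1.2, 1.25, 2.0]
packets = deadband_filter(forces, k=0.1)
```

On the receiving side the last transmitted value is typically held until the next update arrives, which trades a bounded signal error for a lower packet rate.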

| Collecting demonstration data for skill learning
In addition to increasing the telepresence of the human operator, the multimodal interface contributes to high-quality demonstrations, which are essential for successful LfD. For some scenarios, such as car driving [93,94] and helicopter control [95], it is much easier to collect demonstration data because intuitive operation interfaces for human demonstrators already exist. However, it is hard to collect high-quality demonstration data for manipulators because of the correspondence problem between the demonstrator's operational space and the robot configuration [21].
VR-based teleoperation allows for a direct mapping of observations and actions between the teacher and the robot, and therefore does not suffer from the correspondence issue [31]. A VR headset was utilised to perceive the environment through the robot's sensor space, and a motion-tracked VR controller was adopted to control the robot in a way that leverages the natural manipulation instincts that humans possess. Moreover, haptic devices have been an effective interface tool for LfD.
In ref. [96], the joystick was used to control remote Baxter robot, enabling humans to sense the contact torque and force.For the shared control mode, it is fundamental for the user to receive an appropriate sensory feedback informing about the feasibility of her/his commands against the slave system constraints.To achieve this, a haptic guidance method, which informs the operator about constraints acting on the teleoperation system, needs to be designed.
When the robot performs in-contact tasks that require compliant manipulation, touch and visual information are both significant. Lee et al. [97] pointed out that contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. They use self-supervision to learn a compact, multimodal representation of sensory inputs to improve learning efficiency. In ref. [98], multiple contact modalities are shown to be significant for reactive manipulation skills. In ref. [99], cross-modal visuo-tactile object recognition was proposed to improve object recognition performance.
Currently, bio-signal sensors, for example, EMG and MMG, have been exploited in LfD. Traditional learning and generalisation methods have not adequately considered human impedance features, which makes the learnt skills less human-like and restricts them in physical human-robot interaction scenarios. Yang et al. [35,66,67] developed a framework that enables the robot to learn both movement and stiffness features from the human tutor. In ref. [100], data from multiple sensors were encoded for robot skill learning to achieve multimodal demonstration learning. In ref. [66], the EMG signal was utilised to estimate the stiffness of the human arm in LfD to achieve human-like skill transfer from humans to the robot. Learning an in-contact task, for example, pushing an object, where the constraints of both position and force have to be satisfied, is usually difficult for a collaborative robot. In ref. [101], a multimodal teaching-by-demonstration system was proposed, which enables the robot to perform force-dominant tasks.
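As a rough illustration of how an EMG signal might be turned into a stiffness profile for LfD, the sketch below rectifies the raw signal, smooths it with a moving average to obtain an activation envelope, and maps the normalised envelope linearly onto a stiffness range. This processing chain, the linear mapping and all parameter values are simplifying assumptions made here; they are not the estimation method of refs. [35,66,67].

```python
def emg_envelope(raw, window=5):
    """Rectify the raw EMG samples and smooth them with a causal moving average."""
    rectified = [abs(x) for x in raw]
    envelope = []
    for i in range(len(rectified)):
        start = max(0, i - window + 1)
        chunk = rectified[start:i + 1]
        envelope.append(sum(chunk) / len(chunk))
    return envelope


def stiffness_from_envelope(envelope, env_max, k_min=100.0, k_max=1000.0):
    """Map the normalised envelope linearly to a stiffness in [k_min, k_max] (N/m)."""
    profile = []
    for e in envelope:
        level = min(max(e / env_max, 0.0), 1.0)  # clamp activation to [0, 1]
        profile.append(k_min + level * (k_max - k_min))
    return profile


# Hypothetical raw EMG samples; env_max would come from a calibration contraction.
raw_emg = [0.1, -0.4, 0.6, -0.2, 0.8, -0.9]
stiffness = stiffness_from_envelope(emg_envelope(raw_emg, window=3), env_max=1.0)
```

The resulting stiffness profile can be recorded alongside the motion trajectory so that both are available to the skill model.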

| Teleoperation control
Stabilising the multimodal teleoperation system is essential for manipulation skill learning through teleoperation. Bilateral teleoperation that supports human-like manipulation skill learning raises several control issues, for example, teleoperation control and manipulation control. A large number of control methods have been proposed to enhance the performance of teleoperation systems [102,103].
According to the control mode, teleoperation systems can be divided into three categories: direct control, supervised control and shared control [104]. In the direct control mode, the slave robot is controlled directly by the human operator and has no autonomous abilities. In the supervised mode, the robot executes tasks according to pre-programmed code, and the human merely supervises the execution process. Shared control is a hybrid strategy that combines direct control and supervised manipulation, in which the human operator works collaboratively with the robot based on a sharing mechanism. The shared control framework has been well studied in human-robot shared manipulation [105][106][107][108] and in robot skill learning through teleoperation [109]. For instance, in ref. [96], a hybrid shared control method based on EMG and a haptic device is proposed to telecontrol the motion of a mobile robot and achieve obstacle avoidance. Similarly, in ref. [110], a human-robot shared control strategy is developed to realise autonomous obstacle avoidance.
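The relation between the three control modes can be illustrated with a simple linear arbitration rule. The following is only a minimal sketch and is not taken from any of the cited works: the function names, the blending weight `alpha` and the distance thresholds `d_safe` and `d_min` are hypothetical choices made for illustration. Setting `alpha = 1` recovers direct control, `alpha = 0` recovers fully autonomous execution as in the supervised mode, and intermediate values give shared control.

```python
def blend_command(human_cmd, auto_cmd, alpha):
    """Linear arbitration between operator and autonomous velocity commands.

    alpha = 1.0 -> operator only (direct control)
    alpha = 0.0 -> autonomy only (supervised control)
    0 < alpha < 1 -> shared control
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(human_cmd) != len(auto_cmd):
        raise ValueError("commands must have the same dimension")
    return [alpha * h + (1.0 - alpha) * a for h, a in zip(human_cmd, auto_cmd)]


def authority_from_distance(distance, d_safe=0.5, d_min=0.1):
    """Hypothetical rule: operator authority shrinks linearly near an obstacle.

    Beyond d_safe the operator has full authority; below d_min the
    autonomous controller takes over completely.
    """
    if distance >= d_safe:
        return 1.0
    if distance <= d_min:
        return 0.0
    return (distance - d_min) / (d_safe - d_min)


# Example: the operator pushes forward, the autonomous controller steers sideways.
alpha = authority_from_distance(0.3)
command = blend_command([1.0, 0.0], [0.0, 1.0], alpha)
```

A distance-based weight is only one possible arbitration mechanism; the shared control schemes surveyed here may allocate authority differently, for example by dividing degrees of freedom between the operator and the controller.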
Shared control has proved to be an efficient method for designing intuitive robotic teleoperation interfaces, and it can reduce the workload of human operators when they carry out complex tasks. Shared control in a teleoperation system makes it possible to share the available degrees of freedom of a robotic system between the operator and an autonomous controller, which facilitates the task for the human operator and improves the overall efficiency of the system. Take robotic cutting as an example: it places high demands on dexterity and safety, and Prada and Payandeh used geometric virtual fixtures to assist with robot cutting. Besides, the shared control strategy has been applied to obstacle avoidance, in which the human operator only needs to consider the motion of the end-effector of the manipulator [110]. Moreover, combining the shared control method with an EMG sensor has been proposed to enable the human to teleoperate a mobile robot and achieve obstacle avoidance simultaneously [96]. Force feedback based on muscle activation can be transmitted to the human operator to update his or her control intention with predictability. In ref. [111], a passive task-prioritised shared-control method for remote telemanipulation of redundant robots was proposed. Haptic feedback and guidance have been shown to play a significant and promising role in shared-control applications. Haptic cues can be used to increase situation awareness and to effectively steer the human operator towards the safe execution of some tasks.
Impedance control is an important control architecture when robots need to interact physically with environments or humans, or to respond appropriately to unforeseen perturbations. The impedance can even be adjusted according to the task. Variable impedance control has been well studied for in-contact tasks in less predictable and less structured environments. Hogan initially studied impedance control for manipulators [112], and since then a number of improved methods have been proposed to deal with various challenges of robotic control. In addition, Yang et al. proposed a human-like learning controller to achieve variable impedance when robots interact with unknown environments [113]. In ref. [114], the authors studied stability considerations for variable impedance control. Kronander and Billard studied the online learning of varying stiffness when robots learn skills through LfD [115].
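The effect of the impedance parameters can be illustrated with a one-degree-of-freedom example. The sketch below is an assumption-laden toy model rather than any of the controllers in refs. [112-115]: a point mass is driven by a spring-damper law towards a fixed reference while a constant external force pushes on it. All names, the unit mass and the numerical values are hypothetical. At steady state the displacement equals the external force divided by the stiffness, so a lower stiffness yields a more compliant response.

```python
def impedance_force(x, v, x_des, v_des, stiffness, damping):
    """Spring-damper law: force pulling the end-effector towards the reference."""
    return stiffness * (x_des - x) + damping * (v_des - v)


def simulate_contact(stiffness, damping, f_ext, mass=1.0, dt=0.001, steps=5000):
    """Simulate a 1-DoF point mass under the impedance law and a constant push.

    The reference is held at x_des = 0 with zero velocity.
    Returns the final position after steps * dt seconds.
    """
    x, v = 0.0, 0.0
    for _ in range(steps):
        f_cmd = impedance_force(x, v, 0.0, 0.0, stiffness, damping)
        a = (f_cmd + f_ext) / mass
        v += a * dt  # semi-implicit Euler: update velocity first
        x += v * dt  # then position with the new velocity
    return x


# Same 5 N push, two stiffness settings (damping chosen for critical damping).
soft = simulate_contact(stiffness=100.0, damping=20.0, f_ext=5.0)
stiff = simulate_contact(stiffness=400.0, damping=40.0, f_ext=5.0)
```

In a variable impedance scheme, `stiffness` and `damping` would change during the task; the stability of such time-varying gains is a separate question, as discussed in ref. [114].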

| APPLICATIONS
Over the last decade, owing to the advantages of immersive teleoperation mentioned above, multimodal teleoperation systems have been widely exploited in different fields. In addition, immersive teleoperation provides an intuitive and flexible interface for transferring complex manipulation skills, especially when robots perform in-contact manipulation tasks in uncertain and less structured environments. Many researchers have proposed intuitive and flexible teleoperation platforms to realise robot skill learning and generalisation, as shown in Figures 5-8. In this section, we introduce several typical applications of multimodal teleoperation in various domains, for example, assembly, rehabilitation and palpation.

| Medical field
As shown in Figure 5, a robot learns the palpation skill while a human interacts with a haptic device through the surgeon console.
Although teleoperation was originally designed for handling industrial nuclear waste and for space exploration, it has been widely utilised in medical surgery. Telemedicine diagnosis and telesurgery enable remote and poor areas to access the state-of-the-art medical resources of developed countries. In particular, in the current pandemic situation, the potential roles of robotics are becoming increasingly clear. Teleoperation provides feasible solutions for remote dexterous manipulation in the medical field. Specifically, three areas are identified in which robots can make a difference in the medical domain: logistics (e.g., handling of contaminated waste and delivery), clinical care (e.g., decontamination and telemedicine), and reconnaissance (e.g., monitoring compliance with voluntary quarantines). From a technical point of view, achieving dexterous manipulation with robot manipulators in the above applications involves skill learning, robotic control, environment sensing, decision making, etc. Although these applications are being actively explored, the actual situation is still far from what is expected. In this context, teleoperation provides a feasible and effective way to tackle these challenges.
After decades of development of medical robots, a number of state-of-the-art teleoperation platforms for surgery have been developed. For instance, the da Vinci robotic surgical telemanipulator is a mature, commercialised surgical platform that has been utilised in several surgical specialties for varied procedures. It is reported that more than 5000 da Vinci robotic surgical systems had been installed and nearly 6 million procedures performed by the end of 2018 [116]. The da Vinci system used in the clinic is not equipped with haptic feedback, although haptic feedback is significant for improving teleoperation performance [117,118]. Researchers have therefore tried to use force/torque sensors to detect the interaction force and provide the surgeon with a haptic experience [116]. Besides, a deformation tactile feedback device has been developed to provide haptic feedback to teleoperators, and it can be integrated into the da Vinci surgical teleoperation system [119]. Its effectiveness in improving telemanipulation performance was evaluated in a comparison experiment in which 20 participants carried out manipulation tasks using deformation tactile feedback, force feedback and the combination of both. The performance of teleoperation with all feedback was better than that without haptic feedback.
To further increase the teleoperation performance of surgical robotics, Su et al. proposed an improved human-robot collaborative control scheme, based on a hierarchical operational space formulation of a seven-degree-of-freedom redundant robot, to provide a compliant behaviour for the medical staff [3,120].

| Industrial field
Robotic assembly has been widely exploited in manufacturing owing to its efficiency, safety and low cost [121]. However, achieving highly precise assembly and performing tasks in unpredicted situations remain open problems. A robot learning an assembly skill through teleoperation is shown in Figure 6. A dexterous teleoperation interface based on haptic and visual feedback was proposed to precisely control and manipulate micro objects [122]. In ref. [123], the authors proposed an intuitive teleoperation system with haptic and visual feedback to realise the telemanipulation of microspheres (with a diameter of less than 2 μm) between France and Germany. The visual feedback is used to derive the relative positions between the objects and the tools from the scene, while the relative information is transmitted through the haptic feedback. Further, owing to the limitations of visual feedback, Bolopion et al. implemented a haptic interface to realise the 3-D micro assembly of spherical objects [124].
Recently, machine learning techniques have been combined with multimodal feedback systems to realise robotic skill learning. In ref. [39], a skill learning approach for precision assembly was proposed to realise efficient skill transfer from human to robot through force and visual feedback. To transfer human-like manipulation skills, the modulation of human impedance is essential for dealing with tasks in unpredictable and unstructured environments. Therefore, in ref. [125], a human-in-the-loop approach based on a stiffness control interface is proposed for robots to learn assembly tasks in unstructured environments. As shown in Figure 6, this approach combines end-effector force feedback with an interface controlled by the human finger for modulating the robot end-effector stiffness. Two assembly tasks, sliding a bolt fitting inside a groove and driving a self-tapping screw into a material of unknown properties, were conducted to validate the superiority of this skill learning approach based on multimodal feedback. It should be noted that multimodal feedback is essential for transferring micro manipulation skills to robot manipulators. In this regard, multimodal teleoperation is a promising approach for robotic skill learning.

| Tele-rehabilitation
Stroke is becoming increasingly prevalent throughout the world, and rehabilitation training is especially important in post-stroke care. For instance, Baek et al. proposed a wireless active finger rehabilitation approach based on electromagnetic manipulation for hand rehabilitation [126]. A bilateral rehabilitation training scheme based on the fusion of visual and haptic feedback enables the patient to take an active part in the rehabilitation training [127]. Recently, telerehabilitation has gained increasing attention, as it allows a physical therapist to rehabilitate a patient who is far away. In ref. [128], a telerehabilitation system with a motor-assisted device is developed, and the physical therapist verifies the condition of the patient through image or data information. To enhance telepresence, a bilateral telerehabilitation system with visual and haptic interfaces is used to rehabilitate the human lower limb [129]. As shown in Figure 8, the Leap Motion sensor tracks the motion of a healthy hand, and the Omega.7 device is used to assist the impaired hand with force feedback.

| Rescue and search
Robots have significant advantages over humans for complex tasks in dangerous environments. Rescue and search environments are often dangerous and uncertain, and human rescuers risk their lives if they enter. For instance, the Fukushima nuclear accident required robots to work in an unstructured and uncertain environment that humans could not enter because of radiation and toxic contamination. In ref. [130], the mobile manipulation robot Momaro was developed and evaluated in the DARPA Robotics Challenge. A teleoperation system for a rescue robot has been developed using a gamepad and images from a camera mounted on the robot [131].

| Manipulation skill modelling
Since the existing encoding methods aim at structural data modelling, representing multimodal demonstration data simultaneously is still an open problem. To this end, deep neural networks are a potential approach. In addition, demonstration data are often characterised by varied geometries, for example, angular velocity, stiffness and force profiles. It is still difficult to encode such heterogeneous data. One potential approach is to introduce domain knowledge into the corresponding models. The framework of Riemannian geometry may be a promising direction to address this issue [71]. Riemannian manifolds are a powerful tool for representing rigid-body orientations, inertia matrices, manipulability ellipsoids or controller gain matrices by exploiting the geometry of non-Euclidean spaces.
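As a concrete illustration of why geometry matters, unit quaternions representing orientations lie on the unit sphere, so averaging or interpolating them as ordinary vectors leaves the manifold. A common remedy is to map data to a tangent space with the logarithmic map, operate there, and map back with the exponential map. The sketch below shows these two maps on a unit sphere of arbitrary dimension; it is a minimal illustration with hypothetical function names, not the method of ref. [71], and it ignores the fact that a quaternion and its negative describe the same rotation.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sphere_log(p, q):
    """Logarithmic map: tangent vector at p pointing towards q on the unit sphere."""
    c = max(-1.0, min(1.0, dot(p, q)))  # clamp against rounding errors
    theta = math.acos(c)                # geodesic distance between p and q
    if theta < 1e-12:
        return [0.0] * len(p)
    u = [qi - c * pi for pi, qi in zip(p, q)]  # component of q orthogonal to p
    norm_u = math.sqrt(dot(u, u))
    if norm_u < 1e-12:
        raise ValueError("log map is undefined for antipodal points")
    return [theta * ui / norm_u for ui in u]


def sphere_exp(p, v):
    """Exponential map: follow the geodesic from p along tangent vector v."""
    norm_v = math.sqrt(dot(v, v))
    if norm_v < 1e-12:
        return list(p)
    return [math.cos(norm_v) * pi + math.sin(norm_v) * vi / norm_v
            for pi, vi in zip(p, v)]


def geodesic(p, q, s):
    """Point at fraction s in [0, 1] along the geodesic from p to q."""
    return sphere_exp(p, [s * vi for vi in sphere_log(p, q)])


# Two unit quaternions (w, x, y, z) and the orientation halfway between them.
p = [1.0, 0.0, 0.0, 0.0]
q = [math.cos(0.3), math.sin(0.3), 0.0, 0.0]
mid = geodesic(p, q, 0.5)
```

The interpolated point keeps unit norm by construction, which a plain linear blend of `p` and `q` would not guarantee. Analogous log and exp maps exist for symmetric positive-definite matrices such as stiffness or manipulability ellipsoids, although their formulas differ.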

| Skill learning through multimodal teleoperation
Synchronisation across modalities is significant in multimodal teleoperation. If signals of different modalities are out of synchronisation, the overall spatial and temporal immersion is reduced. Another challenging aspect of utilising multimodal demonstrations is the comfort of users and accessibility. It is not clear how to acquire highly multimodal demonstrations, which would require an overwhelming number of sensors, without burdening the user. Effectively collecting multimodal demonstrations from remote users also remains challenging [30].
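On the data side, one simple way to align recorded modalities is to resample every stream onto a shared clock that covers only the interval in which all streams overlap. The sketch below does this with linear interpolation of scalar signals. It is an illustrative assumption rather than a method from the surveyed works: the function names and the stream layout are hypothetical, timestamps are assumed to be strictly increasing and already expressed in a common time base, and network delay compensation is not addressed.

```python
from bisect import bisect_right


def interpolate(timestamps, values, t):
    """Linearly interpolate a scalar stream at time t."""
    if t < timestamps[0] or t > timestamps[-1]:
        raise ValueError("query time outside the recorded interval")
    i = bisect_right(timestamps, t)
    if i == len(timestamps):  # t equals the last timestamp
        return values[-1]
    t0, t1 = timestamps[i - 1], timestamps[i]
    w = (t - t0) / (t1 - t0)
    return (1.0 - w) * values[i - 1] + w * values[i]


def synchronise(streams, rate_hz):
    """Resample named streams onto one clock over their common time overlap.

    streams: dict mapping a modality name to (timestamps, values).
    Returns (clock, resampled), where resampled maps each name to a list
    of values aligned with clock.
    """
    start = max(ts[0] for ts, _ in streams.values())
    end = min(ts[-1] for ts, _ in streams.values())
    if end < start:
        raise ValueError("streams do not overlap in time")
    n = int((end - start) * rate_hz + 1e-9) + 1
    # Clamp to 'end' so rounding never pushes a sample outside the overlap.
    clock = [min(start + k / rate_hz, end) for k in range(n)]
    resampled = {name: [interpolate(ts, vals, t) for t in clock]
                 for name, (ts, vals) in streams.items()}
    return clock, resampled


# A force signal and a slower visual feature recorded over different intervals.
streams = {
    "force": ([0.0, 1.0, 2.0], [0.0, 10.0, 20.0]),
    "vision": ([0.5, 1.5], [1.0, 3.0]),
}
clock, aligned = synchronise(streams, rate_hz=2.0)
```

Linear interpolation suits smooth signals such as force or position; discrete events or image frames would need a different policy, for example nearest-sample selection.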
The existing methods for teleoperation-based LfD are limited to learning from a small number of pre-specified modalities. To effectively learn a wide variety of complex skills, we need methods that reason over demonstrations in multiple modalities, identify the most relevant demonstrations and learn from them. How multimodal information influences learning performance is still an open research question.
Another challenge is the transmission of multimodal signals, which requires a high bit rate, in order to teleoperate remote robots. For instance, haptic feedback is significant for contact tasks.

| Skill generalisation
Since the working environment often differs and the range of possible tasks that the robot needs to perform is infinite, it is impossible to teach robots all manipulation skills through LfD. When robots work in less structured environments, they need to react smoothly and quickly to various perturbations. In this case, the robot needs to modulate its movement with respect to the situation instead of re-planning the whole trajectory. Thus, the robot should be able to cope with novel situations through online learning and adaptation. Chatzilygeroudis et al. [40] proposed 'micro-data reinforcement learning', in which a robot adapts within only a handful of trials and a few minutes.
In addition, the generalisation of movement primitives to different tasks comes from two sources: the individual movement primitive and the combination of movement primitives. The generalisation of each movement primitive arises from integrating perception into the active planning module. Combining primitives from the movement primitive library generates complex manipulation plans for unseen situations.
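The idea of modulating a movement online instead of re-planning it can be sketched with a one-degree-of-freedom movement primitive built on a spring-damper attractor, in the spirit of dynamic-system skill models. The code below is a simplified assumption, not the formulation of any specific cited work: the gains, the unit time constant and the function names are hypothetical, and the learned shape term is left as an optional callable that defaults to zero. Because the goal is read at every integration step, the goal can be changed during execution and the motion bends towards the new target.

```python
def rollout(y0, goal_at, forcing=None, duration=2.0, dt=0.001,
            alpha_z=25.0, beta_z=6.25, alpha_x=4.0):
    """Integrate a 1-DoF movement primitive and return the position path.

    goal_at(t): current goal at time t, so the goal may change online.
    forcing(x): optional shape term driven by the phase x; it is multiplied
                by x so that its influence vanishes as the phase decays.
    """
    y, z, x = y0, 0.0, 1.0  # position, velocity, phase
    path = [y]
    steps = int(round(duration / dt))
    for k in range(steps):
        g = goal_at(k * dt)
        f = forcing(x) * x if forcing is not None else 0.0
        dz = alpha_z * (beta_z * (g - y) - z) + f  # critically damped attractor
        z += dz * dt
        y += z * dt
        x += -alpha_x * x * dt                     # phase decays from 1 to 0
        path.append(y)
    return path


# Fixed goal versus a goal that is moved while the motion is under way.
fixed = rollout(0.0, lambda t: 1.0)
switched = rollout(0.0, lambda t: 1.0 if t < 0.5 else 2.0)
```

In both runs the trajectory settles at whatever goal is active at the end, without recomputing the trajectory. Learning `forcing` from demonstrations and sequencing several such primitives are separate steps that this sketch does not cover.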

| CONCLUSION
In this review, we focused on multimodal teleoperation-based LfD to realise manipulation skill transfer from humans to robots. First, the multimodal teleoperation system for LfD, including the human demonstrator, multimodal interfaces, remote robots, communication module, robotic control module and remote perception module, was introduced. To encode the multimodal demonstration data, we summarised the skill modelling methods, including the dynamic system and statistical methods. In addition, to achieve complex manipulation skill transfer from humans to robots, the multimodal interface plays an important role in enhancing telepresence, improving the performance of demonstrators and gathering demonstration data. We further discussed the design of the multimodal interface and how to integrate it with LfD. Several typical applications of skill acquisition through multimodal teleoperation were also presented. Finally, we discussed the remaining challenges and future work in terms of skill modelling, multimodal teleoperation and skill generalisation.

FIGURE 5
Robot learning the palpation skill, adapted from ref. [101]. (a) Setup for the manipulation of the silicone sample. (b) Human interacting with the haptic device. (c and d) The task environment seen through the surgeon console.
The structure of the teleoperation system