Whole‐Body Multicontact Haptic Human–Humanoid Interaction Based on Leader–Follower Switching: A Robot Dance of the “Box Step”

For physical human–robot interaction (pHRI) where multi-contacts play a key role, both robustness to achieve robot-intended motion and adaptability to follow human-intended motion are fundamental. However, there are tradeoffs during pHRI when their intentions do not match. This paper focuses on bipedal walking control during pHRI, which must handle such a tradeoff when a human and a humanoid robot have different footstep locations and durations. To resolve this, a force-reactive walking controller is proposed by adequately combining ankle and stepping strategies. The ankle strategy maintains the robot's intention based on an analytically optimal center of pressure, leading the robot to resist the multiple contacts from the human. Based on the robot's kinodynamic constraints and/or the confidence of the robot's intention, the stepping strategy updates the robot's footsteps based on the human's intention implied by the multiple contact forces. Consequently, the proposed walking control for pHRI mutually exchanges human–robot intentions in real time, thereby achieving coordinated steps. With a full-sized humanoid robot that is able to detect multi-contacts in real time, we succeeded in performing a long-term "box-step" with multi-contact pHRI, demonstrating the robustness of our approach.


Introduction
The development of robots subject to physical interactions is one of the most challenging research topics for the future robotic society. Such robots would greatly expand their applicable area, e.g., physical power assistance, [1,2] clinical gait training for the elderly, [3] and/or teaching amateurs dance steps. [4,5] In general, in physical human–robot interaction (pHRI), robots have their own purpose (e.g., promotion of physical fitness for the elderly), whereas users also have requirements (e.g., easy walking for users). As can be expected, these often involve tradeoffs and are sometimes in conflict. Hence, a robot for pHRI should consider the user's intention appropriately while satisfying the robot's intention in the best possible way. The balance between the human–robot intentions would be determined according to the mutual exchanges of information through the pHRI.
As the first step for pHRI, multiple contact forces applied by a human (and/or some objects) have to be precisely sensed. General robots, however, only have six-axis force and torque sensors mounted on their end-effectors. [3,5,6] This fact significantly limits the areas where contacts can be exerted. Even if torque sensors (or estimators) are mounted on all the joints, [7][8][9] the contact positions are difficult to localize, especially with multiple contacts.
To sense all the contacts and their positions on robots, several research groups have developed robotic skin to cover the bodies of robots. [10][11][12][13][14] In particular, the skin cell that we have developed [13] can be localized automatically using an embedded accelerometer, [15] and we measure the contact force and, using a proximity sensor, precontacts, which can be directly converted into joint torques. By covering the whole body of a humanoid robot with more than 1000 cells over ≈1 m² of the robot (see Figure 1), the robot can sense multiple contact forces, making multicontact pHRIs tractable. Note that, in this article, all the forces are transformed into a net wrench with respect to the center of mass (COM) to simplify the walking control.
An important next step for multicontact pHRI is motion control according to the sensed tactile information. If robots are limited to manipulators and mobile robots with naturally stable dynamics, the contact forces can be distributed to the joint and task spaces, where respective virtual impedances are predefined or adaptively given, without any concern about failure (e.g., falling). [3,7,16] Even in the case of humanoid robots with unstable dynamics, quasistatic tasks can be resolved with similar approaches, e.g., a slow standing-up motion. [17] The whole-body controller with explicit objectives considers physical interactions with humans or objects. [18] That is, by adjusting impedances and suitable objectives, the balance between the human–robot intentions can be accommodated.
DOI: 10.1002/aisy.202100038
In contrast, bipedal walking is generally classified as a fast dynamic system without any stable equilibrium points. Without focusing on walking control during pHRI, advanced control methods based on simplified dynamic models have been proposed to keep walking balance and hold the robot's walking intention (i.e., footstep locations and durations). [19][20][21][22] Note that this article exploits the concept of the divergent component of motion (DCM) proposed in another study, [21] as do the conventional works. [23][24][25][26][27] These methods, however, regard the contact forces for pHRI as disturbances and resist them robustly; they have no capability for adapting to human walking intention based on multicontact forces. Here, we note that intention in the case of walking is defined as the direction in which one wants to walk, the footstep locations, and the duration of each step.
Bipedal walking that explicitly accounts for multiple contacts can be considered the next major challenge in humanoid robotics and pHRI. The controller proposed by Nishiwaki et al. [20] compensates for the external forces at the wrists (measured with six-axis force–torque sensors) by providing offsets to the reference signals. The joint research group of AIST and CNRS has succeeded in making a humanoid robot carry a board together with an operator by combining impedance control for the upper body and walking control for the lower body. [28][29][30] In that case, the interaction would be limited due to the indirect force-reactive walking control. In addition, the robot in this situation is merely regarded as the operator's follower, which only reacts to human intention. Agravante et al. [6] proposed a model predictive control that directly handles the external wrench on the COM with two types of cost functions: one as the leader (to hold the robot's intention) and one as the follower (to adhere to the human's intention). However, the interaction models of the leader and the follower are given differently; hence, no intermediate role is available. In addition, in all of the earlier studies, the prediction and control of the external force are relatively easy because the contact points are limited to end-effectors. When multiple contacts with the whole body are allowed, direct control of the external force becomes difficult due to inaccurate prediction. We believe that, as the next challenge in pHRI, the robot and a partner who is in contact with the robot should convey their intentions using physical cues through multiple contact points during physical interaction, and that, according to the interactions, the robot should decide in a uniform manner whether to be the leader or the follower.
Hence, our project focuses on multicontact pHRI walking, where the robot's role as the leader or the follower is given nondiscretely. As the first step, this article proposes a new force-reactive walking control to achieve multicontact pHRI tasks that require walking skills like dance, as shown in Figure 2. In this controller, the robot dynamics are represented by the DCM with the explicit input of the external forces, which are measured by the robotic skin cells and propagated to the COM. Under these DCM dynamics, an ankle strategy controls the balance of walking to hold the robot's intention. To follow the human intention, a stepping strategy is designed so that the robot's intention is smoothly updated and reactive to the external forces. Here, we consider a concept, the "confidence" of the nominal footstep locations and durations (as the robot's intention), which makes it possible to adjust the ratio between holding the robot's intention and following the human's intention. The robot has to estimate the nominal footstep commands, some of which should be conducted accurately (e.g., the commands to walk on stairs), and some of which may be uncertain (e.g., the commands estimated from noisy sensing data). The "confidence" is defined as the quantity of the classification of the nominal footstep. Based on this "confidence," the robot can behave as the leader or the follower of the partner during pHRI in a uniform manner. The "confidence" also implicitly adjusts the contact force with the partner, which is difficult to predict and control accurately because of the arbitrary multipoint contacts.
Specifically, the ankle strategy provides the analytical position of the ground reaction force derived from the DCM dynamics to keep the footstep location and duration within a target range. The allowable area of the ground reaction force is limited within a support polygon, and its size can be shrunk virtually according to the confidence value to easily activate the stepping strategy. Only when the ankle strategy cannot provide its analytical solution does the stepping strategy numerically solve two optimization problems, one during the single support phase (SSP) and one during the double support phase (DSP), named SSP-Opt and DSP-Opt, respectively. Here, these two are designed to achieve three objectives: 1) to guarantee the DCM dynamics considering the contact forces as the partner's walking intention; 2) to keep the nominal parameters as the robot's walking intention; and 3) to smoothly change the optimal parameters in practice. The confidence is utilized to assign the contribution weight for each.
Figure 1. A full-sized humanoid robot, named H1, covered by robotic skin (≈1 m² of covered surface): all the contact forces are measured by skin cells; these forces can be transformed into a net wrench with respect to the COM, which is convenient to generate walking motions. Reproduced with permission. Copyright 2019, A.Eckert/TUM.
To verify our method, three types of real experiments with a full-sized humanoid robot covered with artificial sensing skin were conducted: 1) We first confirm the appropriate combination of the ankle and stepping strategies through experiments with pushing from four directions (i.e., forward, backward, rightward, and leftward). Whatever the pushing direction, the stepping strategy is activated only when the ankle strategy exceeds its limit for keeping balance during walking. 2) Next, we conduct trials of robot–robot interactions (i.e., between a humanoid robot and a mobile robot). We show that the humanoid robot can switch its role between a leader and a follower according to the "confidence." 3) Finally, we conduct a stepping demonstration during multicontact pHRI. The robot first tries a "box-step" dance with a small stride. After that, the human tries to achieve the "box step" with a larger stride by pushing the robot, and consequently, the robot succeeds in increasing its stride for the "box step" by following the forces applied by the human.
In our earlier work, we examined the ankle and stepping strategies that adjust the footsteps of the robot through pHRI while maintaining stable motion. [27] Here, we further examine the effects of rotation produced by the contact forces and the real-time updates of the swing-leg trajectory. In addition, to investigate the means of whole-body haptic interaction, we introduce the "confidence" of the nominal footstep and integrate it with a framework for walking control that supports multicontact pHRI. The "confidence" defines the role of the robot as a leader or as a follower during whole-body haptic interaction. This is done by integrating the ankle and stepping strategies with the "confidence" as their priorities, which heuristically amplifies parts of the design parameters in the respective strategies by taking their functions into account. Thus, the framework can switch the role of the robot between the leader and the follower of the partner in a uniform manner.
The performance of our method is investigated through three types of experiments in terms of 1) the robustness and adaptability to disturbances from any direction, 2) the capability to switch the robot's role between leader and follower according to the "confidence," and 3) the practicality for multicontact human–robot interaction. Only the nominal footsteps with their "confidence" are changed for each experiment, indicating that the robot's behavior (i.e., leader or follower) can be determined only by adjusting the "confidence."
The remainder of this article is structured as follows. Section 2 simplifies the whole-body robot dynamics for multiple contacts to DCM dynamics with a net wrench at the COM. Section 3 derives the ankle strategy from the DCM dynamics analytically. Section 4 defines two optimization problems, which are solved numerically in real time, for the stepping strategy during SSP and DSP, respectively. Section 5 shows the three types of real experiments. Section 6 discusses the limitations of the proposed method and suggestions to overcome them. Section 7 concludes this article with a summary and potential future work.

Whole-Body Dynamics with Multiple Contacts
The rigid-body dynamics of a humanoid robot with multiple contacts are given as follows.
where $M$ denotes the inertia, $G$ includes the gravity, centripetal, and Coriolis forces, and $K$ is the transformation and propagation matrix. As a bipedal humanoid robot is a floating-base system, its world coordinate $q_0$ has to be considered in addition to its joint angles $\theta$ and their torques $\tau$. As shown in Figure 1, our humanoid robot, H1, [13,27] has 1260 skin cells covering the whole body. Theoretically, the same number of contact terms with their respective wrenches $\omega_i$ is needed. Thus, directly controlling the robot with such complicated dynamics is computationally infeasible.
Figure 2. The proposed walking control framework: the walking intention of the human partner generates multiple contact forces measured by the robotic skin; using the contact forces and the original/predicted walking intention of the robot/partner, the ankle and stepping strategies adapt the footstep and command it as the new walking intention of the robot; according to this new footstep, the human partner adjusts his/her walking intention; note that in this article, the walking intention of the partner is not predicted, and the original walking intention of the robot is given as commands.
www.advancedsciencenews.com www.advintellsyst.com
To simplify such complicated dynamics, let us introduce point-mass dynamics, which are well known for controlling bipedal walking, such as the linear inverted pendulum model [19] and the DCM. [21] In that case, we only need to consider the forces propagated to the floating base (i.e., the COM in the model). That is, the robot with multicontact pHRI is modeled as a point mass by propagating all the contact forces on the robotic skin cells to the COM. As a result, the control problem with multicontact pHRI can be solved in a computationally feasible manner.

Multiple Contact Forces Sensed by the Robotic Skin
As mentioned earlier, the robot has 1260 skin cells, whose transformation matrices $T$ (consisting of rotation matrices $R$ and translation vectors $p$) are all identified by the calibration technique. [13] Each cell has five modalities for sensing and for indicating the cell's state: 1) three normal force sensors; 2) one proximity sensor; 3) a three-axis accelerometer; 4) one thermometer; and 5) one LED to show the contact state. The sampling rates of these sensors are 250 Hz for acceleration, proximity, and temperature and 2 kHz for the force sensors. To minimize the communication latency, the communication paths between the robot (i.e., the host computer) and all the skin cells are optimized, [31] resulting in less than 1 ms of communication latency. These measurements are managed on the host computer with a strictly clock-driven control algorithm using a real-time OS and ros_control, [32] the effectiveness of which has been demonstrated in our other works. [33,34] Even so, a heavy traffic burden is expected if all these measurements are sent to the host computer simultaneously, which could break the real-time control of the system. To overcome this problem, an event-triggered communication system (i.e., only the activated cells send their data) has been developed. [35] Furthermore, only the force sensors and the proximity sensor are used to handle pHRI as a pseudoforce; hence, the other modalities are deactivated.
The skin cells implement a change detection regime. In each skin cell, a change detector monitors the cell's sensor signals and creates events (updates on the state of the sensory signals) whenever a sensor value changes. Thereby, the change detector reduces the temporal redundancy of the information sent to the controller, and the transmission rate and the computation power needed for providing tactile feedback of large-area e-skin decrease significantly. At the same time, the information conveyed by the events is practically identical to the information conveyed in the samples of clock-driven systems. Controller implementations that take the feedback of event-driven e-skin systems can decode the events back to samples and follow the traditional control regimes. Alternatively, controllers can exploit the event-driven information representation to increase their computational efficiency by reducing the number of calculations. In both cases, the controller can strictly realize a constant control frequency. Thus, the event-driven representation of the tactile feedback does not negatively affect control performance; rather, it improves real-time performance by reducing computational demands.
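The change-detection regime described above can be illustrated with a minimal sketch. It is not the e-skin firmware: the `ChangeDetector` class, the `encode`/`decode` helpers, and the send-on-delta threshold `delta` are hypothetical names and assumptions introduced only for illustration. Each channel emits an event only when its value has moved by at least `delta` since the last event, and the controller rebuilds a constant-rate signal by holding the last received value.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class ChangeDetector:
    """Send-on-delta detector for one sensor channel (hypothetical helper)."""
    delta: float                       # assumed minimum change that triggers an event
    last_sent: Optional[float] = None  # value carried by the most recent event

    def update(self, value: float) -> Optional[float]:
        """Return the value as an event if it changed enough, else None."""
        if self.last_sent is None or abs(value - self.last_sent) >= self.delta:
            self.last_sent = value
            return value
        return None


def encode(samples: Iterable[float], delta: float) -> List[Tuple[int, float]]:
    """Turn clock-driven samples into sparse (tick, value) events."""
    detector = ChangeDetector(delta)
    events = []
    for tick, value in enumerate(samples):
        event = detector.update(value)
        if event is not None:
            events.append((tick, event))
    return events


def decode(events: List[Tuple[int, float]], n_ticks: int) -> List[float]:
    """Rebuild a constant-rate signal by holding the last event value."""
    pending = dict(events)
    current = 0.0
    out = []
    for tick in range(n_ticks):
        current = pending.get(tick, current)
        out.append(current)
    return out
```

With this scheme the decoded signal differs from the original samples by less than `delta`, while the controller still runs at a fixed rate over `n_ticks`.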
In this article, we take advantage of this property. Specifically, from the sensor values (for the force and proximity information), we can easily find the set of activated skin cells, $\mathbb{N}_{ac}$, as follows:
where $\bar{v}_{i,\mathrm{force}}$ and $v_{i,\mathrm{prox}}$ denote the values transmitted from the three force sensors (actually, their mean) and the proximity sensor on the $i$-th skin cell, respectively. By applying the respective thresholds (i.e., $\epsilon_{\mathrm{force}}$ and $\epsilon_{\mathrm{prox}}$), the large computational cost of the following transformation can be reduced.
In addition to $\bar{v}_{i,\mathrm{force}}$, the system can detect precontact using $v_{i,\mathrm{prox}}$ as a pseudoforce, which can improve the response speed. The pseudoforce $f_{i,\mathrm{cell}}$ applied to the $i$-th activated skin cell is given as the weighted sum of $\bar{v}_{i,\mathrm{force}}$ and $v_{i,\mathrm{prox}}$.
$f_{i,\mathrm{cell}} = w_{\mathrm{force}} \bar{v}_{i,\mathrm{force}} + w_{\mathrm{prox}} v_{i,\mathrm{prox}}$ (3)
where $w_{\mathrm{force}}$ and $w_{\mathrm{prox}}$ are the respective weights, calibrated empirically.
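The thresholding and the weighted sum described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and combining the two thresholds with a logical OR is an assumption (the text only states that respective thresholds are applied).

```python
from typing import Dict, List


def mean_force(readings: List[float]) -> float:
    """Mean of the force-sensor readings of one cell (three sensors per cell)."""
    return sum(readings) / len(readings)


def activated_cells(v_force: Dict[int, float], v_prox: Dict[int, float],
                    eps_force: float, eps_prox: float) -> List[int]:
    """Indices of cells whose force OR proximity value exceeds its threshold.

    The OR combination is an assumption made for this sketch.
    """
    return [i for i in sorted(v_force)
            if v_force[i] > eps_force or v_prox[i] > eps_prox]


def pseudoforce(v_force_i: float, v_prox_i: float,
                w_force: float, w_prox: float) -> float:
    """Weighted sum of the mean force value and the proximity value."""
    return w_force * v_force_i + w_prox * v_prox_i
```

Only the cells returned by `activated_cells` need to be passed on to the subsequent frame transformation, which is where the computational saving comes from.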
Here, as shown in Figure 3, we suppose that the pseudoforces of all the activated skin cells $\mathbb{N}_{ac}$ are propagated to the COM (more accurately, the base link of the robot) and are applied to the DCM dynamics explicitly. As mentioned earlier, we know the transformation matrix from the COM to the $i$-th skin cell's frame, $T_i$ with $R_i$ and $p_i$, and therefore, the net applied wrench, consisting of $F_{\mathrm{skin}}$ and $\tau_{\mathrm{skin}}$, is given as follows.
where $\mathbf{f}_{i,\mathrm{cell}} = [0, 0, -f_{i,\mathrm{cell}}]^\top$ is the force vector applied in the $i$-th skin cell's frame. $F_{\mathrm{skin}}$ and $\tau_{\mathrm{skin}}$ are fed to the walking controller, as described in the next section. Note that $\tau_{\mathrm{skin}}$ rotates the upper body, which has a feedback controller to preserve the vertical posture.
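As a hedged sketch of this propagation, the snippet below sums the cell forces and their moments about the COM. It assumes the usual rigid-body convention that $R_i$ rotates a vector from the cell frame into the COM frame and $p_i$ is the cell position expressed in the COM frame, so that the force contribution is $R_i \mathbf{f}_{i,\mathrm{cell}}$ and the torque contribution is $p_i \times (R_i \mathbf{f}_{i,\mathrm{cell}})$. The helper names are hypothetical, and the exact form used by the authors may differ.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def mat_vec(R: Mat3, v: Vec3) -> Vec3:
    """Multiply a 3x3 matrix (tuple of rows) by a 3-vector."""
    return tuple(sum(R[r][c] * v[c] for c in range(3)) for r in range(3))


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def net_wrench(cells: Sequence[Tuple[Mat3, Vec3, float]]) -> Tuple[Vec3, Vec3]:
    """Sum pseudoforces of activated cells into (F_skin, tau_skin) at the COM.

    Each entry is (R_i, p_i, f_i): assumed cell-to-COM rotation, cell position
    in the COM frame, and scalar pseudoforce pressing along the cell's -z axis.
    """
    F = [0.0, 0.0, 0.0]
    tau = [0.0, 0.0, 0.0]
    for R, p, f in cells:
        f_com = mat_vec(R, (0.0, 0.0, -f))  # force vector in the COM frame
        t_com = cross(p, f_com)             # moment of that force about the COM
        for k in range(3):
            F[k] += f_com[k]
            tau[k] += t_com[k]
    return tuple(F), tuple(tau)
```

Because only activated cells are passed in, the cost of this loop scales with the number of contacts rather than with the 1260 cells on the body.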

DCM Dynamics with Applied Force
Let us introduce the DCM dynamics with explicit contact forces other than those at the feet (see Figure 4). Given the COM position $x$ with a point mass $m$ and the natural frequency $\omega_0 = \sqrt{g/h}$, with $g$ as the gravitational acceleration and $h$ as the COM height, the DCM position $\xi$, which exhibits the unstable walking component, is derived as follows.
$\xi = x + \dot{x}/\omega_0$ (6)
Using this formula, the robot dynamics is simplified as two first-order linear differential equations.
$\dot{x} = -\omega_0 (x - \xi)$ (7)
$\dot{\xi} = \omega_0 (\xi - u)$ (8)
where $u$ is the so-called virtual repellent point (VRP). Note that the ground reaction force from the supporting leg(s) is implicitly included in these dynamics. Given the measurements provided by the robotic skin (i.e., $F_{\mathrm{skin}}$ and $\tau_{\mathrm{skin}}$), the part of $u$ related to the contact forces other than those at the feet can be explicitly distinguished as $u_\omega$, which is further decomposed into $u_F$ for the translational force and $u_\tau$ for the rotational effect. That is, the following formulae are additionally given to the dynamics.
where $\nu$ denotes a hyperparameter that determines a virtual robot mass, similar to general impedance control. It is desirable to set $\nu \leq 1$ for a fast response to the physical interactions, although $\nu \to \infty$ makes the dynamics with the explicit contact forces converge to the original DCM dynamics without the contact forces. $u_{\mathrm{ref}}$ is the reference VRP to control the DCM (and the COM) and corresponds to one of the walking intentions, the footstep location.
In addition, the additional acceleration $\ddot{x}$ caused by the contact forces other than those at the feet is described with the virtual robot's mass. This will be directly added to Equation (7) using Euler's method.
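To make the structure of these dynamics concrete, the following one-dimensional sketch integrates the standard COM/DCM pair $\dot{x} = -\omega_0 (x - \xi)$ and $\dot{\xi} = \omega_0 (\xi - u)$ with Euler's method. Several points are assumptions for illustration only: the VRP is composed as $u = u_{\mathrm{ref}} + u_\omega$, the value of $u_\omega$ is passed in rather than computed from the skin wrench, and the optional `push_accel` term is one possible reading of how the additional acceleration enters the COM velocity. The function names are hypothetical.

```python
import math


def natural_frequency(g: float, h: float) -> float:
    """omega_0 = sqrt(g / h) for gravitational acceleration g and COM height h."""
    return math.sqrt(g / h)


def euler_step(x: float, xi: float, u_ref: float, u_w: float,
               omega0: float, dt: float, push_accel: float = 0.0):
    """Advance the COM x and the DCM xi by one time step dt.

    u_w is the (externally supplied) VRP offset caused by contact forces;
    push_accel is an assumed extra COM acceleration integrated over dt.
    """
    u = u_ref + u_w                                # assumed VRP composition
    x_dot = -omega0 * (x - xi) + push_accel * dt   # COM converges to the DCM
    xi_dot = omega0 * (xi - u)                     # DCM diverges from the VRP
    return x + x_dot * dt, xi + xi_dot * dt
```

The sketch shows why the DCM is called the unstable component: any offset between $\xi$ and $u$ grows exponentially unless the VRP is moved, which is what the ankle and stepping strategies are for.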
Using these dynamics, Kalman-filter-based state estimation (e.g., see another study [36]) is conducted to obtain the COM and DCM positions. According to the dynamics with the estimated positions, their reference positions are updated with discrete time steps $dt$, and the Jacobian-based inverse kinematics [37] solves for the reference joint angles of the stance leg(s). To walk as desired, $u_{\mathrm{ref}}$ must be tracked with sufficient precision; otherwise, the robot has to change its walking intention (i.e., footstep locations and durations).
The framework we provide later is similar to the one developed by Jeong et al., [38] but the dynamics we assume explicitly contain the contact forces on the whole body. Although all the contact forces are concentrated on the supporting leg(s), there would be a delay in the measurement. In addition, as the robotic skin can sense a contact immediately before it occurs as a pseudoforce, potentially high-level responsiveness is expected.

Real-Time Swing-Leg Trajectory Planning
In our approach, not only the COM but also the swing-leg trajectory has to be dynamically planned in task space in accordance with the human–robot walking intentions. We therefore solve a ridge regression for the weighted least-squares problem of $n$-th-order polynomial curve fitting in real time. Here, the $n$-th-order polynomial curve is defined by the coefficient matrix $A = [a_0, a_1, \ldots, a_n]^\top$ and the basis function vector $f(t) = [1, t, t^2, \ldots, t^n]^\top$ as follows.
$y(t) = A^\top f(t)$ (13)
where $y(t)$ corresponds to the swing-leg trajectory in our case (more specifically, its pose).
To fit this curve to the given conditions, the optimal coefficient matrix $A^*$ is derived by solving the following problem.
where $\Phi = [\phi_1, \ldots, \phi_m]^\top$ and $Y = [y_1, \ldots, y_m]^\top$ denote the matrices of the $m$ input and output conditions described in the next paragraph, respectively. $\Sigma = \mathrm{diag}(\sigma)$ is the diagonal matrix of the weights of the respective conditions, and the hyperparameter $\lambda$ stabilizes the numerical solution. At every time step, $A^*$ is updated, and the reference of the swing-leg position at the current time $t = T_c$ is given as follows.
$y_{\mathrm{ref}}(T_c) = A^{*\top} f(T_c)$ (15)
Note that, by differentiating the above equation, $\dot{y}_{\mathrm{ref}}(T_c)$ and $\ddot{y}_{\mathrm{ref}}(T_c)$ can be obtained as well.
As conditions, the initial and terminal conditions can be given as a smooth point-to-point trajectory, i.e., $y(0) = y_i$, $\dot{y}(0) = 0$, … smoothly update the trajectory from the previous trajectory. The observations by the forward kinematics, i.e., $y_{\mathrm{obs}}(T_c - dt)$, $\dot{y}_{\mathrm{obs}}(T_c - dt)$, and $y_{\mathrm{obs}}(T_c - 2dt)$ (the last of which is for numerically deriving the velocity), are also added as conditions to generate a smooth trajectory. Only in the case of the $z$-axis (step height), the offset $\Delta z$ is given to the conditions, i.e., $z(\gamma T_s) = \Delta z$ and $z((1 - \gamma) T_s) = \Delta z$, where $\gamma \in (0, 1/2]$ is the ratio against the SSP duration. If $A^*$ is obtained with the earlier conditions, smooth trajectories can be obtained by the ridge regression. However, $y(0) = y_i$ and $y(T_s) = y_e$, which must always be passed through, are not necessarily satisfied. The equality constraints for them are therefore applied as follows.
where $A_{[1:n-1,:]}$ and $\phi_{[1:n-1]}$ are the submatrix/subvector of the original $A$ and $f$, which exclude $a_{0,n}$. From all the coefficients obtained earlier, the swing-leg trajectory can be generated. The trajectory obtained by this polynomial function is not necessarily monotonically increasing or decreasing, and therefore, the monotonicity of the trajectory is enforced by modifying the obtained trajectory.
To dynamically update the trajectory according to the updated footstep location and duration in real time, we used the swing-leg trajectory based on polynomial curve fitting under $n = m$ (as previously described in the study by Khadiv et al. [23]). Such a design is, however, sensitive to noise and likely to generate infeasible trajectories. In contrast, the ridge regression with the design $n < m$ allows the trajectory to ignore noisy conditions while arriving at the optimal footstep location at the desired time $T_s$.
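The weighted ridge regression behind this swing-leg planner can be sketched for a single coordinate as below. The closed form $a^* = (\Phi^\top \Sigma \Phi + \lambda I)^{-1} \Phi^\top \Sigma Y$ is the textbook weighted ridge solution and is assumed here; it is not claimed to be the exact form of the problem solved in this article. All function names are hypothetical, the equality constraints and monotonicity correction are omitted, and a small pure-Python linear solver keeps the example self-contained.

```python
from typing import List, Sequence


def basis(t: float, n: int, deriv: int = 0) -> List[float]:
    """Row f(t) = [1, t, ..., t^n], or its first time derivative if deriv=1."""
    if deriv == 0:
        return [t ** k for k in range(n + 1)]
    return [0.0] + [k * t ** (k - 1) for k in range(1, n + 1)]


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ridge_fit(Phi: Sequence[Sequence[float]], y: Sequence[float],
              sigma: Sequence[float], lam: float) -> List[float]:
    """Weighted ridge solution (Phi^T S Phi + lam I)^-1 Phi^T S y (assumed form)."""
    k = len(Phi[0])
    N = [[sum(s * row[i] * row[j] for row, s in zip(Phi, sigma))
          + (lam if i == j else 0.0) for j in range(k)] for i in range(k)]
    rhs = [sum(s * row[i] * yi for row, s, yi in zip(Phi, sigma, y))
           for i in range(k)]
    return solve(N, rhs)


def evaluate(a: Sequence[float], t: float, deriv: int = 0) -> float:
    """Evaluate the fitted polynomial (or its derivative) at time t."""
    return sum(c * b for c, b in zip(a, basis(t, len(a) - 1, deriv)))
```

As a usage example, fitting a cubic ($n = 3$) to five conditions ($m = 5$: start and end positions, zero start and end velocities, and one midpoint) uses rows `basis(t, 3)` for position conditions and `basis(t, 3, deriv=1)` for velocity conditions; lowering an entry of `sigma` makes the fit pay less attention to that condition, which is how noisy observations can be down-weighted.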

Control Objective with Confidence
Under the earlier dynamics, the robot should be controlled to achieve its control objective, i.e., stepping on the nominal footstep location $u_e^n$ at the nominal time $T_s^n$ and starting a new step after the nominal duration $T_d^n$. Alternatively, when the given nominal footstep seems to be unachievable, it should be adjusted. To this end, our framework is developed with the ankle and stepping strategies for the respective purposes: the ankle strategy is to achieve the nominal footstep, and the stepping strategy is to adjust it. The details of the proposed method, which is introduced in the next section, are summarized in Figure 5 with the formula references.
Here, we define a new concept for this problem: the "confidence" of the nominal footstep. In general, the nominal footstep is given as a command or estimated by classification/regression from sensory data. For example, footstep commands like walking on stairs are of high importance, whereas those like walking freely on flat ground are of low importance. In addition, the desired footstep location and duration can be estimated with high confidence in situations that have been experienced frequently, whereas the estimated results are not always correct, especially in inexperienced situations. To explicitly represent these characteristics in the nominal footstep, we assume that each nominal footstep has the confidence $c \in [0, 1]$. That is, the nominal footstep is defined as the tuple $(u_e^n, T_s^n, T_d^n, c)$. In addition, to distinguish the nominal footstep from the adjusted one, the adjusted target footstep is defined as the tuple $(u_e, T_s, T_d)$.

Overview
In this section, the ankle strategy is introduced to hold the robot's intention as much as possible. Specifically, we mainly analyze the optimal VRP (zero moment point, ZMP) $u$ to reach the target footstep under the DCM dynamics in Equations (7)-(12). Because the DCM dynamics is a first-order linear differential equation, its trajectory can be solved analytically under two assumptions. As the predicted DCM position is linearly related to $u$, we can inversely solve for the optimal VRP $u^*$.

DCM Trajectory with Contact Forces
To solve the DCM dynamics and reveal the future DCM, two assumptions are made: 1) the natural frequency hardly changes (i.e., $\omega_0 = \mathrm{const.}$) and 2) the VRP $u$ is given as a time-dependent and analytically solvable function, such as a sine wave and/or a constant.
Since $u$ is composed of the reference $u_{\mathrm{ref}}$ and the part corresponding to the contact forces $u_\omega$, we obtain their respective functions. $u_{\mathrm{ref}}$ is simply designed as a constant and a sine wave under SSP and DSP, respectively.
where $u_e$ is the center of the next supporting foot, $u_i$ is the manipulated VRP, which is identical to the center of the current supporting foot, and $\Delta u = u_e - u_i$ (see Figure 4). $\omega_d$ is the frequency that makes $u = u_e$ at the DSP duration $T_d$ (i.e., $\omega_d = \pi/(2T_d)$).
Based on this design, the robot can switch its stance leg via DSP. Next, the VRP for the contact forces, $u_\omega$, should be predicted as a time-dependent function. As can be expected, one way is to learn that function using deep-learning-based function approximators to provide high accuracy (e.g., see the study by Kobayashi [39]). Instead of such a highly accurate but computationally expensive method, we prioritize the computational cost and assume it to be constant as follows.
$u_\omega(t) = \bar{u}_\omega$ (18)
where $\alpha \in [0, 1]$ is the parameter for the low-pass filter. That is, if large contact forces are applied for a long period, the robot expects that the future forces are also large; if impulsive forces are given momentarily, they are ignored for the trajectory prediction. Note that the simple constant assumption in Equation (18) has been used even in the conventional work, [6] and a further suggestion is discussed in Section 6. By substituting Equations (17) and (18) into Equation (8) (i.e., replacing $u$ according to Equation (9)), the DCM trajectory is analytically solved as follows.
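Two ingredients of this prediction can be sketched in one dimension. First, a first-order low-pass filter produces the constant estimate $\bar{u}_\omega$; the update rule $\bar{u}_\omega \leftarrow \alpha \bar{u}_\omega + (1 - \alpha) u_\omega$ is an assumed form, since only the role of $\alpha$ is stated in the text. Second, for a constant total VRP $u$ (the SSP case), the DCM equation $\dot{\xi} = \omega_0 (\xi - u)$ has the closed-form solution $\xi(t) = (\xi_c - u) e^{\omega_0 t} + u$, which is a standard result. The function names are hypothetical, and the DSP case with a sinusoidal reference is not covered.

```python
import math


def low_pass(u_bar: float, u_w: float, alpha: float) -> float:
    """One filter update of the contact-force VRP estimate (assumed form).

    alpha close to 1 keeps the old estimate, so momentary impulses are ignored.
    """
    return alpha * u_bar + (1.0 - alpha) * u_w


def dcm_constant_vrp(xi_c: float, u: float, omega0: float, t: float) -> float:
    """DCM position t seconds ahead when the VRP stays constant at u."""
    return (xi_c - u) * math.exp(omega0 * t) + u
```

Because the closed form is linear in $u$, the terminal DCM can be inverted for the VRP analytically, which is the property the ankle strategy relies on.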

Conversion from Target Footstep to Target DCM
Here, we convert the target footstep to the target DCM at SSP and DSP. According to the literature, [40] the target DCMs for SSP and DSP are set for two target footsteps in a backward manner. Here, each footstep includes the stride from the support leg $s$ (or the VRP as the footstep location $u$), the SSP duration $T_s$, and the DSP duration $T_d$.
First, let us consider that the robot stops walking after the second footstep. That is, the terminal DCM for the second footstep should match the second footstep location, $u_n = u_e + s_n$, at the time for the second SSP, $T_n$. The initial DCM for the second footstep (the terminal DCM of DSP), $\xi_d$, should, therefore, be in accordance with the DCM trajectory at SSP in Equation (20) as
$b_e = \frac{u_n - u_e}{e^{\omega_0 T_n}} - \bar{u}_\omega = \frac{s_n}{e^{\omega_0 T_n}} - \bar{u}_\omega$
$\xi_d$ is used as the target DCM at DSP. In practice, the robot will continue walking as long as the footstep commands are renewed by planning at every step.
Next, the initial DCM of DSP (the terminal DCM of SSP), $\xi_s$, is solved from the DCM trajectory at DSP in Equation (20). As Equation (21) is the constraint of the DCM trajectory at DSP, $\xi_s$ is given as follows.
$\xi_s$ is used as the target DCM at SSP. In both SSP and DSP, only the terminal DCMs of the respective phases should be on the targets, i.e., $\xi_s$ and $\xi_d$, at the end of their durations, i.e., $T_s$ and $T_d$, respectively. The DCM trajectory in the middle is not required to track Equation (20), although perfect tracking would suppress sudden changes in the references due to drastic replanning. This shift from a tracking problem to a regulation problem improves the robot's capability to adapt to its surroundings during the pHRI.

Ankle Strategy
To force the terminal DCMs onto their respective targets, i.e., $\xi_s$ for SSP (see Equation (23)) and $\xi_d$ for DSP (see Equation (21)), the ankle strategy provides the optimal reference VRP $u^*$. Let us give the current and target conditions for Equation (20) as $\xi(T_c) = \xi_c$ and $\xi(T_{s,d}) = \xi_{s,d}$. In both SSP and DSP, $C$ and $u_i$ (instead of $u_{\mathrm{ref}}$) are unknown but controllable variables; the two conditions above therefore suffice to solve for these two variables.
Then, $u_i$ is analytically solved in both phases as $u_i^*$.
where $\Delta u^* = u_e - u_i^*$. Before applying $u^*$ to the DCM dynamics (Equations (7)–(12)), we have to consider its limitation and confidence. First, $u^*$ is generated by the stance leg(s) (especially, ankle torques), and therefore, it must be within the convex hull of the support polygons $S$. Second, if the robot's intention is far from the partner's intention, the partner will change the contact forces to convey his/her intention to the robot. At that time, the prediction of the future forces would be unreliable, and the estimation of $u^*$ should be given less confidence.
Hence, u* is limited by the following boundary with regard to S and the confidence c.
where $f(c) \in [0, 1]$ is a nonlinear mapping introduced for generality. Note that, in this article, f(c) is assumed to be constant to simplify the experimental verification of our framework (see Appendix A); the appropriate design of f(c) therefore remains an open problem. One possibility is to define a receding horizon problem, as in the study by Wieber et al. [41] According to the confidence, S is virtually shrunk so that the stepping strategy is activated more easily. If u* obtained by Equation (25) and Equation (26) does not satisfy the boundary above, it is moved to the closest point on the boundary. In that case, because the target footstep (the nominal footstep) can no longer be attained, the stepping strategy is required to plan a new target footstep.
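The clipping step can be sketched as follows, under the assumption that the support polygon is an axis-aligned rectangle shrunk by a per-axis factor in [0, 1]. The function `clip_vrp`, its arguments, and the rectangle model are hypothetical simplifications rather than the article's implementation, which works on general support polygons.

```python
def clip_vrp(u_star, center, half_size, scale):
    """Clip a 2-D VRP to a rectangle shrunk by `scale` (in [0, 1]) per axis.

    Returns the clipped point and a flag telling whether clipping occurred.
    For an axis-aligned rectangle, per-axis clamping yields the closest point.
    """
    clipped = []
    for u, c, h, s in zip(u_star, center, half_size, scale):
        lower, upper = c - s * h, c + s * h
        clipped.append(min(max(u, lower), upper))
    clipped = tuple(clipped)
    return clipped, clipped != tuple(u_star)


# Hypothetical numbers: a 0.2 m x 0.1 m foot, shrunk to 80% along both axes.
u_clipped, needs_step = clip_vrp((0.15, 0.0), (0.0, 0.0), (0.1, 0.05), (0.8, 0.8))
```

In this sketch, the returned flag plays the role of the trigger for the stepping strategy: when the requested VRP had to be moved onto the boundary, the nominal footstep is no longer attainable by the ankle strategy alone.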

Overview
In this section, the stepping strategy is introduced to follow the human intention by replanning the footstep. Specifically, the new target footsteps, i.e., location $u_e$, SSP duration $T_s$, and DSP duration $T_d$, are numerically given by solving two optimization problems (namely, SSP-Opt and DSP-Opt), which consist of the following objectives: 1) to follow the DCM dynamics, 2) to hold the nominal footstep, and 3) to smoothly update the target footstep. While the first objective is a quasiconstraint with a larger weight than the others, the other two represent the confidence of the robot's intention, i.e., whether the robot wants to hold a high-confidence footstep or not.

SSP-Opt for Footstep Location and SSP Duration
Here, we first introduce SSP-Opt to optimize the footstep location, $u_e$, and SSP duration, $T_s$, at SSP. Given $u = u^* - \bar{u}_\omega$, the DCM diverges from the current observation $\xi(T_c) = \xi_c$ as follows.
where $\tau(t) = e^{\omega_0 t}$ is a linearization technique. [23] If $u^*$ is clipped to be within its boundary, the DCM no longer reaches $\xi_s$ at time $T_s$ and would diverge as it is.
To keep walking balance without going against the dynamics in Equation (28), $u_e$ and $T_s$ are required to be updated. That is, the following cost function should be minimized.
where $\tau_s = \tau(T_s)$ is optimized instead of $T_s$ to define a linear optimization problem. [23] Here, $\xi_s(u_e)$ is already defined in Equation (23). This minimization is actually a quasiconstraint for DCM-based walking under the contact forces (as $\bar{u}_\omega$). It is a "quasiconstraint" because the ankle strategy has the capability to return the trajectory toward the target footstep by changing $u$. Hence, the quasiconstraint allows the additional optimization targets to be satisfied. On the one hand, if the robot has high confidence in the current nominal footstep (e.g., to demonstrate the desired footstep to the partner or to walk in limited spaces), the nominal footstep represented by $u_e^n$ and $\tau_s^n$ should be kept. On the other hand, if the robot has little confidence in the nominal footstep (e.g., when the robot is guided to where to walk), the target footstep, $u_e^s$ and $\tau_s^s$, should be smoothly updated according to the current dynamics. In summary, the following two cost functions, $J_s^n$ and $J_s^s$, should be included in the minimization problem.
where $w_u$ and $w_\tau$ are weights for scaling. Here, $J_s^n$ and $J_s^s$ are in a tradeoff if $u_e^s$ and $\tau_s^s$ differ from $u_e^n$ and $\tau_s^n$; the difference between them is caused by the dynamics during pHRI (namely, the partner's intention).
Consequently, SSP-Opt optimizes $u_e$ and $\tau_s$ (i.e., $T_s$) to minimize the three cost functions above with respective weights, $w_s^d$, $w_s^n$, and $w_s^s$, under the confidence of the nominal footstep $c$.
$u_e^s, \tau_s^s = \arg\min$ subject to
where $s_{\min,\max}$ and $T_s^{\min,\max}$ are boundaries for the respective parameters depending on the geometric and kinodynamic constraints. This is regarded as a quadratic programming (QP) problem, which can be solved with the L-BFGS-B solver [42] in real time (i.e., within the control period of our robot). $\tau_s^s$ is converted back to $T_s^s$ after solving, i.e., $T_s^s = \omega_0^{-1} \log \tau_s^s$. Note that, in the ankle strategy, $u_e^s$ and $T_s^s$ are given as the new targets to be sustained, i.e., $u_e = u_e^s$ and $T_s = T_s^s$.
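To illustrate why optimizing $\tau_s$ instead of $T_s$ is convenient, the following sketch solves a reduced, one-dimensional version of the problem in closed form. It is not the article's SSP-Opt: the footstep location is held fixed, only a dynamics term and a nominal-holding term are kept, `tau` stands for the exponential of the time remaining in the phase, and a clipped closed-form minimizer replaces the L-BFGS-B solver. All function names and weights are hypothetical.

```python
import math


def tau_from_duration(T, omega0):
    """Change of variable tau = exp(omega0 * T)."""
    return math.exp(omega0 * T)


def duration_from_tau(tau, omega0):
    """Inverse change of variable T = log(tau) / omega0."""
    return math.log(tau) / omega0


def optimize_tau(xi_c, u, xi_target, tau_nom, tau_min, tau_max, w_d, w_n):
    """Minimize w_d*((xi_c - u)*tau + u - xi_target)**2 + w_n*(tau - tau_nom)**2
    over tau in [tau_min, tau_max]; requires w_n > 0.

    The predicted terminal DCM is linear in tau, so the cost is a convex
    quadratic and clipping the unconstrained minimizer gives the bounded one.
    """
    a = xi_c - u
    tau = (w_d * a * (xi_target - u) + w_n * tau_nom) / (w_d * a * a + w_n)
    return min(max(tau, tau_min), tau_max)
```

After solving, the duration would be recovered with `duration_from_tau`, mirroring the reconversion $T_s^s = \omega_0^{-1} \log \tau_s^s$ described above.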

DSP-Opt for DSP Duration
Here, we introduce DSP-Opt to optimize the DSP duration, $T_d$, at DSP. The contribution of DSP to the stabilization capability is well known; however, its duration has often been heuristically fixed to a sufficient value in many studies, [24–26] or the DSP is simply ignored, mainly in quasipassive walker approaches. [22,23] $T_d$ actually causes a tradeoff between stability and walking speed. This article, therefore, optimizes $T_d$, which is usually relatively short but enables recovery from unexpected disturbances, in addition to SSP. From the current observation $\xi(T_c) = \xi_c$, the DCM reaches the following state at $T_d$.
where all the parameters related to $T_d$ are explicitly described. The nonlinearities in them are unfortunately unavoidable, although their definitions are analytically differentiable. Similar to the case of SSP, the DCM dynamics with the clipped $u^*$ cannot lead to $\xi_d$ at time $T_d$. Instead, a long $T_d$ would allow $\xi_d$ to be achieved by repeatedly updating $u^*$ with the ankle strategy, or $\xi$ may pass $\xi_d$ before $T_d$.
To start the ideal next footstep, DSP-Opt optimizes $T_d$. Similar to SSP-Opt, three cost functions are designed for 1) convergence on $\xi_d$ to satisfy the DCM dynamics, 2) keeping $T_d$ nominal to hold the robot's intention, and 3) smoothing its change to smoothly follow the partner's intention.
where $T_d^n$ and $T_d^s$ denote the nominal and target values, and $w^+$ and $w^-$ denote asymmetric weights. Usually, this asymmetric design in $J_d^{n,s}$ has to consider the risk of shortening $T_d$, i.e., $w^- > w^+$. In summary, DSP-Opt optimizes $T_d$ by minimizing the above cost functions with the respective weights, $w_d^d$, $w_d^n$, and $w_d^s$, under the confidence of the next footstep $c$.
subject to
where $\gamma_{\min,\max}$ are the lower and upper boundaries depending on the nominal time. This optimization problem is nonlinear but has differentiable cost functions. Therefore, we solve this problem with the L-BFGS-B solver in real time to gain at least a locally optimal solution. After solving, $T_d^s$ becomes the new target for the ankle strategy at DSP, $T_d = T_d^s$. As a consequence, the stepping strategy updates the target footstep ($u_e^s$, $T_s^s$, and $T_d^s$) by solving the two optimization problems, namely, SSP-Opt (31) and DSP-Opt (37), if the ankle strategy cannot find a $u^*$ satisfying the boundary defined in Equation (27). Note again that the above box-constrained optimization problems can be solved within the control period owing to their simplicity, with only a few optimization variables. In addition, because the updates run in real time under the box constraints, incorrect updates can be corrected in subsequent updates, and undesirable updates can be avoided by the box constraints.
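The asymmetric weighting used in DSP-Opt can be pictured with a small sketch. The quadratic form below is an assumption made for illustration (the exact cost is given by the article's equations); the function name and the numerical weights are hypothetical.

```python
def asymmetric_penalty(T, T_ref, w_plus, w_minus):
    """Quadratic penalty on T - T_ref with a side-dependent weight.

    w_minus applies when T is shorter than T_ref and w_plus when it is
    longer; the penalty stays continuously differentiable at T == T_ref.
    """
    diff = T - T_ref
    weight = w_plus if diff >= 0.0 else w_minus
    return weight * diff * diff


# With w_minus > w_plus, shortening the DSP costs more than lengthening it
# by the same amount (hypothetical weights and durations).
shorter = asymmetric_penalty(0.15, 0.20, w_plus=1.0, w_minus=4.0)
longer = asymmetric_penalty(0.25, 0.20, w_plus=1.0, w_minus=4.0)
```

A penalty of this shape remains compatible with a gradient-based box-constrained solver because its derivative is zero on both sides of the reference value.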
In this switching strategy, the confidence of the nominal footstep $c$ plays a decisive role, i.e., it determines whether the robot holds its own intention or follows the partner's intention (see Figure 6). With high confidence, the ankle strategy tends to resist the contact forces from the partner, and even if the stepping strategy is activated, it aims to keep the footstep as close to nominal as possible (i.e., preserving the walking balance). With low confidence, the ankle strategy readily hands over to the stepping strategy, which aims to follow the partner's intention.

Experimental Results
In this section, three types of experiments with a real full-sized humanoid robot covered with artificial sensing skin (see Figure 1) are reported. The configurations of the proposed method are summarized in Appendix A. Note that all the nominal footsteps in the experiments are given by an operator for simplicity. Adaptive updates of the nominal footsteps are discussed in the next section.
Figure 8. Tracking accuracies for the reactive swing-leg trajectories: "R" and "L" in legends denote the right and left legs, and "obs" and "des" in legends mean the observed and desired values. The observed noises were caused by the state estimator and the gain setting for motors. A remarkable feature was found in the z-axis trajectories because the footstep duration was modified or the landing was earlier than expected due to the unexpected tilting of the base link. The large applied forces smoothly updated the desired trajectories toward the next footstep locations, and the inverse kinematics-based controller accurately tracked them.

Pushing from Four Directions
In this experiment, the robot steps in place with the confidence c = 0. In each trial, the robot is pushed twice in the same direction, first with a small impulse and then with a large one. We show the results of pushing in four directions, i.e., forward, backward, leftward, and rightward (see Figure 7 and the attached video). The specific values of the two impulses are given in the respective captions. After the first push with a small impulse, the ankle strategy succeeded in resisting that force by adjusting the ZMP. The second push exceeded the limits of the ankle strategy, and as a result, the stepping strategy was activated to change the footstep location. Note that the difference in the magnitude of the allowable impulses between the directions, e.g., the first push from leftward (33.1 Ns) was larger than the second push from forward (13.9 Ns), is due to the influence of multiple factors, such as the size of the support polygons, the timing of the push, and the moving direction of the COM. In summary, although a fixed threshold on the impulse cannot properly trigger the stepping strategy because of these factors, the proposed method keeps the balance of the stepping motion against contact forces from any direction.
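For readers who want to relate the reported impulses (in Ns) to logged force data, a minimal sketch is given below. It assumes uniformly sampled force magnitudes and uses the trapezoidal rule; how the impulses were actually computed in the experiments is not stated here, and the function name and sample values are hypothetical.

```python
def impulse(forces, dt):
    """Trapezoidal integral of uniformly sampled force [N] over time [s], in Ns."""
    if len(forces) < 2:
        return 0.0
    return dt * (sum(forces) - 0.5 * (forces[0] + forces[-1]))


# Hypothetical push: 100 N held for two samples at a 0.1 s sampling period.
push = impulse([0.0, 100.0, 100.0, 0.0], dt=0.1)
```

Because the impulse mixes force magnitude and duration, two pushes with the same impulse can still act at different gait timings, which is consistent with the observation that an impulse threshold alone does not determine whether a step is needed.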
From these experiments, we also confirm how smoothly the swing-leg trajectories were updated according to the large applied forces, as shown in Figure 8. The IK-based controller implemented on the robot was able to track the desired swing-leg trajectories updated in real time with high accuracy. However, the robot's state estimator was disturbed by the applied forces and/or landing impacts, and the observed values were sometimes noisy.

Robot–Robot Interaction with Leader and Follower
In this experiment, the humanoid robot, H1, interacts with a mobile robot as a partner, i.e., TIAGo [43] with an impedance controller. The control behavior of TIAGo is described in Appendix B. A leader moves forward and pushes a follower, which moves according to the contact forces of up to around 80 N (see Figure 9 and the attached video).
When the humanoid robot is the leader, its nominal stride is given as 0.1 m and its confidence is c = 1. The results at that time are shown in Figure 10a. From the upper and middle parts of Figure 10a, we found that the humanoid walked forward while pushing the partner mobile robot backward. Based on the ankle strategy with high confidence, the robot hardly changed its footstep, except for a small change in the SSP duration, while maintaining walking balance. This was because the backward contact forces prevented the humanoid robot from walking forward, and the DCM (and the COM) needed a longer SSP duration to reach the target.
When the humanoid robot is the follower, it steps in place with confidence c = 0. The results at that time are shown in Figure 10b. The humanoid robot received the same amount of force (up to around 80 N) from the mobile robot (see the upper parts of Figure 10a,b), and as a result, it stepped back once. However, the humanoid robot succeeded in returning to the stepping-in-place task. This is because the mobile robot pushed the humanoid robot during the DSP, which has large support polygons, and the COM offset was obtained by the ankle strategy to resist the contact forces. The timing of the push was conveniently adjusted by DSP-Opt, which made the DSP duration almost double after going back to the stepping-in-place task.
From these results, if the magnitude of the contact forces is within the allowable range (under 80 N and/or 40 Ns, as estimated from the earlier experiments), we can expect the humanoid to switch its role between leader and follower using the proposed method, provided that the physical interactions are applied at an appropriate timing or the size of the support polygons in Equation (27) is virtually shrunk appropriately according to the confidence. However, it should be noted that even if the humanoid robot is the follower, its top priority is to maintain balance; namely, it is not desired to be completely obedient to the leader when the mobile robot is the follower.

"Box
Step" with pHRI In this experiment, the humanoid robot tries the box step (see Figure 11) with small stride at 0.05 m. To allow the updates, the confidence c is given as 0.5. That is, the partner who is in contact with the humanoid robot, as shown in Figure 12, tries to increase the stride by pushing the robot (specifically, its right arm and left shoulder) at the right time. The results are shown in Figure 12 (snapshots extracted from the attached video), Figure 13 and 14.
During the first round (from the beginning to around 10 s), the partner hardly applied forces to assess the nominal footsteps of the robot (see Figure 14). At that time, as shown in Figure 13a,b, the robot motion was stable but with a small stride. After that, the partner applied larger forces to increase the stride.
Figure 9. Experimental setup for the robot–robot interaction (labeled parts: 6-axis FT sensor and attachment for contact): the robots touch with the torso and end-effector, respectively. One moves forward as a leader and the other moves as a follower according to the contact forces.
In several attempts (in particular, during lateral stepping), the interaction failed because of wrong timing and insufficient impulse, which allowed the robot to recover balance using only the ankle strategy. However, as shown in Figure 13c,d, i.e., from 19 s and 42 s, the partner applied the forces along the COM trajectory, and as a result, the stride increased compared with the cases without the forces shown in Figure 13a,b, respectively. These experimental results show that our method made it possible for the robot to interact with the partner through natural, multiple-contact pHRIs in real time. However, we have to remark that the robot reacted to the partner's intentions in only 12 of 16 steps, and the partner was forced to make a great effort to increase the stride at every step.

Discussion
In this section, we discuss the limitations and improvements of the proposed method based on the experimental results and the implementation details.

Update of Nominal Footstep
In all the experiments, the nominal footsteps were given as commands by an operator. That is, even if the humanoid robot has previously changed its footstep location and/or duration, it never updates or learns the optimal nominal footsteps under the proposed method. As seen in the robot–robot interactions, the humanoid robot resisted the contact forces applied by the mobile robot after a one-step modification. Even in the box-step case during the multicontact pHRIs, the partner had to convey the next footstep by pushing the humanoid robot at every step. The humanoid robot with the proposed method, therefore, cannot predict the partner's intention as a human does, and the partner needs to make an effort to convey his/her intention. One solution to this problem would be to apply learning-based methods. Using time-series data of the contact forces, the robot could learn to predict the intended direction, stride, and duration of the current footstep(s). However, supervised signals can be collected only at the time of the phase change between SSP and DSP; hence, semisupervised learning, as in the study by Kobayashi et al., [44] would be more suitable. With such a method, the robot will be able to predict the human's walking from the history of the contact forces.
Figure 10. Results of robot–robot interaction with leader and follower (panel (b): the humanoid, H1, is the follower and the mobile robot, TIAGo, is the leader). Note that the difference in the total movement distances is due to sensing error and slippage. In (a), even when the humanoid pushed the mobile robot, it succeeded in walking with the nominal stride, thanks to the ankle strategy and the adjustment of the SSP duration. In (b), the mobile robot pushed the humanoid again and again, but the forward COM offset produced by the ankle strategy enabled the humanoid to resist the contact forces without changing the footstep location; after going backward once, the DSP duration was nearly doubled to gain balance and to make the COM offset converge.
Figure 11. Diagram of box step: swing-leg movements and landing locations are shown with arrows and dotted line footsteps. It is a basic dance step with six footsteps for one round.
Figure 12. Snapshots of typical cases (13th, 14th, 28th, and 29th steps): The partner pushed the robot from the seventh step (i.e., after one round) to increase the stride. Accordingly, the robot increased the strides when the forces were applied at the right time.

Limitations in Dealing with Various Contact Forces
Our framework utilizes the composite force on the base link, i.e., a composite of the contact forces on the whole-body skin; it can therefore handle forces of various magnitudes and directions in the same way to keep walking balance. However, to suppress undesirable overreactions, several smoothers, such as $J_{s,d}^s$, were installed in our framework for careful updates of the robot motion, so the robot does not change the footstep too quickly. Thus, our framework can safely maintain gait balance for modest force magnitudes and changes, but it would not keep the robot balanced under arbitrary forces.
From another perspective, how to predict the future contact forces (e.g., see Equations (18) and (19)) remains an open problem for achieving high agility to various contact forces and natural physical interactions with walking balance. Dangerous pHRIs can be expected when this prediction has low accuracy. In particular, when the points of contact can change freely as in our system, which requires predicting where the robot will be touched in the future, improving the accuracy of this prediction is a big challenge. To improve the prediction accuracy, we have to investigate the best function to represent the trajectory of the contact forces while satisfying the condition that an analytical solution can be derived in real time.
The improved prediction accuracy and the controller based on it would be able to compensate for the inherent delays in the robot and our framework.
Figure 13. Typical footsteps: a,b) the cases without the contact forces; c,d) the robot was pushed by the partner (also see Figure 12). Comparing (a) and (c), the backward and leftward strides clearly increased. Similarly, in (d), the robot gained larger steps than in (b).
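The constant-force prediction discussed in this subsection relies on low-pass filtering the measured contact force. The sketch below shows one common first-order filter as an illustration; the exact filter form and the role of α in the article's Equation (17) are not reproduced here, so the convention that `alpha` weights the previous estimate, as well as the function names and sample values, are assumptions.

```python
def low_pass(prev, sample, alpha):
    """First-order low-pass filter; alpha in [0, 1] weights the previous estimate."""
    return alpha * prev + (1.0 - alpha) * sample


def filtered_trace(samples, alpha, initial=0.0):
    """Run the filter over a sequence and return every intermediate estimate."""
    estimate, trace = initial, []
    for sample in samples:
        estimate = low_pass(estimate, sample, alpha)
        trace.append(estimate)
    return trace


# A sustained 50 N push is eventually believed; a one-sample spike is not.
sustained = filtered_trace([50.0] * 200, alpha=0.95)
spike = filtered_trace([50.0] + [0.0] * 9, alpha=0.95)
```

With such a filter, the estimate of a sustained push converges toward the applied force while a momentary spike barely moves it. That matches the intended behavior of the predictor, but it also illustrates the inherent delay that a better prediction model would need to compensate for.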

Integration with Other Strategies
In the proposed method, only the COM is reactive to the contact forces in pursuit of real-time control. This implementation made multicontact pHRIs difficult in certain respects. For instance, the positions and orientations (i.e., the transformation matrices to the COM) of the skin cells attached to the upper body were fixed, which restricts the direction of the contact force on each cell. Therefore, in the third experiment, the partner frequently changed the touched skin cells (see the attached video). If the joints have compliance to relieve the contact forces, a part of $f_{i,\mathrm{cell}}$ will be used to change the robot's pose, and the remainder will be propagated to the COM according to the degree of compliance and the joints' performance. [16] While such compliance is important to make the robot robust to contact forces, it generally leads to behaviors that avoid contacts and hinder continuous interactions. Even so, our walking controller can be integrated with a compliant system using the concept of intentional contacts; [45] that is, if the partner wants to keep in touch with specific parts of the robot, the robot will allow contacts by reducing the compliance of the corresponding joints after a short time. Using this concept, the partner will be able to walk together with the robot while changing its posture so that it is suitable for pushing in the desired direction.

Conclusion
In this article, we presented bipedal walking control during multicontact pHRIs that deals with the tradeoff between robustness to achieve the robot-intended motion and adaptability to follow the human-intended motion. Two main strategies, the ankle and stepping strategies, were integrated for this purpose. Specifically, the ankle strategy derives the analytically optimal VRP to hold the target footstep representing the robot's intention. However, when the optimal VRP is outside the robot's support polygons, the stepping strategy is activated to update the target footstep by solving two optimization problems. Whether the robot is the leader or a follower is simply determined according to the confidence of the nominal footstep, which was introduced quantitatively in these strategies. Three types of real experiments verified that our approach could 1) keep walking balance under contact forces of various magnitudes and directions, 2) switch the robot's role between the leader and follower according to the confidence, and 3) achieve a "box-step" dance between a human and a full-sized humanoid robot during multicontact pHRIs while adjusting its own stride.
As discussed in the previous section, our approach presents several research challenges that should be explored in future work. In particular, to fully exploit the benefits of robotic skin and to handle even more complex multicontact pHRIs, the integration with joint compliance will be explored. We will also examine how best to allow the robot to update its own intention (i.e., the nominal footstep) after a few physical interactions based on multiple contact forces. With these improvements, the robot will be better equipped for real-world problems in the future, such as the physical assistance of the elderly.

Appendix A Controller Configurations
The parameters for the proposed method in this article are shown in Table 1. First, these parameters, except the ones related to the robotic skin, were tuned in a dynamic simulator, Gazebo. [46] Afterward, they were fine tuned on the real robot. The parameters with regard to the robotic skin were calibrated by pushing the skin cells mounted on the end-effector, which also has a six-axis force and torque sensor. Nominal footstep location (or stride) and the confidence of the nominal footstep are task dependent, and therefore, they were set in the respective experiments.
As a remark, to specify the definition of f(c) in Equation (27) with the confidence c, we simply assume that f(c) is given as a constant along the respective axes: $f(c) = \beta_{x,y}$. That is, the support polygon is no longer shrunk according to c. This is because the behaviors of the robot were very sensitive to the size of the support polygon, and to analyze stable experimental results, the size was desired to be fixed.
Figure 14. Time-series data of the COM and the contact forces (panels (a)–(d)): during the first 10 s (i.e., one round), the partner checked the robot motion without applying the contact forces. The amplitude of the COM then increased as if synchronizing with the contact forces. When the timing of pushing was incorrect and/or the impulse was insufficient, the robot kept the footsteps nominal.

Appendix B Controller for TIAGo
TIAGo, which is developed by PAL Robotics, [43] is used for pushing the humanoid robot or being pushed by the humanoid robot.
To make it possible to physically interact with the humanoid robot, we implemented a simple impedance controller as follows.
where the position of TIAGo is given as $x$. $M$, $D$, and $K$ denote the virtual impedance parameters. $f_{\mathrm{ex}}$ is the external force measured by a six-axis force and torque sensor mounted at the end-effector of TIAGo. $f_{\mathrm{in}}$ is the built-in force, which allows the robot to actively move as the leader. According to these virtual dynamics, the reference position and velocity are then updated. Here, to imitate the walking behavior of a human partner, the rest position $x_r$ is updated when TIAGo moves over the stride $S$.
where the initial value of $x_r$ is equal to 0. These parameters are shown in Table 2. Note that the built-in force is applied only when TIAGo is the leader.
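A minimal one-dimensional sketch of such an impedance controller is given below. It assumes the common mass-spring-damper form $M\ddot{x} + D\dot{x} + K(x - x_r) = f_{\mathrm{ex}} + f_{\mathrm{in}}$, a semi-implicit Euler integrator, and a rest-position rule that shifts $x_r$ by one stride once the base has moved that far; the sign conventions, the update rule, the function names, and all numerical values are hypothetical and are not taken from Table 2.

```python
def impedance_step(x, v, x_rest, f_ext, f_in, M, D, K, dt):
    """One semi-implicit Euler step of M*a + D*v + K*(x - x_rest) = f_ext + f_in."""
    a = (f_ext + f_in - D * v - K * (x - x_rest)) / M
    v_new = v + a * dt
    x_new = x + v_new * dt
    return x_new, v_new


def update_rest_position(x, x_rest, stride):
    """Shift the rest position by one stride once the base has moved past it."""
    if x - x_rest >= stride:
        return x_rest + stride
    if x_rest - x >= stride:
        return x_rest - stride
    return x_rest


# Hypothetical leader behavior: a constant built-in force drives the base forward.
x, v, x_r = 0.0, 0.0, 0.0
for _ in range(5000):
    x, v = impedance_step(x, v, x_r, f_ext=0.0, f_in=30.0,
                          M=10.0, D=100.0, K=200.0, dt=0.001)
    x_r = update_rest_position(x, x_r, stride=0.1)
```

In this sketch, shifting the rest position releases the spring each time a stride is completed, so a persistent force produces a step-like advance rather than a single bounded displacement, which is one way to imitate a walking partner.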

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.