Model predictive control of a collaborative manipulator considering dynamic obstacles

Collaborative robots have to adapt their motion plans to a dynamic environment and varying task constraints. Currently, they detect collisions and interrupt or postpone their motion plan to prevent harm to humans or objects. The more advanced strategy proposed in this article uses online trajectory optimization to anticipate potential collisions and task variations and to adapt the motion plan accordingly. The online trajectory planner pursues a model predictive control approach to account for dynamic motion objectives and constraints during task execution. The prediction model relates reference joint velocities to actual joint positions as an approximation of the built‐in robot tracking controllers. The optimal control problem is solved with direct collocation based on a hypergraph structure, which represents the nonlinear program and allows efficient adaptation to structural changes in the optimization problem caused by moving obstacles. To demonstrate the effectiveness of the approach, the robot imitates pick‐and‐place tasks while avoiding self‐collisions as well as semistatic and dynamic obstacles, including a person. The analysis of the approach concerns computation time, constraint violations, and smoothness. It shows that after model identification, order reduction, and validation on the real robot, parallel integrators with compensation for input delays exhibit the best compromise between accuracy and computational complexity. The model predictive controller can successfully approach a moving target configuration without prior knowledge of the reference motion. The results show that pure hard constraints are not sufficient and lead to nonsmooth controls. In combination with soft constraints, which evaluate the proximity of obstacles, smooth and safe trajectories are planned.

Collaborative robots are increasingly deployed in production lines, sharing the environment with humans, as their lightweight structure and collision awareness mitigate the risk of causing harm to people. Their range of application is limited by the relatively low payload of typically only a few kilograms. These small-sized and low-powered robots perform simple repetitive tasks, such as part handling in human-robot collaboration, to combine the repeatability of robots with the dexterity of humans. Human-robot collaboration is a viable alternative for semiautomation in production lines for which full-scale process automation is economically infeasible due to small and medium lot sizes. However, until now, collaborative robots have preferably been used in a cooperative sense, working on their tasks in parallel to humans and only exploiting the benefit of quick task reconfigurations. An actual collaboration in which the human and robot truly interact as part of the process is comparatively rarely seen. This article utilizes online trajectory optimization based on model predictive control as a step towards a truly collaborative robotic framework.

Online trajectory planning
Trajectory planning provides the link between an abstract task planning layer and the execution of robot reference motions governed by underlying tracking controllers. Depending on the field of application, the requirements for trajectory planning vary. In the following, trajectory planning for collaborative robots is categorized into three classes. The first class comprises approaches that perform offline planning in static environments with fixed target configurations and thus only serve cooperation tasks. 1,2 Following a fixed trajectory is not practical for collaboration. Imagine, for example, a task where a human coworker assembles multiple pieces into one and then hands it over to a collaborative robot that performs verifications and final packaging. Not only do moving obstacles inside the robot's workspace and the potential risk of collisions require special measures for trajectory planning, but so does the uncertainty of task components, such as, in this example, the hand-over configuration. A step towards collision avoidance for approaches of this class can be achieved by preprocessing joint velocity references inside the tracking controller. 3 The algorithm searches for a joint velocity that most closely matches the desired one, in the sense of the quadratic Euclidean norm, subject to constraints on the relative speed between approaching bodies. However, since this only modifies the execution of motions, but not the actual planning, the robot reacts to the environment in a way that is not covered by the plan. Furthermore, the approach can reduce joint velocities so far that the robot stops entirely. This makes it similar to built-in collision detection methods and leads to task resets and process delays. In general, workspace surveillance and online trajectory planning become mandatory for collaborative robots that closely interact with humans in a shared environment.
Approaches of the second class use a preplanned path and modify the time parametrization of the corresponding trajectory online. [4][5][6][7][8] By scaling the time parametrization, the robot responds to a dynamic environment by either slowing down or speeding up its motion, enabling a time-shared collaboration. Additional constraints consider velocity, acceleration, and jerk limitations of feasible motions. However, since the path is fixed, these approaches cannot react to changing waypoints, and the robot might freeze and delay the process. Beckert et al 9 use planned failsafe and recovery maneuvers in these situations to allow the robot to leave its path temporarily.
The third class consists of approaches that modify both the path and the time parametrization of a trajectory, providing the most flexibility and full collaboration. Online planning might result from the advancement of established offline approaches to achieve real-time capabilities. Wang et al 10 present a grid-based path planning approach that considers only a subset of the search space to reduce computation time. Kohrt et al 11 propose a similar method that restricts the search space to Voronoi regions. The trajectory then emerges from a time parametrization along the path, for example, based on trapezoidal velocity profiles or piecewise polynomial interpolation with imposed velocities and accelerations. 12 Another approach by Yoshida et al 13 performs planning and execution asynchronously. Upon changes in the environment, a subroutine replans a trajectory based on the current initialization and passes it to the execution process. However, this approach places demands on the dynamics of the environment that are not always met.
In contrast to conventional approaches that separate path and trajectory planning, trajectory optimization plans the trajectory as a whole. Trajectory optimization can be seen as shaping an initial path and time parametrization by iteratively improving a solution set with respect to given objectives and constraints. Formulating trajectory planning as an optimal control problem opens an intuitive way to incorporate task objectives and limitations into the planning algorithm through quantitative constraints and cost functions. To cope with the burden of computational complexity, online trajectory planning often considers a limited time horizon that covers the immediate future. This short optimized trajectory is then sent to an execution process, preempting the previous one. If the cycle time for planning is less than or equal to the discretization of the planned trajectory, only the first element of the trajectory takes effect, which makes the scheme similar to model predictive control. Model predictive control repeatedly solves an optimal control problem on a finite horizon and applies the first element of the optimal control sequence to the process. 14 Sending only the first element of the planned trajectory combines planning and execution. Model predictive control can explicitly consider the dynamics of low-level joint motion controllers in a prediction model. The closed-loop nature of model predictive control additionally accounts for model uncertainties and disturbances already during planning and not only during execution. Ide et al, 15 Zube, 16 and Buizza Avanzini et al 17 propose model predictive controllers for mobile manipulators, also considering the movement of the robot's base. Because of its predictive abilities, model predictive control for robotic manipulators often establishes a look-ahead tracking controller, for which future references are known in advance. [18][19][20][21]
Besides optimal control, reinforcement learning also utilizes dynamic programming theory and is therefore closely related. From a motion planning perspective, reinforcement learning generates a policy for motion commands, which maximizes a reward function for collision-free operation, depending on the state of the robot and the environment. For example, Sangiovanni et al 22 perform planning via reinforcement learning by including the robot's and obstacles' states in the search space and learn a policy that commands joint speeds. Reinforcement learning offers the advantage that no system equations have to be solved online, as the model is mainly data-based. In return, the dimensions of the search space increase through additional environmental states, which can prolong the learning phase. Offline learning in simulators is a good alternative to speed up learning, but this usually requires a model. Optimal control benefits from a precise model that can be created and simulated with comparatively little effort and avoids preliminary learning steps. This applies, for example, to cascaded structures such as those of a robotic manipulator, in which lower-level joint control loops decouple and linearize the system dynamics.
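The receding-horizon principle described above, repeatedly solving a finite-horizon optimal control problem and applying only the first element of the control sequence, can be sketched as follows. The single-integrator dynamics and the saturated proportional stand-in for the solver are illustrative placeholders, not the planner proposed in this article:

```python
import numpy as np

def solve_ocp(x, x_ref, K=10, dt=0.1):
    """Toy stand-in for the finite-horizon optimal control problem:
    for a single integrator x' = u, steer x towards x_ref with
    saturated controls. A real planner would run a constrained
    nonlinear solver here."""
    u_seq = []
    x_pred = x
    for _ in range(K):
        u = np.clip(x_ref - x_pred, -1.0, 1.0)  # proportional, saturated
        u_seq.append(u)
        x_pred = x_pred + dt * u                # predict one step ahead
    return u_seq

def mpc_closed_loop(x0, x_ref, cycles=50, dt=0.1):
    """Receding horizon: re-solve every cycle, apply only u_seq[0]."""
    x = x0
    for _ in range(cycles):
        u_seq = solve_ocp(x, x_ref, dt=dt)
        x = x + dt * u_seq[0]   # only the first control is executed
    return x

# The closed loop converges to the target despite constant re-planning.
x_final = mpc_closed_loop(x0=0.0, x_ref=2.0)
```

Re-planning in every cycle is what lets the controller react to moving targets and obstacles, since each new problem starts from the latest measured state.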

Related work
The literature reports several approaches that address the problem similarly. Wang et al, 23 Diehl et al, 24 Zhao et al, 25 and Zube 16 also introduce model predictive controllers for online trajectory optimization on manipulators, but their analyses are based merely on simulations, ignoring model uncertainty and the influences of real applications. In Wang et al, 26 a dynamic model predicts the future arm motion for a designated sequence of control inputs. In the context of model predictive control, there is a tradeoff between model accuracy and computational effort. Model predictive control requires not only predicting but also optimizing the future trajectory within the time period of a control cycle. In many applications of model predictive control, the prediction model comprises the entire nonlinear robot forward dynamics to relate applied joint torques to the evolution of joint positions, velocities, and accelerations, 20,24,25 leading to a nonlinear multiple-input-multiple-output system. Even though these models are attractive from a theoretical perspective, as they capture the arm dynamics, they face some problems in practice. First of all, the computational demand of solving the nonlinear differential equations in real time prohibits the combination with complex distance and forward kinematic computations. Also, collaborative robots usually only provide an interface to track position or velocity reference trajectories, with limited or no access to the underlying low-level torque controllers so as not to undermine safeguards.
Partitioning the overall control task into smaller subtasks for current, torque, velocity, and position control reduces the complexity of control design. In the cascaded control architecture, the inner control loops can be modeled by linearized and decoupled dynamics. 12 The concept of cascaded control loops is thus rather common in robot motion control and is also applicable to prediction models. 27 Wilson et al 28 propose a proportional-integral-derivative controller for the inner control loop and a model predictive controller for the outer cascade to control the motion of a two-link robot arm. The motion model in References 18,19 and 29 is constituted by a series of integrators obtained from feedback linearization, which further simplifies the simulation of the prediction model. However, these approaches also assume low-level interfaces to regulate the joint actuators on either the current or torque level, which are encapsulated from the user on collaborative robots. This work rests upon an identified and validated linear decoupled prediction model that assumes a plausible structure of the underlying inner joint motion control loops of the robot. Zube 16 proposes an integrator-type model for motion prediction but does not report its accuracy in prediction or simulation.
Anticipation and avoidance of obstacles within the online motion planning are crucial prerequisites for interruption-free and safe human-robot collaboration. Obstacles that move during the robot motion are considered fully dynamic, whereas semistatic obstacles merely change their pose between task executions. Obstacle avoidance is mainly based either on distance values between two objects, calculated, for example, by a Euclidean distance function as by Lumelsky 30 or Gilbert et al, 31 or on a binary collision check. The binary collision check is sufficient for discrete path planning algorithms like Rapidly-exploring Random Trees, 32 the Probabilistic Roadmap Planner, 33 or Expansive Search Trees 34 to determine a collision-free path from start to goal in a static environment. On the other hand, distance functions provide more quantitative information and are thus applicable in a broader context but are computationally more demanding. Cascio et al 35 incorporate an implicit distance function that is formulated as a separate optimization problem inside the overall optimization problem for trajectory planning. In trajectory planning via an optimal control problem, obstacle avoidance can be achieved by an objective function that maximizes the separation between robot and obstacle bodies. 36 The literature proposes different implementations of objective functions, for example, functions that grow hyperbolically as the separation distance tends towards a minimum distance threshold. 16,[36][37][38] However, the drawback of pure hyperbolic functions is twofold. First, a nonvanishing obstacle cost causes an unnecessary conflict between goal accuracy and separation distance costs, even if the robot has sufficient clearance from the obstacle. Second, a hyperbolic objective function mimics a barrier function that is used to realize hard constraints (HC).
While this is useful for a solver that cannot handle HC natively, it misses the purpose of more powerful solvers like interior-point frameworks. To safeguard obstacle avoidance, additional inequality constraints maintain a guaranteed minimal separation. Wang et al 23,26 impose inequality constraints with respect to a distance metric to achieve obstacle avoidance. However, Wang et al 23 focus on simulation only, whereas Wang et al 26 restrict the approach to a planar workspace. Buizza Avanzini et al 17 and Zube 16 achieve obstacle avoidance through explicit constraints and dedicated terms in the cost function. These approaches assume static obstacles and do not provide for self-collision avoidance. Furthermore, the implementation of Zube 16 only considers a few isolated points on the robot for obstacle avoidance rather than the entire body.
The geometric representation of robot and obstacle bodies should be flexible to cover a wide variety of object shapes; on the other hand, complex geometries put an unnecessary burden on the computation of pairwise distances between bodies. Landry et al 39 operate with polyhedrons and utilize Farkas' lemma in conjunction with back-face culling to reduce computational complexity. However, the algorithm does not run in real time due to the high complexity of the additional constraints and optimization parameters introduced by Farkas' lemma. Several approaches aim to reduce the complexity, either by safety planes 40 or by projecting the workspace onto a plane, thus rendering distance computation a 2D problem. 26 The ordinary Euclidean distance between a pair of points is the least complex metric. To model spherical objects, this distance metric is merely augmented with a safety margin. 41 A spheres-only approach comes with the drawback that longitudinally shaped objects either have to be covered by a large number of overlapping spheres or a single sphere enclosing the entire object contains a large volume of free space. Line swept spheres, introduced by Larsen et al, 42 nowadays constitute the state of the art for obstacle representation. 3,16,35 A swept sphere, also denoted as a capsule, indicates the volume that a sphere covers during its translation along a straight line. Its shape is similar to a cylinder, but the distance metric is more straightforward to compute for the two spherical end sections.
Solving the optimal control problem in real-time constitutes a significant challenge in online trajectory optimization. The complexity of numerical optimization also depends on the representation of the optimal control problem. The CasADi framework by Andersson et al 43 can be used to save computation time as it replaces numerical differentiation by automatic differentiation. The extra computations to set up and prepare the optimization problem in symbolic form are usually negligible for static problems, but become noticeable for dynamic problems. 44

Contribution
This contribution introduces a novel model predictive control scheme based on a hypergraph for online trajectory optimization of collaborative robots. The approach considers motion constraints such as joint state limits in position and velocity, as well as full robot self-collision avoidance and avoidance of moving obstacles in the entire workspace. The proposed cascaded control architecture offers the advantage that the model predictive controller, with joint velocity control actions, explicitly incorporates constraints on joint velocities in motion planning. The common alternative of planning and commanding joint configurations as well comes with the drawback that the joint velocities are changed by the tracking controllers to minimize the joint configuration error and thus are neither optimized nor constrained. The approach achieves obstacle avoidance with minimum separation by inequality constraints, and clearance from obstacles by a dedicated term in the cost function of the optimal control problem. The cost term grows quadratically for a separation distance below an activation threshold and is zero otherwise. High-priority tasks for safety-related features might delay the execution of motion commands in the low-level software. 45 Input delays affect safe operation and possibly cause collisions as they limit the actual rate and timeliness of control interventions. This contribution explicitly analyzes model accuracies and time delays by utilizing model identification and validation on a real robot. The deduced prediction model within the model predictive controller accounts for input delays caused by dead times and computation times, following the approach of Grüne and Pannek. 14

FIGURE 1 The model predictive controller regulates the joint configuration q(t) to the target configuration q ref (t). The reference velocity ω(t) is the control input to the inner velocity tracking controllers that in turn generate joint actuation torques τ(t)

The effectiveness of the model predictive control scheme is validated and analyzed in robotic pick-and-place experiments on the Universal Robots 46 collaborative robot UR10 regarding computation time, constraint violations, and trajectory smoothness. The experiments consider variations of task parameters, in particular a dynamic goal pose as well as different obstacle scenarios including a person. In a previous publication, the authors propose a hypergraph structure to represent the optimal control problem. 44 Its computation times equal those of CasADi for small to mid-sized optimization problems, but it exhibits a much shorter preparation phase. The approach introduced in this article benefits from efficient preparation, as the structure of the optimization problem varies over time: moving obstacles dynamically change the number of obstacle avoidance constraints.

Outline
The next section analyzes the overall control structure and establishes an appropriate robot model for prediction within the model predictive control framework. Section 3 introduces the model predictive controller for online trajectory optimization. It formulates the motion planning task as an underlying optimal control problem with a designated obstacle representation for robot self-collision and obstacle avoidance. Section 3 also provides an insight into the hypergraph structure of the optimal control problem as well as the closed-loop control algorithm with modifications to cope with input delays. Section 4 is concerned with the practical experiments on a Universal Robots UR10 collaborative robot to evaluate the new concept under real-time conditions. This includes the identification and validation of the robot motion model as well as evaluations with a moving target configuration and dynamic obstacles. It analyzes obstacle avoidance strategies concerning computation time, smoothness, and separation distance. Section 5 concludes with the pros and cons of the proposed approach and provides an outlook on future research.

SYSTEM OVERVIEW
Figure 1 depicts the cascaded control loop architecture, in which the outer loop contains the model predictive controller and the inner loop is composed of tracking controllers for velocity references. The model predictive controller plans the sequence of control actions for an optimal point-to-point motion from the current joint configuration q(t) to the target configuration q ref (t) ∈ R N for a robot with N degrees of freedom. Control actions ω(t) ∈ R N constitute the input to the velocity tracking controllers of the inner cascade. The observed joint states q(t) ∈ R N , q̇(t), and q̈(t) constitute the available feedback for the outer-loop model predictive controller. The underlying tracking controllers generate the torques τ(t) ∈ R N commanded to the robot actuators.
The multiple-input-multiple-output robot joint dynamics with interjoint disturbances are decoupled by high gear reductions and decentralized tracking controllers of the inner cascade. 12 Usually, these controllers are also able to compensate for disturbances caused by collisions or sudden load changes, but in that case the safety software would interrupt further execution. This approach assumes sufficient disturbance rejection on the joint level by the inner tracking controllers, which are modeled by N decoupled, linear systems.
Please note the introduction of the reference velocity u i ∈ R as the open-loop variant of ω i to highlight the difference between the open-loop controls during optimization and the final commanded closed-loop control. Usually, a second-order system to model the velocity tracking controller in conjunction with an integrator is sufficient (M = 3), as the former can model rise time as well as possible overshoot, and the latter transfers velocities to joint angles. The poles p i,1 ∈ C and p i,2 ∈ C of the second-order system are identified in conjunction with the dead times T D,i in Section 4.2.
In the subsequent analysis, the third-order model is compared with less complex time-delayed state space models of orders M = 1 and M = 2. In (1), the first state component x 1,i (t) denotes the joint configuration. For a second-order state space model (M = 2), the second state component x 2,i (t) denotes the actual joint velocity. For a third-order state space model (M = 3), the third state component x 3,i (t) denotes the joint acceleration. While today's robots usually have accurate and low-noise encoders to measure joint angles, it may be necessary to determine other states such as velocity or acceleration using an observer. The state space model f ∶ R NM × R N × R N → R NM for all N subsystems is then given by (2). Matrix C ∈ R NM×NM rearranges x(t) so that it contains joint angles, joint velocities, and joint accelerations in that order.
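As a sketch of the third-order (M = 3) prediction model of one joint, the following simulation combines a second-order lag with unit DC gain for the velocity tracking loop, an integrator from velocity to joint angle, and a dead time on the input. The pole locations, dead time, and step input below are assumed values for illustration only; the actual parameters are identified in Section 4.2:

```python
import numpy as np

# Assumed illustrative parameters (the real ones are identified on the robot).
p1, p2 = -20.0, -25.0   # poles of the second-order velocity tracking loop
T_D = 0.02              # input dead time [s]
dt = 0.001              # Euler integration step [s]

def simulate(u_func, T=1.0):
    """Simulate one joint with state x = (q, qdot, qddot): qdot tracks
    the delayed reference velocity through a second-order lag with unit
    DC gain, and q integrates qdot."""
    x = np.zeros(3)
    for k in range(int(round(T / dt))):
        t = k * dt
        u = u_func(t - T_D) if t >= T_D else 0.0  # delayed reference velocity
        dx = np.array([
            x[1],                                              # q'    = qdot
            x[2],                                              # qdot' = qddot
            (p1 + p2) * x[2] - p1 * p2 * x[1] + p1 * p2 * u,   # velocity loop
        ])
        x = x + dt * dx
    return x

# Step of 0.5 rad/s on the reference velocity: after the transient, the
# joint velocity settles at the commanded value while q keeps integrating.
q, qd, qdd = simulate(lambda t: 0.5)
```

The dead time and the lag of the velocity loop both shift the position response, which is why the article compensates input delays explicitly in the controller.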

MODEL PREDICTIVE CONTROL
This section formulates the planning task as a continuous-time optimal control problem including motion objectives and constraints with a particular focus on obstacle avoidance and obstacle representations. The continuous-time optimal control problem is transformed into an approximative nonlinear program through full discretization. The hypergraph formulation exploits the inherently sparse problem structure and facilitates efficient computations of derivatives at runtime. The section concludes with a description of the closed-loop control algorithm and compensation for input delays.

Continuous-time optimal control problem
The main objective is the regulation of the robot joint motion towards a static target joint state. Due to the moving horizon in model predictive control, the extension to tracking a target joint state trajectory is straightforward. For the sake of readability, the open-loop time in prediction and optimization is distinguished from the closed-loop execution time t ∈ [0, ∞), with t n for n ∈ N representing the time instances at which planning is triggered. To ensure smooth and safe robot operation, the planning stage already considers safety and dynamic motion constraints, in particular separation from obstacles and limits on joint angles and joint velocities.
The general planning task is formulated as the continuous-time optimal control problem (3). The problem formulation includes a terminal state cost V f ∶ R NM → R and the common quadratic running cost ∶ R NM × R N → R in (5), which penalizes the squared state error, the control effort, and the control smoothness. The weight matrices Q ∈ R NM×NM , R ∈ R N×N , and R d ∈ R N×N are positive definite and determine the compromise between regulation error, control effort, and smoothness. The additional running cost terms for obstacle and self-collision clearance as well as the inequality constraints (3d) for obstacle avoidance are discussed in the next section. Equation (3b) ensures that the current joint state at time t n represents the initial state of the predicted trajectory. Equation (3c) denotes the dynamic model (2). The inequality constraint (3e) captures the boundaries on the state and control signals.
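The quadratic running cost described above can be written down directly. The sketch below assumes generic weight matrices Q, R, and R d and is not tied to the tuning used in the experiments:

```python
import numpy as np

def running_cost(x, u, u_prev, x_ref, Q, R, Rd):
    """Quadratic running cost penalizing the squared state error,
    the control effort, and the control smoothness (u - u_prev)."""
    e = x - x_ref
    du = u - u_prev
    return float(e @ Q @ e + u @ R @ u + du @ Rd @ du)

# Example with identity weights: state error contributes 1, control
# effort 4, and control smoothness 1.
c = running_cost(np.array([1.0, 0.0]), np.array([2.0]), np.array([1.0]),
                 np.array([0.0, 0.0]), np.eye(2), np.eye(1), np.eye(1))
```

The smoothness term couples consecutive controls, which is what later motivates carrying the previous control u −1 into each new optimization.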

Obstacle avoidance
Objects are represented by line swept spheres introduced by Larsen et al. 42 Line swept spheres are based on geometric primitives, which facilitate distance calculations in three-dimensional space. Geometric primitives are, for example, points, line segments, or faces. Sweeping a radius r around a point or a line segment forms the sphere or line swept sphere depicted in Figure 2. In contrast to spheres, line swept spheres can enclose elongated objects efficiently. Let  denote the set of bounding volumes b ∈ , which are represented as tuples of two endpoints and a radius. A line swept sphere is composed of a line segment defined by the two dynamic endpoints p 1 (t) and p 2 (t) and the sweeping radius r. An ordinary sphere constitutes a special case of a line swept sphere with p 1 (t) = p 2 (t). Multiple cylindrical or spherical volumes enclose more complex objects following Martinez-Salvador et al. 41 The separation d(b 1 , b 2 ) ∶  2 → R between two objects b 1 ∈  and b 2 ∈ , as demonstrated in Figure 2, results from subtracting both sweeping radii from d̃(b 1 , b 2 ) ∶  2 → R, the Euclidean distance between the line segments of b 1 and b 2 computed with the algorithm by Lumelsky. 30 The set of bounding volumes  is partitioned into robot links  ⊆  and relevant obstacles  ⊆ . An obstacle o ∈  becomes relevant for motion planning once it enters the robot's safety volume s ∈ , which is defined as a sphere that encloses the robot workspace plus an additional safety margin. Consider the set that contains link body pairs (l 1 , l 2 ) ∈  ×  that potentially cause a self-collision; obviously, consecutive link bodies are self-collision safe. Obstacle poses are defined in world coordinates of the task space, and for robot bodies l ∈ , they are obtained from the joint states x by forward kinematics. Their dependency on x is omitted for better readability.
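The separation between two line swept spheres thus reduces to the distance between their line segments minus both radii. The following sketch uses a standard clamped closest-point computation between segments, in the spirit of Lumelsky's algorithm; the function names are illustrative:

```python
import numpy as np

def segment_distance(p1, q1, p2, q2):
    """Closest distance between segments [p1, q1] and [p2, q2] via a
    clamped closest-point computation. Degenerate segments (points)
    are handled explicitly."""
    d1, d2, r = q1 - p1, q2 - p2, p1 - p2
    a, e, f = d1 @ d1, d2 @ d2, d2 @ r
    if a <= 1e-12 and e <= 1e-12:          # both segments are points
        return float(np.linalg.norm(r))
    if a <= 1e-12:                          # first segment is a point
        s, t = 0.0, float(np.clip(f / e, 0.0, 1.0))
    elif e <= 1e-12:                        # second segment is a point
        t, s = 0.0, float(np.clip(-(d1 @ r) / a, 0.0, 1.0))
    else:
        c, b = d1 @ r, d1 @ d2
        denom = a * e - b * b               # zero for parallel segments
        s = float(np.clip((b * f - c * e) / denom, 0.0, 1.0)) if denom > 1e-12 else 0.0
        t = (b * s + f) / e
        if t < 0.0:                         # clamp t, recompute s
            t, s = 0.0, float(np.clip(-c / a, 0.0, 1.0))
        elif t > 1.0:
            t, s = 1.0, float(np.clip((b - c) / a, 0.0, 1.0))
    return float(np.linalg.norm((p1 + s * d1) - (p2 + t * d2)))

def capsule_distance(p1, q1, r1, p2, q2, r2):
    """Separation between two line swept spheres: the segment distance
    minus both sweeping radii (negative values indicate penetration)."""
    return segment_distance(p1, q1, p2, q2) - r1 - r2

# Example: two parallel unit segments one unit apart, radii 0.2 and 0.3.
sep = capsule_distance(np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]), 0.2,
                       np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 1.0]), 0.3)
```

A sphere is covered by the same routine with coincident endpoints, which is why a single distance metric suffices for all body pairs.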
The obstacle avoidance constraint (11) requires the separation distance between each robot link and each relevant obstacle to stay above a separation distance threshold j,O ∈ R + for j = 1, 2, … , | × |. The self-collision constraint is defined analogously with thresholds j,L ∈ R + for j = 1, 2, … , ||. Safe operation not only requires mere obstacle avoidance but should also evaluate the general clearance between objects. The separation cost ∶  ×  × R × R → [0, ] grows quadratically for a separation distance below an activation threshold ∈ R + and is zero otherwise. Compared to the original function by Mohri et al, 47 this function can easily be scaled by ∈ R + to adjust the impact of each obstacle on the overall costs without changing the activation threshold. The cost term is bounded, as the obstacle avoidance HC (11) already prohibits critical separation distances. The potentials for obstacle avoidance and self-collisions in (15) sum these separation costs over all relevant body pairs. Please note that the activation thresholds must exceed the corresponding separation distance thresholds, j,O > j,O and j,L > j,L , for (15) to be effective.
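A minimal sketch of such a separation cost, assuming a form normalized so that the cost at zero separation equals the scaling factor (the exact shape used by Mohri et al may differ):

```python
def separation_cost(d, d_act, scale=1.0):
    """Soft clearance cost: zero when the separation d is at or above
    the activation threshold d_act, growing quadratically below it.
    The value stays bounded because the hard constraints already
    forbid separations near zero."""
    if d >= d_act:
        return 0.0
    return scale * ((d_act - d) / d_act) ** 2

# Far away: no cost. At half the threshold with scale 2: 2 * 0.25 = 0.5.
far = separation_cost(1.0, d_act=0.5)
near = separation_cost(0.25, d_act=0.5, scale=2.0)
```

The vanishing cost above the activation threshold avoids the conflict between goal accuracy and clearance that pure hyperbolic obstacle costs exhibit.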

Direct optimal control-full discretization
Continuous-time optimal control problems like (3) are solved by either indirect or direct methods. 48 Indirect methods apply the calculus of variations to obtain the optimality conditions in terms of a boundary value problem, which is then solved numerically. In contrast, direct methods transform the continuous-time problem into a nonlinear program using discretization or transcription methods. Direct methods are preferred in model predictive control due to their more robust convergence behavior and the availability of efficient and generic solver implementations. In the following, a direct method is applied to transform (3) into an approximating nonlinear program. Based on the benchmark results in our previous work, 44 we apply a full discretization method in favor of multiple shooting or Hermite-Simpson collocation, since it reveals a sparser structure that is fully exploited by the hypergraph in Section 3.4. Full discretization is a particular case of direct collocation or multiple shooting in which both controls and states are uniformly discretized at identical grid points. Furthermore, let x 0:K and u 0:K−1 form discrete trajectories in which x k and u k denote the state and control at the kth grid point. The approach assumes fine grid partitions Δ such that the upper sum rule approximates the integral of the running costs, with a discretized counterpart of (5). If k = 0, then u −1 copies u 0 of the previous optimization, or 0 in the case of the first run. Rösmann et al 44 approximate the system dynamics by forward differences in the full discretization case. Here, integration operates with midpoint differences, since they achieve a better approximation without a significant increase in computation time. The state equality constraint in (3c) without input delay is then mimicked by (21). Since dead time is compensated separately in Section 3.5, the nonlinear program is defined without time delays.
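The midpoint-difference constraint can be illustrated as an equality residual that the optimizer drives to zero on every grid interval. The residual form and the single-integrator example below are a generic sketch, with the grid spacing written as dt:

```python
import numpy as np

def midpoint_residual(x_k, x_k1, u_k, f, dt):
    """Equality-constraint residual of the midpoint rule: it vanishes
    exactly when consecutive grid states obey the dynamics x' = f(x, u)
    evaluated at the interval midpoint."""
    x_mid = 0.5 * (x_k + x_k1)
    return x_k1 - x_k - dt * f(x_mid, u_k)

# Single integrator x' = u: advancing by 0.1 in one step of dt = 0.1
# with u = 1 satisfies the dynamics exactly, so the residual vanishes.
res = midpoint_residual(np.array([0.0]), np.array([0.1]),
                        np.array([1.0]), lambda x, u: u, 0.1)
```

Evaluating the dynamics at the midpoint gives second-order accuracy while still requiring only one model evaluation per interval, which is the tradeoff stated above.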
The formulation is not restricted to this particular choice of integration kernel ∶ R NM × R NM × R N → R NM ; other collocation kernels or numerical integrators apply as well. The resulting nonlinear program is given by (22).

FIGURE 3 Hypergraph for nonlinear program (22). Vertices are depicted as circles and hyperedges as rectangles. Vertices x 0 and u −1 with a double circle are fixed during optimization

Hypergraph formulation
The nonlinear program (22) is solved by standard gradient-based constrained optimization algorithms, for example, interior-point solvers, sequential quadratic programming, or projected gradient methods. 49 Most model predictive control approaches refine the second-order derivatives iteratively by the Broyden-Fletcher-Goldfarb-Shanno algorithm, and first-order derivatives are computed either by automatic differentiation or from finite differences. Central differences constitute a balanced trade-off between accuracy and computational complexity as each constraint and cost term in (22) is evaluated only twice per parameter. The sparse finite difference computation with hypergraphs restricts the calculation of the constraint Jacobian to its structural nonzeros and only evaluates the relevant cost terms to obtain the gradient. A hypergraph (, ) is composed of a set of vertices  and a set of edges . A vertex v ∈  refers to an optimization parameter which is in case of (22) either a state x k or control input vector u k : The set comprises N + 1 state vertices and N control vertices that correspond to grid points along the prediction horizon. Each vertex caches inherent properties like box constraints as in h(⋅) of (22). Furthermore, a vertex is considered as fixed, if its parameters are not subject to optimization, but appear in cost or constraint terms. Tagging vertices as fixed, avoids the validation of trivial equality constraints such as (22b). In contrast to a regular undirected graph, a so-called hyperedge connects an arbitrary number of vertices. It refers to a scalar cost term as well as equality and inequality constraints at time instances k. Figure 3 shows the hypergraph (, ) of the nonlinear program (22) with the following abbreviations: For the sake of clarity, functions, and constraints that refer to the same set of parameters are combined into a single edge. 
From an implementation point of view, this is also of interest whenever multiple functions share similar precomputations, for instance, distance calculations. Vertex x_0 is fixed during optimization, and hence its unary edges are omitted.
Any solver that requires the cost gradient and constraint Jacobian needs to iterate over the set of hyperedges in the graph. For every edge, only the dependent parameters are perturbed to apply finite differences, resulting in small dense block Jacobians. In case of the constraints, it is most efficient to directly memory-map each block Jacobian into the overall sparse constraint Jacobian. Embedding the optimization problem into the hypergraph significantly reduces computation time. The interested reader is referred to Rösmann et al44 for more details.
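The edge-wise Jacobian computation described above can be sketched as follows. This is a minimal Python illustration with hypothetical class names, not the authors' implementation; for simplicity, the dense block Jacobians are returned per vertex instead of being memory-mapped into a sparse matrix.

```python
import numpy as np

class Vertex:
    """Optimization parameter block, for example, a state x_k or control u_k."""
    def __init__(self, values, fixed=False):
        self.values = np.asarray(values, dtype=float)
        self.fixed = fixed  # fixed vertices are excluded from optimization

class HyperEdge:
    """Connects an arbitrary number of vertices to one cost/constraint function."""
    def __init__(self, fun, vertices):
        self.fun = fun          # fun(*vertex_values) -> residual vector
        self.vertices = vertices

def block_jacobian(edge, h=1e-6):
    """Dense block Jacobian per non-fixed vertex via central differences
    (two function evaluations per scalar parameter)."""
    blocks = {}
    for v in edge.vertices:
        if v.fixed:
            continue  # fixed vertices contribute no Jacobian columns
        cols = []
        for i in range(v.values.size):
            v.values[i] += h
            f_plus = edge.fun(*[w.values for w in edge.vertices])
            v.values[i] -= 2 * h
            f_minus = edge.fun(*[w.values for w in edge.vertices])
            v.values[i] += h  # restore original value
            cols.append((f_plus - f_minus) / (2 * h))
        blocks[v] = np.column_stack(cols)
    return blocks
```

Because each edge only touches its own vertices, the perturbation loop never re-evaluates unrelated cost or constraint terms, which is the source of the sparsity exploitation described in the text.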

Closed-loop predictive control
Algorithm 1 details the steps of model predictive control that solve the optimal control problem formulated as nonlinear program (22) at instances t_n with cycle time T_S = t_{n+1} − t_n. The controller receives the current joint state (q(t_n), q̇(t_n), q̈(t_n))^T, the set of vertices V, and the current obstacle set O as inputs. The set V might be empty at the very first invocation, but in subsequent stages, it warm-starts the solver to increase efficiency. Line 2 initializes the optimization parameters by a straight-line interpolation of states; the initial controls are set to zero. Subsequent sampling intervals merely shift the previous solution parameters by one time step to the subsequent vertex. The first vertex is initialized with the most recent state vector, and the Nth vertex is extrapolated linearly from vertex N − 1. Line 3 generates and updates the set of hyperedges, connecting states and controls with their respective cost functions and constraints. As obstacles enter and leave the safety zone according to (9), the number of edges for obstacle costs and constraints changes dynamically, forcing the hypergraph to update. A numerical solver computes the optimal control and state sequence for (22); obstacles are considered static during a single optimization. The algorithm returns the vertex set V* containing the optimal parameter values for warm-starting in the next sampling interval as well as the optimized control sequence u*_{0:K−1}. The latter is turned into a piecewise constant, continuous-time trajectory.

[Algorithm 1: Model predictive controller. 1: procedure feedbackControl(V, O, q(t_n), q̇(t_n), q̈(t_n)); 2: V ← UpdateVertices(V, q(t_n), q̇(t_n), q̈(t_n)) ⊳ warm start]

Figure 4 outlines the timing of events during a control iteration for one joint and shows the effects of computation time T_C as well as dead time T_{D,1}. The controller observes a measurement q_1(t_n) and starts solving (22).
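The initialization and warm-start step of line 2 of Algorithm 1 can be sketched as follows. This is a Python sketch under the assumptions stated in the text; the function name is illustrative, and repeating the last control after the shift is an assumption since the article does not specify it.

```python
import numpy as np

def update_vertices(states, controls, q_now, q_target, N):
    """Warm-start sketch: on the first invocation, interpolate states along a
    straight line and set controls to zero; afterwards, shift the previous
    solution by one grid point, pin the first state to the newest measurement,
    and extrapolate the last state linearly from its predecessor."""
    q_now = np.asarray(q_now, dtype=float)
    q_target = np.asarray(q_target, dtype=float)
    if states is None:
        # first invocation: straight-line state interpolation, zero controls
        states = [q_now + k / N * (q_target - q_now) for k in range(N + 1)]
        controls = [np.zeros_like(q_now) for _ in range(N)]
        return states, controls
    # shift previous solution by one time step (warm start)
    states = states[1:] + [2 * states[-1] - states[-2]]  # linear extrapolation
    states[0] = q_now                 # first vertex: most recent measurement
    controls = controls[1:] + [controls[-1]]  # assumption: hold last control
    return states, controls
```

A usage pattern would be to call this once per control cycle before rebuilding the hyperedges and invoking the solver.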
The optimization ends at t_ctrl after T_C = t_ctrl − t_n, and the controller commands the control intervention μ_1(t_ctrl). After a dead time of T_{D,1} = t_exec − t_ctrl, the control intervention becomes effective as μ̃_1(t_exec). Usually, the computation time T_C for a single optimization of the nonlinear program can be neglected in comparison to the control cycle duration T_S. However, if the optimization problem becomes more complex or the required solver accuracy increases, the computation time rises and causes the controller to react to q_1(t_n) rather than q_1(t_exec), leading to decreased control performance. Dead time T_{D,i} has a similar effect as the net computation time; thus, both causes are considered as a single coherent delay ΔT. Grüne and Pannek14 compensate input delays for model predictive control by extrapolating the conventional initial state x_0 for ΔT into the future by forward integration of (2).

[Figure 4: The controller observes a new measurement q_1(t_n) and starts solving the optimization problem. The optimized control intervention μ_1(t_ctrl) is sent afterward, leading to a computation time of T_C. After dead time T_{D,1}, the control intervention excites the system, indicated by μ̃_1(t_exec).]

[Figure 5: Sketch of the Universal Robot UR10 manipulator that is used for practical experiments.]
The optimization problem is then initialized with the corrected state x̃_0. The state evolution and extrapolation rest upon cached control commands. Computation times vary with the complexity and convergence of the optimization; thus, a median filter estimates the delay T_C over past observed delays.
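A minimal sketch of this delay compensation, assuming the M = 1 integrator prediction model ẋ = u and piecewise constant cached controls held for one grid interval each (function names are illustrative):

```python
import numpy as np
from statistics import median

def estimate_delay(past_delays, window=10):
    """Median filter over recently observed computation delays."""
    return median(past_delays[-window:])

def extrapolate_state(x0, cached_controls, dt, delay):
    """Forward-integrate x' = u over `delay` seconds, stepping through the
    piecewise constant cached controls (each held for one interval dt)."""
    x = np.asarray(x0, dtype=float).copy()
    t = 0.0
    for u in cached_controls:
        step = min(dt, delay - t)
        if step <= 0.0:
            break
        x += step * np.asarray(u, dtype=float)  # exact step for an integrator
        t += step
    return x
```

The extrapolated state then replaces the raw measurement as the first (fixed) vertex before the solver is started.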

EXPERIMENTAL RESULTS
This section introduces model identification and analyzes alternative model structures. Based on the analysis, the model that constitutes the best compromise between complexity and accuracy is selected and validated on the actual robot. The model predictive controller for online trajectory optimization is evaluated under task variations, in particular, a dynamic target pose and the penetration of the workspace by dynamic obstacles.

Robot manipulator
The model is identified and validated on a real Universal Robot UR10 collaborative robot with N = 6 joints, which is sketched in Figure 5. Figure 6 illustrates the velocity tracking controller's step response for the joint velocity q̇_1 of the first joint. At step time t = 1 second, the response exhibits a time delay in the order of tens of milliseconds caused by both rise time and dead time.

Model identification
The response also displays an overshoot and a settling time in the order of several hundred milliseconds, which supports the second-order system with dead time introduced in Section 2.
The signals for identification and validation are amplitude-modulated pseudo-random binary joint velocity signals as proposed by Barker et al,51 with a duration of 30 seconds, random step lengths of multiples of 1 second, and random amplitudes.

[Figure 6: Response q̇_1(t) of the first joint to a step change from 0.05 rad second−1 to 0.35 rad second−1 in desired angular velocity μ_1(t).]

Here, NRMSE denotes the normalized root mean square error between measurements q_i(t) and simulations x_{1,i}. Table 1 reports the identified poles p_{i,j} with i = 1, 2, … , N and j = 1, 2 as well as the goodness of fit of the pure approximated control loop without integrator. For the sake of brevity, the table only reports data for one joint model; the qualities of fit are similar for the remaining joints, and the subsequent evaluations apply to them as well. Choosing a prediction model means a compromise between model complexity and accuracy. Lower-order prediction models reduce the computational burden in the model predictive control framework and either help to increase the cycle rate or to extend the planning horizon. The state vector dimensions are 6, 12, and 18 for M = 1, M = 2, and M = 3, respectively. Without dead time, the delay must instead be approximated by slower system dynamics. This raises the required model complexity to M = 3 to reach a similar accuracy (91.49 %) as the dead time models. Dead times for M = 1 are higher than for M > 1 because, in this case, the dead time also approximates slow system dynamics. With modeled dead time, all systems have similar accuracy in the range of 91.73 %, which makes M = 1 the natural choice for a prediction model. The remaining model mismatch due to unmodeled dynamics is usually unproblematic given the closed-loop nature of model predictive control.
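The goodness-of-fit percentages quoted above follow the NRMSE-based fit convention common in system identification; a hedged sketch of that formula, in which 100 % means a perfect match (the article does not restate its exact normalization, so this is an assumption):

```python
import numpy as np

def nrmse_fit(measured, simulated):
    """NRMSE fit in percent: 100 * (1 - ||q - x|| / ||q - mean(q)||).
    Returns 100.0 for a perfect match between measurement and simulation."""
    measured = np.asarray(measured, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    num = np.linalg.norm(measured - simulated)
    den = np.linalg.norm(measured - measured.mean())
    return 100.0 * (1.0 - num / den)
```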

Controller parametrization and stability
In Mayne et al,52 stability is enforced by imposing proper terminal constraints on the state, whereas Grüne and Pannek14 prove stability under particular assumptions without introducing terminal constraints or costs. In a well-defined task, start and target configurations exhibit sufficient clearance to obstacles such that neither constraints nor costs for collision avoidance are active at start or target. Under this assumption, the problem reduces to an integrator system with quadratic costs and box constraints for states and controls. As shown by Grüne and Pannek,14 such a system is asymptotically stable for K ≥ 2, and thus, for a free workspace, no additional stabilizing constraints and costs are required. However, as obstacles restrict the workspace in the transition from start to target pose, collision avoidance constraints become active, and the assumption by Grüne and Pannek14 no longer applies. Grüne and Pannek14 also mention that for additional costs or constraints, the minimum horizon length might increase. In general, the obstacle motion is not known beforehand. As a consequence, the nominal horizon length K and the temporal length, encoded in Δ, are adjusted empirically. A large temporal horizon is required in case of obstacles for a preemptive trajectory refinement and to plan the trajectory beyond the obstacle, where the objective costs are lower again. The choice of Δ involves a compromise between the horizon time and the accuracy of the system dynamics approximation in the prediction step according to (21). The step size is set to Δ = 0.1 s, since an integrator system has comparatively simple dynamics. A horizon length of K = 25 corresponds to a look-ahead time of 2.5 s and constitutes a viable compromise between computational effort and temporal horizon length. In-depth stability properties in the context of model predictive control are summarized in Grüne and Pannek14 and Mayne et al.52

In summary, the model predictive controller operates with the following parametrization:
• IPOPT53 with MA27.54

Please note that additional parameters are specified in the experiments. The IPOPT solver belongs to the class of interior-point approaches whose large-scale capabilities are well suited for fully discretized collocation problems with many optimization variables.

Model validation
The validation scenario commands four consecutive joint space configurations that constitute a closed movement in which the first and final waypoints coincide. The control bounds are ±0.1 rad second−1 for the first three joints and ±0.3 rad second−1 for the remaining ones. The motion between the first two waypoints is restricted to joints 1, 2, and 3. The second part affects joints 4, 5, and 6, and in the final segment, all six joints move simultaneously. Prediction accuracy is defined as the residual between predicted states x and actual states q according to (28). The analysis assumes that the currently planned and the future executed controls coincide over the current planning horizon. This assumption is valid unless the motion is either disturbed or the target configuration emerges on the horizon. To better illustrate the different effects of dead time and computation time, the predicted state sequence in one case starts from a steady state, while in the other case, the robot is in motion. Figure 7 compares the predicted joint trajectory (gray solid line) and the actual joint trajectory (black dotted line) of the second joint without dead time (Figure 7A) and with dead time (Figure 7B). The robot motion starts from rest, and the model predictive controller is commanded to a new target joint state. The black dot indicates the time instance t_n of observation of the current joint state. The predicted trajectory starts at t_ctrl, indicated by the gray dot. The discrepancy between the executed motion at t_exec and the prediction that ignores dead time T_{D,2} becomes apparent in Figure 7A, with a quality of fit of 96.83 %. Figure 7B illustrates the same motion with explicit consideration of dead time in the prediction. The extrapolated initial state (gray dot) matches the system state at t_exec, improving the overall prediction quality of fit to 99.52 %. Figure 8 presents the same analysis, but in this case, during motion at constant velocity rather than starting from standstill.
The additional effects of the computation time delay become apparent in Figure 8A, causing a shift between prediction and execution with a goodness of fit of 97.52 %. The computation delay T_C causes the control command to be sent at t_ctrl, when the initial state for the prediction no longer matches the actual joint state after optimization.

[Figure 8: The computation delay T_C leads to a no longer valid initial state, indicated by the slight shift in −x_2 direction compared to the actual (dotted black) behavior (A). Compensating the computation time by extrapolating the initial state into the future further improves the prediction (B). The black dot indicates the measured state, and the gray dot shows the first state of the prediction at t_ctrl.]

Compensating the computation delay by initializing the optimization with a further extrapolated initial state better matches prediction and execution (Figure 8B), improving the quality of fit to 99.55 %. The mean computation times for solving the nonlinear optimization problem are 6.69 milliseconds for the simple integrator and 6.73 milliseconds for the integrator with extrapolation of the initial state. Considering that the 95 % confidence interval of the estimated mean difference is [−0.24 millisecond, 0.15 millisecond], there is no statistically significant difference between both computation times.

Dynamic goal pose
This section investigates the closed-loop behavior for a dynamic reference configuration. This type of task variation is typical for pick-and-place operations with moving objects, for example, parts transported by a conveyor belt or object handover in human-robot collaboration. In many cases, the motion targets are defined in the workspace and are not necessarily given as joint configurations. This experiment assumes that a moving target pose was previously transferred to the joint space, also taking into account possible redundancies. With modifications to the cost function, the presented approach is also applicable to target poses in the workspace. This, however, leads to an economic model predictive controller that introduces other benefits and drawbacks which are beyond the scope of this article. The robot starts from the initial configuration q = 0 and is initially commanded to a static reference configuration. Figure 9A illustrates the joint trajectories of the first and third joints and their corresponding references. At time t_1 = 9 seconds, the first joint state q_1 has already converged to the reference state q_ref,1, whereas the third joint has not yet completed its motion. Figure 9B illustrates the evolution of the joint state error. During the reference motion, the residual error settles to a nonzero steady state of 0.02.

Obstacle avoidance
For the following analysis, the parameters for obstacle avoidance are set to:
• Control bounds ±0.4 rad second−1 for all joints

The first experiment contains a single spherical obstacle that remains static during task execution, but whose location is allowed to change across tasks. The sphere of radius r_1 = 0.1 m is located at p_1^T = (0.8, 0.16, 1.0) relative to the robot coordinate system shown in Figure 10. To improve experimental reproducibility and to eliminate the influence of object detection and tracking on the results, the obstacle is purely virtual rather than physical. Figure 10A shows the scenario with the obstacle blocking the straight-line path in joint space. The straight-line joint motion towards the reference pose is nonfeasible due to a self-collision at intermediate states. In addition to the presented plots, the attached video shows the robot and the virtual obstacle for better visualization.55 In the following analysis, hard constraints (HC) refer to (11) and soft constraints (SC) refer to (15). Obstacle avoidance with mere HC results in a trajectory with minimal separation from obstacles. However, a limited number of solver iterations and inaccurate delay compensation for oscillating computation times increase the discrepancies between predicted and actual joint states. These discrepancies then lead to constraint violations for self-collision (t = 2 seconds) and obstacle avoidance (t = 10 seconds), shown in Figure 11A. Even though the residuals of these violations are small, they cannot always be resolved within the forthcoming control cycle and lead to nonfeasible solutions of (22). In consequence, the controls during the motion along the obstacle become nonsmooth, as shown in Figure 11B. They become smooth again once the robot's joint motion is no longer constrained by self-collision and obstacles. Although this variant does not cause collisions, the nonsmooth control commands are not desirable in robot applications.
SC introduce a repulsive behavior before the activation of HC and lead to more robust planning alongside obstacles with smooth controls. The obstacle penalty increases with proximity to the obstacle and causes a compromise between trajectory length and clearance from obstacles, as shown in Figure 11A. Figure 11B shows the smooth controls of the first joint that allow the practical application of online trajectory optimization under real-time constraints. The compromise trajectory under SC mainly depends on the relative weights of the trajectory and obstacle terms in the cost function. Thus, it is not possible to guarantee compliance with constraints in general, and therefore, HC still provide a fall-back safety level.
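The interplay of HC and SC can be illustrated with a small sketch. The thresholds and weight below are illustrative placeholders, not the article's parameters; what matters is the shape: the soft penalty is zero outside an influence distance and grows quadratically as the separation shrinks, while the hard constraint only enforces a minimum distance d_min.

```python
import numpy as np

def obstacle_penalty(d, d_influence=0.25, weight=10.0):
    """Soft constraint: zero beyond the influence distance, quadratic growth
    with proximity to the obstacle (illustrative parameters)."""
    if d >= d_influence:
        return 0.0
    return weight * (d_influence - d) ** 2

def hard_constraint(d, d_min=0.05):
    """Inequality residual g = d_min - d <= 0 handed to the solver; positive
    values indicate a violation of the minimum separation."""
    return d_min - d
```

In the cost function, the penalty terms enter as summands per nearby obstacle, whereas the hard constraints are added as inequality edges to the hypergraph.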
The second experiment considers three moving obstacles traversing the robot's workspace one after another. Figure 12 depicts the scenario: the obstacles move 10.0 m in −y direction within 50 seconds at constant velocity, perpendicular to the robot's planar zero configuration. They disturb the free-space robot motion, which is identical to the last experiment. Please again refer to the corresponding video for a qualitative impression.55 Even though trajectory optimization does not extrapolate the obstacle motion into the future, the minimum distances of 0.034 m and 0.051 m for self-collision and obstacles, respectively, are respected. SC introduce environmental perception, which offers the possibility to adjust planning early for nearby but not yet critical obstacles. This is particularly important in this dynamic scene. Figure 13 collects the control sequences of all joints and demonstrates their smoothness. Some joint motions come to a temporary halt and resume their movement once the obstacle has passed. If this occurs for all joints at the same time, planning is trapped in a local minimum where stopping is supposed to be optimal instead of circumventing the obstacle. The next experiment will deal with this case for a more complex obstacle in more detail. Another type of local minimum becomes apparent during the second obstacle for t ≥ 26 seconds. The robot plans to pass the obstacle on the left. However, since the obstacle is moving in the same direction, the first joint sheers off instead of avoiding the obstacle on the right. In this case, the local minimum leads to the target configuration but is not globally optimal. Estimating and forward predicting obstacle motions as well as an alternative initialization that considers different topologies could initiate a different behavior.
The third experiment replaces the former virtual dynamic obstacles with a real human obstacle. The person is modeled by a composition of seven bounding volumes using a sphere (head) and six cylinders (arms, thorax, and pelvis), as illustrated in Figure 14A. A marker-based motion capture system tracks the person's movement and reports it to the planning algorithm. Figure 14B,C visualizes the initial and final robot configurations for this experiment. The robot moves between these configurations and back while the human periodically interferes. The control bounds are increased to ±0.5 rad second−1 for all joints. Please again refer to the corresponding video.55 Figure 15 shows the trajectories of the first three joints. The remaining three joints of the wrist play a secondary role and are not shown. The three gray areas mark situations in which the robot stops due to local minima. In these cases, the person blocks the motion of the first joint while the other joints reach their target values. An evasive motion, for example, by folding the entire arm, would cause the converged joints to deviate from their target values and thus increase the optimization costs. Also, the limited planning horizon cannot capture reaching the target values behind the obstacle, which would reduce the costs. In consequence, the costs are higher for a motion around the obstacle than for standing still. As in the previous experiment, the controls are smooth (Figure 16), and the minimum distances of 0.158 m and 0.067 m for self-collision and the person, respectively, are respected.
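Distance queries against such bounding volumes reduce to simple geometric primitives. As one illustrative helper (not the authors' implementation), the separation between the head sphere and a robot link approximated as a capsule reduces to a point-to-segment distance minus both radii:

```python
import numpy as np

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    p, a, b = (np.asarray(v, dtype=float) for v in (p, a, b))
    ab = b - a
    # clamp the projection parameter to the segment
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def sphere_capsule_separation(center, r_sphere, a, b, r_capsule):
    """Signed separation: positive if the sphere and the capsule are apart,
    negative if they penetrate each other."""
    return point_segment_distance(center, a, b) - r_sphere - r_capsule
```

Cylinder-cylinder pairs work analogously with a segment-to-segment distance; these separations feed both the SC penalties and the HC inequality edges.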

Computation times
This section summarizes the measured computation times of the previous experiments and gives an impression of the scalability of the approach. Table 2 reports the mean values, standard deviations, as well as minimum and maximum values. In the second scenario, the measuring range is limited to the interval [7.3, 36.9] seconds, since this is when the avoidance motion occurs. The table also contains reference values that report the computation times of free-space trajectory optimization without obstacles and collision avoidance. The standard deviation (16.07 milliseconds) of the HC variant is greater than that of HCSC (12.42 milliseconds) in the same scenario. This is a consequence of the aforementioned problem in Section 4.6 that violated HC cannot always be resolved. The solver's attempts to resolve them result in escalating computation times. The HCSC variant achieves approximately the same mean value (29.09 milliseconds) as the HC variant (29.77 milliseconds), although it is more complex due to the additional SC. SC increase the complexity but also eliminate the aforementioned escalations, so that both effects compensate each other. In the second experiment, three moving obstacles are used, and up to two are simultaneously active. On the one hand, the optimization problem is more complicated due to at least two active obstacles and thus requires more computation time on average (34.27 milliseconds) and at the maximum (70.18 milliseconds) than the HCSC variant in the first experiment. On the other hand, the lack of estimation and prediction of the obstacles' motion prevents warm starts from efficiently exploiting previous solutions. The third experiment emphasizes this effect at the mean (41.06 milliseconds) and maximum values (82.60 milliseconds) through the simultaneous presence of seven obstacles. However, local minima also make a significant contribution to the higher mean value and standard deviation (21.40 milliseconds).
Figure 17 shows the individual computation times during the third experiment and marks the intervals in which local minima occurred. The intervals illustrate the consistently high computation time at local minima. Since the optimization costs settle at a constant value above zero, the solver maintains a high effort to reduce these costs, which succeeds only if the local minimum is resolved. For comparison, Figure 18 shows the histograms of the computation times for this experiment (Figure 18A) and the second one (Figure 18B). The third experiment shows a second cluster starting at 70 milliseconds, which is mainly caused by the local minima (cf. Figure 17). In the second experiment, no such cluster is visible in the upper region.
When obstacles enter or leave the workspace, new edges for collision avoidance are added, or old edges are removed from the hypergraph. For the cost function, only the calculations within the edge change. These relatively simple adjustments are the advantage of hypergraph-based optimization, which, unlike methods that use code generation, for example, does not have to fundamentally rebuild the optimization problem in case of structural changes.

[Figure 19: Median-filtered computation time levels of three variants for a modified version of the second experiment.]

Figure 19 shows how the computation times of the HC, SC, and HCSC variants scale in a variation of the second scenario. In this scenario, the robot stays in a fixed configuration, and three obstacles pass by without collision. All values are median filtered to improve the visibility of changes in the level of computation times. Please note that these measurements only reflect the pure presence of obstacles and do not consider evasion. An arrow pointing downwards indicates the timestamp at which an obstacle enters the robot's workspace; analogously, an arrow pointing upwards marks a leaving obstacle. At those timestamps, an increase or decrease in the computation time level can be identified. In particular, the HCSC variant stands out from the other variants, which are evenly positioned. The analysis confirms the scalability of trajectory optimization with the hypergraph structure and the adaptation of the number of constraints at optimization run-time without significant computational overhead. A detailed comparison of the hypergraph with other methods regarding preparation time can be found in Rösmann et al.44

CONCLUSION AND OUTLOOK
The article proposes and analyzes a novel approach for online trajectory optimization of collaborative robots under task variations, self-collision avoidance, and dynamic obstacle constraints. The results of model identification and validation show that explicitly considering input delays, that is, computation time as well as dead time, allows for using six parallel integrators with the same prediction accuracy during closed-loop control as second- or third-order models. The control scheme successfully converges to static target configurations and exhibits a benign residual tracking error in case of moving targets. The analysis of the collision avoidance behavior during imitated robotic pick-and-place experiments shows that mere HC are not sufficient, as they lead to nonsmooth control sequences. Enhancing obstacle avoidance via SC that grow quadratically in the near vicinity of the obstacle enables smooth controls. HC provide a fall-back safety level in case of critical separation distances. Extending the analysis to multiple dynamic obstacles confirms these results. It shows the benefits of the efficient preparation, as the hypergraph formulation adapts the number of constraints to the complexity of the scene without significant computational overhead. There is a measurable increase in computation time per obstacle; however, the experiment with a person modeled by seven obstacles has demonstrated the applicability in an authentic scenario. Nevertheless, the performance of recently very successful SQP solvers, which can benefit even more from warm starts, will be evaluated in future work. The experiments show two types of local minima that arise from the nature of gradient-based optimization: one that causes the robot to stop in front of large obstacles and another that leads to long detours when avoiding a dynamic obstacle on the wrong side.
In the worst case, the approach therefore behaves like known reactive methods that pause motions in front of an obstacle. However, the experiments also demonstrate how the approach successfully avoids the obstacle and reaches the final configuration. Future work will therefore transfer the topology-based parallel planners for planar mobile robots by Rösmann et al56 to robotic manipulators to plan topologically distinctive trajectories in 3D space that cover the global minimum. Also, the approach will be extended to estimate and extrapolate obstacle movements and consider their future states in the trajectory optimization to further improve the avoidance behavior. While assuming constant velocities is a quick and straightforward improvement, movement uncertainties usually must be considered within the horizon, which might lead to the freezing robot problem.