Robot Self-Calibration Using Actuated 3D Sensors

Both robot and hand-eye calibration have been objects of research for decades. While current approaches manage to precisely and robustly identify the parameters of a robot's kinematic model, they still rely on external devices, such as calibration objects, markers and/or external sensors. Instead of trying to fit the recorded measurements to a model of a known object, this paper treats robot calibration as an offline SLAM problem, where scanning poses are linked to a fixed point in space by a moving kinematic chain. As such, the presented framework allows robot calibration using nothing but an arbitrary eye-in-hand depth sensor, thus enabling fully autonomous self-calibration without any external tools. My new approach utilizes a modified version of the Iterative Closest Point algorithm to run bundle adjustment on multiple 3D recordings, estimating the optimal parameters of the kinematic model. A detailed evaluation of the system is shown on a real robot with various attached 3D sensors. The presented results show that the system reaches a precision comparable to a dedicated external tracking system at a fraction of its cost.


I. INTRODUCTION
In 2018 the American Automobile Association published a report about the repair cost of modern cars with Advanced Driver Assistance Systems (ADAS), indicating that repairs are two to three times more expensive than for traditional cars [1]. This surcharge is caused not only by having to replace the additional integrated sensors, but also by their calibration, which requires dedicated equipment and specially trained personnel. In a similar manner, Richardson et al. [2] performed a survey on camera calibration, comparing the precision achieved by laymen and experts. Their findings confirm the necessity for qualified personnel, as the quality of a calibration heavily depends on capturing sufficient and evenly distributed footage of the used calibration object over the entire image area. While there are no similar studies available for robot calibration, it is reasonable to assume that the effects with respect to cost and required know-how are similar.
The biggest drawback of current calibration techniques is, however, that they rely on specialized equipment. While many approaches from the literature are presented as "autonomous" or "automated", they can only be used when additional calibration objects, markers and/or sensors have been placed next to the robot beforehand, i.e. dot [3] or checkerboard patterns [4], spheres [5], [6], [7], [8], pins [9], or other specifically designed calibration targets [10], [11]. This renders it impossible to re-calibrate already deployed systems in unpredictable environments, such as traffic, households or disaster scenarios. Though calibration is often wrongly treated as a once-in-a-lifetime action, the system parameters change over a system's usage period, caused by wear-and-tear, maintenance and repairs, changes in temperature, or mechanical stress, e.g. caused by shipping or collisions. The resulting consequences of wrong parameters may range from small imprecisions, over task failure, to a potential loss of the entire system when deployed to a hazardous environment from which it can no longer escape by itself.

This manuscript is part of projects that have received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 870133. All research was conducted at the Chair of Robotics, Artificial Intelligence and Real-time Systems, Technical University of Munich (TUM), Boltzmannstr. 3, 85748 Garching, Germany (see http://www6.in.tum.de). Contact: arne.peters@tum.de
To overcome the aforementioned issues this paper presents a framework allowing true on-site self-calibration of a robot system equipped with an arbitrary eye-in-hand 3D sensor. Instead of using external utilities, it is based on point cloud registration techniques to fuse multiple scans of a given scene. Our approach extends the Iterative Closest Point (ICP) algorithm to find the optimal parameters of a kinematic chain, including all calibratable parameters of a robot manipulator as well as the hand-to-eye transformation (see Fig. 1). The key contributions of this paper are: 1) To the best of the author's knowledge, this is the first approach solving the calibration of an entire robot system by relying only on depth data instead of external tools and objects.
2) The presented framework allows calibration of any kinematic chain and depth sensor combination, e.g. single-beam LiDARs, line scanners and depth cameras. 3) A detailed evaluation is presented, comparing multiple real-world hardware configurations to a calibration performed using traditional methods with a dedicated 3D tracking system.

II. FUNDAMENTALS
Today, various technologies for measuring 3D information can be found on the market. The most common ways to contactlessly estimate the distance between a sensor and a surface it is pointed at are stereo vision [12], [13], [14], structured light (SL) [15] and the Time-of-Flight (ToF) principle [16]. Depending on the used sensor it is further possible to perform multiple measurements simultaneously. According to the used sensor model, the range or depth measurements captured at a moment t can be projected to a set of points P_t = {p_1, ..., p_n}, where each p_i is a 3D point relative to the sensor's origin. As the sensor is assumed to be mounted in an eye-in-hand configuration, its origin also forms the end effector (EE) frame E of the robot's kinematic chain. In the following work, coordinate frames of points will be denoted in superscript, calligraphic letters, such as in p^E_i. When moving the robot, the relative pose of E to the robot's base B changes. Thus, one possible strategy for fusing multiple point clouds is to project all points to the base frame via the forward kinematics tr(k, j_t), computed from the kinematic parameters k of the robot and its joint states j_t at the time t at which the scan was recorded, so that p^B_i = tr(k, j_t) p^E_i.

While j_t is simply the sequence of the joint positions along the used robot from base to EE, the definition of k is not as straightforward: A kinematic model suitable for calibration should be continuous and represent a complete set of Degrees of Freedom (DoF) while staying non-redundant. Probably still the most famous way of modeling robot manipulators is the Denavit-Hartenberg (DH) convention [17]. As modeling a kinematic chain with 6 DoF per transformation includes several redundancies (e.g. it would be possible to shift a rotational joint along its axis), the DH convention defines joints to move along their local z axis and uses only four parameters φ_n, d_n, a_n and α_n per segment n, where φ_n is the rotation around z_{n-1}, d_n the translation along z_{n-1}, a_n the translation along x_n and α_n the rotation around x_n.
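As an illustration, the four DH parameters map to one homogeneous segment transform as follows. This is a generic sketch of the standard convention, not code from the presented framework; the function name is mine:

```python
import numpy as np

def dh_transform(phi, d, a, alpha):
    """Homogeneous 4x4 transform of one DH segment: rotation phi about z(n-1),
    translation d along z(n-1), translation a along x(n), rotation alpha about x(n)."""
    cp, sp = np.cos(phi), np.sin(phi)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [cp, -sp * ca,  sp * sa, a * cp],
        [sp,  cp * ca, -cp * sa, a * sp],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])
```

Chaining one such matrix per segment yields the forward kinematics of a DH-modeled manipulator.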
Unfortunately the DH model suffers from multiple drawbacks: It is neither complete nor parametrically continuous. To overcome these issues, Stone [18] suggested using two additional parameters b_n and γ_n per segment in his S-model, making the model complete, but not parametrically continuous, at the cost of introducing redundancies.
Zhuang et al. presented an alternative approach named the Complete and Parametrically Continuous (CPC) model [19], which they later refined to the more intuitive Modified-CPC (MCPC) model [20]. Similar to the DH convention, the MCPC model assumes all joints to move along their z axes. It is constructed for each segment of the robot's kinematic chain connecting two joints J_i and J_{i+1}, by rotating the frame J_i around its x and y axes to align its xy-plane with the one of J_{i+1}, and then shifting it along the new x and y axes to position the new origin on the z-axis of J_{i+1}.
The MCPC model defines tr(k, j_t) to be the product of alternating transformations along static segments and joints, where s_i are the parameters of the i-th joint. The MCPC model uses a total of four parameters per revolute joint, two per prismatic joint, and six DoF for the transformation between the last joint and the robot's EE. For revolute joints, each segment is defined by s = (α, β, x, y)^T so that

st(s) = rot(u_x, α) rot(u_y, β) trans(x, y, 0),

where rot(a, α) is a rotation of α around axis a and trans(x, y, z) a translation along (x, y, z)^T. u_x denotes the unit vector of a coordinate frame's local x axis; u_y and u_z for the y and z axes accordingly. For prismatic joints, the parameters x_i and y_i are treated as zero.
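A minimal sketch of how such a revolute MCPC segment could be composed, following the textual definition above. The helper names `rot` and `trans` are mine, and the joint angle θ is applied about the local z axis:

```python
import numpy as np

def rot(axis, angle):
    """Homogeneous 4x4 rotation of `angle` about a principal axis 'x', 'y' or 'z'."""
    c, s = np.cos(angle), np.sin(angle)
    i = 'xyz'.index(axis)
    j, k = (i + 1) % 3, (i + 2) % 3
    R = np.eye(4)
    R[j, j], R[j, k] = c, -s
    R[k, j], R[k, k] = s, c
    return R

def trans(x, y, z):
    """Homogeneous 4x4 translation."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

def mcpc_revolute(alpha, beta, x, y, theta):
    """Static MCPC segment (alpha, beta, x, y) followed by the revolute joint
    motion theta about the local z axis."""
    return rot('x', alpha) @ rot('y', beta) @ trans(x, y, 0.0) @ rot('z', theta)
```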
One special case is the transformation between the last joint and the EE, which has two additional degrees of freedom γ and z, so that

st(s_E) = rot(u_x, α) rot(u_y, β) rot(u_z, γ) trans(x, y, z).

III. RELATED WORK

The presented approach combines techniques from two different fields of research: 1) point cloud registration, which is commonly used in computer vision, e.g. for 3D reconstruction and/or simultaneous localization and mapping (SLAM), as well as 2) solving the hand-eye and/or robot calibration problems, where especially the second one is more common in the field of control engineering. As such, the state of the art in both fields will be presented separately.

A. Point Cloud Registration

Despite its 30th anniversary, the most widely known approach for aligning two point clouds is still the ICP algorithm, which was developed independently by Besl and McKay [21], as well as Chen and Medioni [22], in 1992. It aims to find the rigid transformation aligning one point cloud ("data") with a second one ("model") by iteratively searching pairs of closest points between both clouds and refining the initially guessed transformation by minimizing the distance of all pairs. While the general idea of both ICP versions is the same, Besl and McKay used the squared Cartesian distance of matching points as an error measure, while Chen and Medioni optimized the point-to-plane distance, which was later shown to reach faster convergence [23].
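The core loop of a point-to-point ICP in the spirit of Besl and McKay can be sketched in a few lines. This is a didactic sketch with brute-force nearest-neighbor search and a closed-form SVD alignment step, not the calibration variant developed in this paper:

```python
import numpy as np

def icp_point2point(data, model, iters=20):
    """Minimal point-to-point ICP: repeatedly match nearest neighbours and
    solve the rigid alignment in closed form via SVD."""
    T = np.eye(4)
    for _ in range(iters):
        P = (T[:3, :3] @ data.T).T + T[:3, 3]           # transform "data" cloud
        d = np.linalg.norm(P[:, None] - model[None], axis=2)
        Q = model[d.argmin(axis=1)]                     # nearest "model" points
        mp, mq = P.mean(0), Q.mean(0)
        H = (P - mp).T @ (Q - mq)                       # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                        # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mq - R @ mp
        dT = np.eye(4)
        dT[:3, :3], dT[:3, 3] = R, t
        T = dT @ T                                      # accumulate the update
    return T
```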
Over the last decades, numerous variants of the ICP algorithm have been developed, using different strategies for point matching, introducing an additional validation step for point matches, and/or varying metrics as well as optimization techniques. Detailed overviews of shape matching are given in the survey papers [24] and [25] from 2015. An older survey by Rusinkiewicz [23] from 2001 even performed a benchmark of different ICP variants. He also suggests that Iterative Corresponding Point might be a better fit for the ICP acronym, since many matching criteria other than pure geometric distance (e.g. features or backprojection) have been shown.
Two interesting and more recent extensions of the ICP algorithm come from Segal, Haehnel and Thrun [26], who formulated a plane-to-plane distance function, as well as from Rusinkiewicz [27], who introduced a symmetric objective cost metric. Other recent works try to extend the ICP algorithm to support non-rigid shape matching [28], [29], [30] or to get rid of the required initial guess [31], [32]. In addition, researchers have applied deep learning to one or multiple steps of the algorithm [33], as well as to solve the problem of point cloud registration solely by machine learning [34], [35].

B. Calibration
Calibrating a robot system with a hand and an eye can be broken down into three separate problems: 1) intrinsic calibration of the optical sensor, 2) finding the transformation between the actuator and the eye, and 3) calibration of the actuator itself. This partitioning was already made by Tsai and Lenz, who presented a series of papers solving each step separately [36].
1) Sensor Calibration: The issue of sensor calibration naturally depends on the sensor and is commonly treated as a standalone problem. Even works combining approaches for solving the entire system calibration at once usually treat sensor calibration as an independent step in the overall calibration pipeline (such as [37], [38], [39]).
Sensor calibration must obviously fit the used device. Especially camera calibration has become a standalone research field. An early approach to classic camera calibration was the Eight Point algorithm [40], which only became stable after introducing an additional normalization procedure [41]. Another widely used calibration approach is the method of Zhang [42]. Later works introduced more complex sensor models, e.g. by including additional parameters for modeling radial distortion [43], [44].
More complex sensor models are also required in the context of 3D perception. For stereo vision systems, the camera parameters of both sensors as well as the calibration between them need to be known, while in SL and ToF systems one needs to consider the projector instead of a second camera. In [37] the latency of a ToF sensor's projector is calibrated, and [45] demonstrates that the projector of a Kinect v1 sensor can be modeled and calibrated in a similar fashion as a camera. Finally, [46] and [47] investigate the calibration of rotating multi-beam LiDAR sensors.
2) Hand-Eye Calibration: More recent works have tried to remove the requirement for special targets: [50] calibrated the eye-to-hand transformation by measuring generic planes, while [51] used straight edges of random objects. Heide et al. [52] estimated the pose of external LiDAR scanners by detecting a CAD-based 3D model of an excavator's arm in the recorded point clouds. Sheehan et al. [53] managed to intrinsically self-calibrate a multi-LiDAR scanning device without any knowledge of or requirements on the environment. They defined a crispness error metric based on squared Rényi entropy, optimizing multiple overlapping scans for crisp edges. In [54] and [55] Alismail et al. solved the intrinsic calibration (4 DoF) of a self-built 3D LiDAR made from a rotating 2D scanner by applying ICP on data from the first and second half of a single rotation. We recently extended this idea to show that extrinsic calibration of an eye-in-hand LiDAR is possible by fusing two 3D scans taken at random manipulator configurations by rotating the wrist joint [56]. A similar idea was later presented by Li et al., where Particle Swarm Optimization-Gaussian Process (PSO-GP) was used to fuse data of a close-range eye-in-hand line scanner, as ICP "is difficult to directly apply to the calibration of line laser sensors because the line laser sensors do not have the enough scanning range" [57, p. 2]. This is quite contrary to the findings of this paper: the experimental observations presented in section V actually indicate that the calibration precision is even higher when using ICP on smaller and less complex scenes.
3) Robot Calibration: While target-less eye-to-hand calibration of depth sensors has been demonstrated, there are no standalone solutions for entire robot calibration yet. General overviews of the problem of robot calibration are given in [58] and [59]. Examples of pure robot calibration are rather rare: Bennet and Hollerbach [60] create a loop closure in a kinematic chain by connecting two robots of the same type at their EEs, while [61] and [62] use a Coordinate Measuring Machine (CMM) and a precisely manufactured reference fixture to measure the position of the EE.
Another series of approaches uses external optical measuring systems. These works usually treat the transformation between the EE and the sensor and/or markers as yet another segment of the unknown kinematic chain and include it in the calibration problem, e.g. by using a theodolite and a reflector mounted to the robot's EE [63] or camera/marker-based 3D tracking systems [64], [65]. [39] calibrates the manipulator of a humanoid-like robot by watching a marker on its wrist. Ma et al. [66] demonstrated a first markerless solution by using deep learning to estimate both a manipulator's kinematic model and its configuration from watching it with an external camera.
Finally, there is a number of works solving robot calibration using eye-in-hand devices. These tend to follow a similar approach as eye-to-hand calibration: A calibration target is recorded from a number of manipulator poses. Using the constraints that neither the target nor the robot's base have moved, it is possible to calculate the optimal parameters of the kinematic model. Even the used targets are often the same: [67] and [68] use a checkerboard, while [69], [70] and [71] use one or multiple spheres. [72] calibrated a robot with an attached line scanner by following a spanned string, and in [73] the robot system detects markers at its joints by watching itself in a mirror.

IV. APPROACH
The aim of this work is to find the optimal parameters k_opt describing the kinematic model of a robot. In a first step, multiple scans of the robot's environment are recorded by moving a depth sensor attached to its EE. By exploiting the knowledge about the robot's design and configuration at the time of scanning, one can transform all acquired point cloud data to B. In a static environment, all overlapping points from the performed scans must match the same surfaces. Thus, any errors in the fused reconstructions must originate from errors in the projection of measurements along the kinematic chain. By formulating a cost function to express the quality of matching points, we can use the ICP algorithm to minimize the projection error and thus estimate k_opt.

A. Data Structure and Notation
Let us consider a robot with an actuated depth sensor, i.e. in an eye-in-hand configuration. As the used measuring technique and lens model may differ, for the scope of this work the sensor is assumed to be intrinsically calibrated and to produce a set of points P. While it is negligible how the data was measured, it is essential to know when each depth value was obtained. Depending on the sensor type one obtains a different amount of data at a time t: a single-beam LiDAR only measures a single range value, a triangulation-based line scanner captures a vector of points, and a depth camera even returns an entire matrix of measurements. Thus, depending on the used sensor type, the elements of P can be rearranged to form a matrix P. Recording multiple scans while moving the robot finally combines multiple P into a dataset tensor D. Rotating LiDARs provide one special case, as they also measure a line, but sequentially. To maintain the spatial order of points in a LiDAR scan, all l points from a single rotation are arranged column-wise. For all measurements P_t there is a matching vector with the robot's joint states j_t. The issue of different measuring frequencies can be overcome by interpolating the joint positions to find an approximation of the exact state for t. As a result there is a matching matrix for each D_i, or a tensor for each LiDAR dataset D_i, respectively. For the following formulations the recorded data will be denoted as D, regardless of the used sensor type. While the tensor of a depth camera has a rank of three (four if one counts each point's x, y, z coordinates), I use the two-index notation p_i,j as a simplification. In the same way I use j_i,j to address the joint states matching a point p_i,j, even though J may have a different rank than D.
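The joint-state interpolation mentioned above can be sketched as per-joint linear interpolation over the recorded timestamps. Linear interpolation is a simplifying assumption, though section V notes that constant-velocity trajectories were chosen precisely to make it valid; the function name is mine:

```python
import numpy as np

def interpolate_joints(t_query, t_joint, joint_states):
    """Linearly interpolate recorded joint states (n_samples x n_joints)
    to the timestamps of the individual depth measurements."""
    joint_states = np.asarray(joint_states, dtype=float)
    return np.stack(
        [np.interp(t_query, t_joint, joint_states[:, k])
         for k in range(joint_states.shape[1])],
        axis=-1)
```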

B. Parameter Modeling
I use the MCPC model to express the spatial relationships of the robot's kinematic chain. An initial guess k_init of k_opt can e.g. be obtained via manual measuring or from the robot's datasheet. Though the MCPC model itself is free of redundancies, there are still a few parameters which cannot be calibrated. The transformation between B and the first joint of the robot, J_1, behaves as a static offset to all recorded point clouds, and there is no information to deduce its parameters. Since the frame of the first joint J_1 lacks a fixed position in space, two further redundancies arise in the parameters of T_J2→J1. As such, y and β also have to be excluded from the optimization. I thus use a bitmask vector m of the same size as k to define which parameters shall be taken into account in the optimization process. m is also used to reduce the number of DoFs for prismatic joints.
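As an illustration, the bitmask for a seven-joint arm like the one used in section V could be constructed as follows. This is a sketch assuming four parameters per revolute segment, ordered (α, β, x, y), plus six for the last-joint-to-EE transform; the function name is mine:

```python
import numpy as np

def build_mask(n_revolute):
    """Boolean mask over the MCPC parameter vector of a serial arm with
    n_revolute revolute joints: True marks parameters to optimize.
    beta and y of the first segment are excluded as redundant."""
    n = 4 * n_revolute + 6          # 4 per joint segment + 6 for EE transform
    m = np.ones(n, dtype=bool)
    m[1] = False                    # beta of the first segment
    m[3] = False                    # y of the first segment
    return m
```

The optimizer then only touches `k[m]`, while the masked entries keep their initial values.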

C. Normalization
The influence of orientation errors on the cost metric heavily depends on the distance of a surface to the sensor. To balance the weight of angular and translational parameters, k should be normalized before optimization. Since all data is captured by the robot looking at its environment, one can assume the data to be more or less equally distributed around the robot's base. Thus, approximating the scaling factor s from only a subset of all datasets is usually sufficient. I project all points of the first recorded dataset to B, allowing s to be computed from the distribution of the projected points. I denote components normalized by s with a hat symbol (as in p̂). The normalization of a dataset D^E is straightforward: all points are scaled by s. Note that one also needs to normalize the translational parameters of k as well as the joint positions of prismatic joints. Even though the parameters of k change with every ICP iteration, it is safe to keep s constant over the whole optimization process. As the accumulated error of k_init is usually in the range of a few centimeters, the numerical effects on s are negligible.
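A possible sketch of the normalization factor, under the assumption that s is chosen so that the mean point distance to the base origin becomes one (the exact statistic is not fixed by the text above, and the function name is mine):

```python
import numpy as np

def scale_factor(points_in_base):
    """Normalization factor s so that the mean distance of the projected
    points to the base origin becomes 1 (assumed statistic)."""
    return 1.0 / np.mean(np.linalg.norm(points_in_base, axis=1))
```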
In case the initial assumption of an equal distribution of points around the robot does not hold, it might be necessary to a) compute s by averaging points from all datasets, as well as b) to add an additional translation T B→C shifting the points to their average centerpoint, which would need to be incorporated as an additional transformation in tr(k, j) and masked in m.

D. ICP
The used ICP variant consists of four steps, which I will describe in detail below: 1) initial projection and validation of points, 2) search for point matches, 3) verification of point matches, and 4) computation of the error function and optimization of k. All four steps are repeated iteratively until convergence is reached. A detailed overview of the used operations is given in algorithm 1.

Algorithm 1 ICP for Calibration
1) Initial Projection and Validation: Note that the projection of D^E to D^B is the result of a transformation along a kinematic chain, parameterized by joint states that change over the duration of the recording. In other words: even though the individual transformations based on k are rigid in themselves, errors in the kinematic parameters will lead to non-linear distortions, as shown in fig. 2. This effect makes it hard to apply common tools often used in point cloud registration, as features, tree structures for point matching and normals cannot be pre-computed. This is an especially limiting factor for the cost function, as numeric optimization of k leads to an ongoing deformation of the resulting point cloud. I thus use the cross product of neighboring points for normal computation and the point-to-plane error metric as a compromise between runtime and precision. Since n_i,j is normalized, it is the same whether it is computed via n(p_i,j) or n(p̂_i,j). Note that for scans other than depth images, the normal orientation depends on the scanning trajectory and the resulting order of scan lines (e.g. whether the lines were recorded from left to right or from right to left). It may thus be necessary to verify the normal orientation by comparing it to the direction of the sensor's view ray, where the origin of the sensor o^B is obtained from the kinematic chain.

It is natural for sensor data to contain noise and outliers, as well as, in the case of depth sensors, gaps. Moreover, the chosen approach for normal estimation is prone to making incorrect assumptions along corners and edges, an effect becoming especially eminent when the density between separate scan lines and the density of points within a single scan line differ heavily (see fig. 3). To improve the quality of the calibration result, it is recommendable to remove such measurements from the recorded data.
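The cross-product normal estimation on an organized point matrix, including the view-ray orientation check described above, could look like this. This is a sketch; `grid_normals` is an illustrative name and operates on an h × w × 3 point grid:

```python
import numpy as np

def grid_normals(P, origin):
    """Estimate per-point normals on an organized point matrix P (h x w x 3)
    via the cross product of neighbour differences, flipping each normal
    towards the sensor origin so the orientation matches the view ray."""
    du = P[1:, :-1] - P[:-1, :-1]                  # difference along scan lines
    dv = P[:-1, 1:] - P[:-1, :-1]                  # difference across scan lines
    n = np.cross(du, dv)
    n /= np.linalg.norm(n, axis=-1, keepdims=True)
    view = origin - P[:-1, :-1]                    # direction back to the sensor
    flip = np.sum(n * view, axis=-1) < 0
    n[flip] *= -1
    return n                                       # shape (h-1, w-1, 3)
```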
While the detection of invalid points (usually indicated by NaN values) is straightforward, the classification of outliers and edges requires an additional filtering routine: a window of n × m radius is applied to each projected point.

2) Point Matching: For each pair of filtered datasets, nearest-neighbor matches are searched among the projected points. Since, other than the original ICP, bundle adjustment is used to align more than two scans at once, the process is repeated for all possible combinations of datasets (without respect to their order).
3) Match Validation: Unfortunately, pure nearest-neighbor matching is prone to finding incorrect pairs, e.g. when a point q̂^B ∈ Q̂^B_i lies on a surface not captured in Q̂^B_j. Thus, all matches with a point-to-point distance above a maximum distance d_max or with a normal overlap below a threshold f_min are excluded from M (similar to [75]).
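The two validation criteria can be sketched as a vectorized filter over candidate matches (illustrative names; the sign-insensitive normal overlap mirrors the squared dot product used in the cost function):

```python
import numpy as np

def validate_matches(q_i, q_j, n_i, n_j, d_max=0.05, f_min=0.9):
    """Keep matches whose point-to-point distance is below d_max and whose
    normal overlap |n_i . n_j| exceeds f_min; returns a boolean keep-mask."""
    dist_ok = np.linalg.norm(q_i - q_j, axis=1) < d_max
    norm_ok = np.abs(np.sum(n_i * n_j, axis=1)) > f_min
    return dist_ok & norm_ok
```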

4) Cost Function:
The point-to-plane distance of a match ⟨q̂_i, q̂_j⟩ is computed by projecting the difference of both points onto the normal n̂_j, resulting in a total error e(k̂, M) as the sum of the squared point-to-plane distances over all matches. Since the normals are only used as part of a squared dot product, one can safely ignore their sign in the computation of the error. The Levenberg-Marquardt method [76], [77] is used to find the optimal, unmasked parameters k̂_opt minimizing e(k̂, M). The final calibration parameters k_opt can then be obtained by removing the scaling factor s from k̂_opt. All four steps of the ICP algorithm described above are iteratively repeated until the change ∆k_opt between two iterations falls below a threshold ε.
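A toy sketch of the point-to-plane residual and its minimization with Levenberg-Marquardt via SciPy. Here the problem is reduced to a single z-offset parameter instead of the full kinematic parameter vector; all names are mine:

```python
import numpy as np
from scipy.optimize import least_squares

def point_to_plane_residuals(q_i, q_j, n_j):
    """Signed point-to-plane distances of matched point pairs; the sign is
    irrelevant once the optimizer squares the residuals."""
    return np.sum((q_i - q_j) * n_j, axis=1)

def fit_offset(q_i, q_j, n_j):
    """Toy optimization: find the z-shift of the first cloud minimizing the
    summed squared point-to-plane error with Levenberg-Marquardt."""
    res = least_squares(
        lambda z: point_to_plane_residuals(q_i + np.array([0.0, 0.0, z[0]]), q_j, n_j),
        x0=[0.0], method="lm")
    return res.x[0]
```

In the actual framework, the residual would instead be a function of all unmasked kinematic parameters, re-projecting every matched point through the chain in each evaluation.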

V. EVALUATION
The aforementioned formulations yield a generic framework, theoretically enabling the calibration of any kinematic chain with an attached 3D sensor, as long as it is possible to get precise readings of the included joint positions. To demonstrate the capabilities of the proposed system, it is tested on a seven-DoF robot arm in combination with varying 3D sensors. In detail, I use a KUKA LBR iiwa R840 manipulator (see figure 5) and a total of four sensors with different characteristics: a Hokuyo UTM-30LX rotating single-beam LiDAR, a Wenglor MLSL236 triangulation-based line scanner, a Microsoft Kinect Azure consumer-grade depth camera, and a PhotoNeo MotionCam 3D high-end industrial-grade depth camera (see figure 4).
The used LiDAR has by far the widest view range of all tested sensors, but is only meant for far range scanning.
Moreover, the studies shown in [78] suggest that the sensor error may be modeled as absolute Gaussian noise with a standard deviation of 1.8 cm.
In contrast, the Wenglor MLSL236 is extremely precise, but its view range is limited to close-range scenes. Unfortunately, there are no studies about its sensor model available. However, the vendor's datasheet specifies the maximum depth error to stay below 600 µm [79]. Given the three-sigma rule, one may thus assume a Gaussian noise model with σ_abs = 0.2 mm.
Microsoft's Kinect Azure is the only consumer-grade 3D sensor in the test field and, with a recommended retail price of around 400 EUR, by far the cheapest one tested. It is also the only sensor suitable for both close- and far-range scenes. The Kinect offers a narrow field of view (NFOV) as well as a wide field of view (WFOV) scanning mode, both of which can be combined with an additional 2×2 binning, which increases the z-precision at the cost of the xy-resolution. All datasets of the Kinect Azure were captured in NFOV 2×2 binned mode. According to the findings of [80], the sensor's noise for this mode is similar to linear Gaussian noise modeled by σ(z) = σ_abs + σ_rel · z, with σ_rel = 0.21 %, σ_abs = 2.53 mm and z being the distance of a particular measurement.

Finally, the PhotoNeo MotionCam 3D is a high-end industrial-grade depth camera for scenes between 50 cm and 1 m distance. It can be seen as a successor to PhotoNeo's PhoXi 3D Scanner, adding support for dynamic scanning at up to 20 Hz. As the used sensor technology is very similar to the PhoXi 3D Scanner, one may take the observations presented in [81] as a point of reference for modeling the sensor noise: averaged over four different materials, σ_abs has a value of 0.18 mm.
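For simulation or error budgeting, a linear-Gaussian depth-noise model with the Kinect Azure figures reported in [80] can be sampled as follows (a sketch; the function name and default parameter values taken from the text above are mine to label):

```python
import numpy as np

def depth_noise(z, rng, sigma_rel=0.0021, sigma_abs=2.53e-3):
    """Sample linear-Gaussian depth noise sigma(z) = sigma_abs + sigma_rel * z,
    with defaults matching the Kinect Azure NFOV 2x2-binned figures from [80]."""
    return rng.normal(0.0, sigma_abs + sigma_rel * np.asarray(z, dtype=float))
```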
Fig. 5. Physical joint frames of the used LBR iiwa R840 robot. All axes rotate around their local z-axes. The base frame B of the kinematic chain is equivalent to the frame of the first joint J_1. When applying the MCPC modeling convention, the joints J_7, J_5 and J_3 are shifted along their z-axes to the positions of their predecessors. Setting the non-calibratable parameters y and β of T_J1→J2 to zero also shifts J_1 (and thus also B) up, so that J_2 lies in its xy-plane.
An outline of the tested sensors' characteristics, as well as the used parameters is given in table I.
The evaluation is performed on two close-range scenes of basic objects, a 3D-printed Stanford Bunny (originally scanned and made public by Turk and Levoy using their then-new scanning algorithm [82]) and a Utah teapot (also known as the Newell teapot) [83], [84], which is still in production by its original manufacturer, as well as on two large-scale scenes, featuring an office and a laboratory at TUM (see figure 7).
The Hokuyo LiDAR has an extensive reach and a view angle of 270°, making its orientation negligible, as it is almost always capable of seeing a surface somewhere within any indoor scene. Thus, recordings were performed by moving the robot to seven randomly generated configurations and moving one of its axes by 180°, one at a time. The process was repeated five times, resulting in four batches with a total of 28 scans and one validation set with seven scans. The joint velocities were chosen in a way to obtain a similar density between scan lines and points within a single line.
For the other sensors, the devices' fields of view need to be taken into account when selecting the scanning poses, to assure sufficient overlap in the projected point clouds of the datasets. In these recordings the scanning poses were chosen by manually selecting suitable sensor placements and using an inverse kinematics solver to find a matching robot configuration for each desired EE pose. In case of the Wenglor MLSL236, the trajectories were defined by a fixed start and end configuration in joint space. The robot was moved between those configurations with a constant velocity to allow linear interpolation of the joint positions. For the two depth cameras, the robot was moved to fixed positions from which only single images were taken. Figure 6 provides a detailed overview of the selected configurations and trajectories.
A reference calibration was obtained using an optical tracking system based on five Vicon Vero v1.3 cameras. For this calibration, a marker was placed on the ground next to the robot's base while a second one was mounted to the EE. The robot was then moved to 5 000 randomly selected configurations while the poses of both markers as well as the robot's joint positions were recorded. 90 % of those poses were used to compute the optimal MCPC parameters connecting the static marker next to the robot's base O with the one attached to the end effector E. Based on the found model, outliers in the measurements with an offset of more than 5 cm and/or 0.05 rad (2.86°) from the estimated EE pose were excluded from the recorded data. The procedure was repeated five times. Based on the remaining 500 poses, the reference calibration reaches a position error of 1.77 mm and an orientation error of 0.547°, when applying the same outlier filtering on these measurements as well.
The initial model for both calibration attempts was obtained from CAD data of the used components whenever available.Parts for which such data was not accessible were measured manually.
As the presented approach takes the coordinate frame of the used 3D sensor as the endpoint of the kinematic chain, only the estimated MCPC parameters of the robot itself are comparable to the reference calibration. To enable an evaluation on the same 500 validation measurements, the found transformation between the seventh joint J_7 and E is thus replaced by the reference one. The same applies to the transformation from O to J_1 as well as the non-calibratable parameters between J_1 and J_2. An overview of the parameters calibrated by the different approaches is given in table II. To ensure similar conditions, the outlier filtering is applied to the verification data here as well.
For the final results, the maximum number of iterations i_max was limited to 50. All retrieved calibration results are listed in table III. The runtime measurements were performed on a workstation PC equipped with an AMD Ryzen 9 5950X CPU (2020 model with 16 physical cores) and 128 GB RAM, running a multi-threaded C++ implementation of the described framework under Ubuntu 18.04 LTS. Figures 8 and 9 further show the captured validation sets to allow a subjective impression of the achieved calibration quality.
The presented results allow for a number of observations and conclusions: As one can see in table III, using only seven scans is usually insufficient to reach a suitable calibration result. Either the precision is far worse than what can be expected from a higher number of scans, or the ICP did not converge at all. There is also a clear trend towards reaching a higher precision when using more data.

[Table II: Calibrated parameters per transformation. The transformations O to J1 and J7 to E are described by six parameters (α, β, γ, x, y, z); each transformation between consecutive joints J1 to J7 uses four (α, β, x, y). Legend: ○ parameters from CAD model / manual measuring, • calibrated using the presented approach, ◆ calibrated using the optical tracking system.]

Also, the findings confirm the expected relation between the calibration result and the precision of the used sensor. For high-precision sensors such as the Wenglor MLSL236 or the Photoneo MotionCam 3D the results are similar to the ones obtained by the tracking system. In case of the MotionCam 3D on the Stanford Bunny scene the proposed framework even found a solution which is slightly better than the reference. However, even with a Kinect Azure one may obtain results almost as good as the reference, even though its cost is less than one fiftieth of the tracking system's.
The observations further raise the suspicion that less complex, small-scale scenes are better suited for calibration. Especially in the combination of large scenes and far-range sensors, even the smallest orientation errors have a strong effect on the overall error metric. In the example of the lab scan taken with the Hokuyo LiDAR one can observe that the orientation error is continuously reduced with an increasing number of used datasets, even at the expense of the position error.
Evidently the runtime is directly dependent on the number of used iterations and found point matches, which is in turn related to the number of scans. Fortunately, one can see that the number of required iterations goes down when the overall number of datasets increases. Finally, while recording the used datasets I made another observation: The spatial position of the sensor is, except for ensuring sufficient overlap in the scan data, negligible. It is, however, desirable to reach a complete and uniform sampling of the manipulator's joint space. Especially extrapolation into joint ranges not covered by the calibration data will result in errors. When closely studying figure 6 one can see a direct relation between the coverage of the joint space and the calibration precision.
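The relation between joint-space coverage and calibration precision suggests a simple coverage check when selecting scanning configurations. The following helper is a hypothetical sketch, not part of the presented framework: it rates one joint by the fraction of equal-width bins of its motion range that the recorded joint positions visit.

```cpp
#include <algorithm>
#include <vector>

// Fraction of `bins` equal-width intervals of the joint's motion range
// [lo, hi] that contain at least one recorded joint position.
double jointCoverage(const std::vector<double>& positions,
                     double lo, double hi, int bins) {
    std::vector<bool> visited(bins, false);
    for (double q : positions) {
        if (q < lo || q > hi) continue;  // outside the motion range
        int b = std::min(bins - 1,
                         static_cast<int>((q - lo) / (hi - lo) * bins));
        visited[b] = true;
    }
    int hit = 0;
    for (bool v : visited)
        if (v) ++hit;
    return static_cast<double>(hit) / bins;
}
```

A value close to 1 for every joint indicates the uniform sampling described above; low values flag joint ranges where the calibration would have to extrapolate.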

VI. SUMMARY
A new framework for fully autonomous calibration of robot manipulators has been presented, extending the ICP algorithm to allow optimization of complex kinematic models instead of estimating a single, rigid transformation. The shown implementation has been evaluated on multiple real-world scenes with various hardware configurations. Comparing the results to a dedicated tracking system has clearly demonstrated the functionality of the framework. More than that: the achieved precision is similar to that of the far more expensive reference system. Given the right scene, even with a consumer-grade depth camera such as the Microsoft Kinect Azure one can achieve a precision that is only a few tenths of a millimeter off.
Having shown that self-calibration of robotic systems is possible, there are still multiple possible extensions to the formulated approach to be investigated: As the system is already capable of undistorting scans obtained from projections along a badly parameterized kinematic chain, it should also be possible to include a sensor's intrinsic parameters in the optimization. Finding formulations to deal with less precise actuator readings, such as odometry, would further broaden the range of possible applications. Finally, many strategies for optimizing the runtime of the ICP algorithm have been demonstrated. It is not unlikely that many of those are applicable in the context of a calibration problem as well.

APPENDIX: ACKNOWLEDGMENT AND IMPLEMENTATION DETAILS
Special thanks go to Daniel Hettegger and Bare Luka Žagar for their continuous assistance in the lab and their remarks on the contents and structure of this work. This also applies to Dinesh Paudel, who was a great help in the assembly of the used robot workcell.
The presented C++ implementation would not have been possible without the many contributors of open source libraries, most importantly Eigen [85] and Ceres Solver [86]. Also great thanks to Salvatore Virga for publishing his iiwa_stack [87].

Fig. 1. Visualization of the proposed calibration pipeline: 1) A robot with an attached 3D sensor captures multiple recordings of an arbitrary scene by moving the eye-in-hand sensor. 2) ICP-based bundle adjustment is used to undistort and align the point clouds by optimizing the robot's kinematic model. Note how the crispness improves on the bunny's ear and crib. 3) The obtained calibration parameters are uploaded to the robot manipulator.


Fig. 2. Distortion of point clouds obtained by projecting depth measurements of a pyramid along an imprecise kinematic model with randomized scanning trajectories.From left to right: One single point per joint configuration (single-beam LiDAR), one line per configuration (line scanner) and a matrix of points obtained from a single configuration (depth camera).

Fig. 3. Scan lines of a cubic room, colorized by the direction of the points' normal vectors. For points along corners some of the neighbouring points lie on other surfaces, causing the normals to bend toward the edges. The effect becomes visible as a color gradient on the scan lines. Areas with uniform normals are rendered in lighter colors.

1) Projection and Validation: Before searching for point matches, all datasets D_E^i have to be projected to B. For each point p_{i,j} a normal uniformity score is computed over its (2n+1) × (2m+1) neighbourhood,

o(p_{i,j}) = ( Σ_{a=-n}^{n} Σ_{b=-m}^{m} ∥v_{i,j} ⋅ v_{i+a,j+b}∥ ) / ( (2n+1)(2m+1) )   (29)

and all points with a value o(p_{i,j}) below a threshold g_min are excluded. By choosing different values for n and m it is possible to compensate for varying densities of points and scan lines. As the indices of points are the same in D_B and D_E one can also remove the according measurements from the raw data. The filtered point clouds are denoted Q_B and Q_E, respectively.
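The validation score from equation (29) can be sketched as follows, assuming the unit normals are stored in a row/column grid matching the point indices. This is a minimal illustration with hypothetical names; for brevity, indices at the grid border are clamped rather than handled by shrinking the neighbourhood.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };

double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Normal uniformity score o(p_ij) of equation (29): mean absolute dot
// product between the normal at (i,j) and all normals in its
// (2n+1) x (2m+1) neighbourhood. Values near 1 indicate a locally flat
// surface; values drop near corners where normals bend toward edges.
double uniformity(const std::vector<std::vector<Vec3>>& normals,
                  int i, int j, int n, int m) {
    const int rows = static_cast<int>(normals.size());
    const int cols = static_cast<int>(normals[0].size());
    double sum = 0.0;
    for (int a = -n; a <= n; ++a)
        for (int b = -m; b <= m; ++b) {
            int r = std::clamp(i + a, 0, rows - 1);
            int c = std::clamp(j + b, 0, cols - 1);
            sum += std::fabs(dot(normals[i][j], normals[r][c]));
        }
    return sum / ((2 * n + 1) * (2 * m + 1));
}
```

Points whose score falls below g_min would then be dropped from both D_B and D_E via their shared indices, yielding Q_B and Q_E.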
Table footnotes: a) 1080 range values are recorded sequentially during rotation, b) NFOV 2 × 2 binned mode, c) Dynamic mode, d) According to datasheet. However, experiments have shown that the used objects were already measurable at less than 20 cm distance.

Fig. 6. Scanning poses and trajectories used per sensor. Colors are defined by batch, while the line style allows identifying a single configuration. Areas outside of the joint axes' motion ranges are grayed out. The scans of the first batch were used for the test runs with seven datasets. Datasets from the other batches were added subsequently to increase the overall number of datasets.

Fig. 7. Pictures of the scanned real-world scenes. 1) Office, 2) TUM laboratory (the robot is positioned in the workcell on the right), 3) Stanford Bunny and 4) Utah Teapot (1.4 l version).

Fig. 8. Point clouds projected by uncalibrated (left) and calibrated systems (right) on 28 datasets. Each picture shows seven validation datasets that were not used for the calibration itself. From top to bottom: Stanford Bunny recorded with 1) Wenglor MLSL236, 2) Kinect Azure, and 3) Photoneo MotionCam, followed by Utah Teapot recorded with 4) Wenglor MLSL236, 5) Kinect Azure and 6) Photoneo MotionCam. The wooden warmer below the Teapot is not visible to the UV laser of the Wenglor scanner. Differences in the alignment of the point clouds of the Kinect Azure are best noticeable on the partly visible background structures.

Fig. 9. Point clouds projected by uncalibrated (left) and calibrated systems (right) on 28 datasets. Each picture shows seven validation datasets that were not used for the calibration itself. From top to bottom: TUM laboratory recorded with 1) Kinect Azure, and 2) Hokuyo UTM-30LX (viewed from top), followed by an office recorded with 3) Kinect Azure, and 4) Hokuyo UTM-30LX.