ARDEA—An MAV with skills for future planetary missions

We introduce a prototype flying platform for planetary exploration: autonomous robot design for extraterrestrial applications (ARDEA). Communication with unmanned missions beyond Earth orbit suffers from time delay; thus, a key criterion for robotic exploration is a robot's ability to perform tasks without human intervention. For autonomous operation, all computations should be done on-board, and Global Navigation Satellite Systems (GNSS) should not be relied on for navigation purposes. Given these objectives, ARDEA is equipped with two pairs of wide-angle stereo cameras and an inertial measurement unit (IMU) for robust visual-inertial navigation and time-efficient, omni-directional 3D mapping. The four cameras cover a 240° vertical field of view, enabling the system to operate in confined environments such as caves formed by lava tubes. The captured images are split into several virtual pinhole camera views, which feed simultaneously running visual odometries. The stereo output is used for simultaneous localization and mapping, 3D map generation, and collision-free motion planning. To operate the vehicle efficiently for a variety of missions, ARDEA's capabilities have been modularized into skills which can be assembled to fulfill a mission's objectives. These skills are defined generically so that they are independent of the robot configuration, making the approach suitable for heterogeneous robotic teams. The diverse skill set also makes the micro aerial vehicle (MAV) useful for any task where autonomous exploration is needed, for example, terrestrial search and rescue missions in which visual navigation in GNSS-denied indoor environments, such as partially collapsed buildings or tunnels, is crucial. We have demonstrated the robustness of our system in indoor and outdoor field tests.


1 | INTRODUCTION
In recent years, MAVs have received increasing attention in the consumer market, in industrial applications, and in robotics research.
The consumer market is currently focused on products for aerial photography, competitive race flying, and autonomous "follow-me" video operation for sports activities. Competitive flying has become so popular that it has evolved into a professional discipline with a dedicated league, the Drone Racing League (DRL). Recent advances in simultaneous localization and mapping (SLAM) and volumetric mapping have additionally paved the way for MAVs to be used in virtual reality (VR) and augmented reality (AR) applications, where an important feature of such systems is a stereo camera rig with a large field of view (FOV).
Furthermore, MAVs have become of interest for inspection and maintenance tasks in industrial applications. Example tasks include the detection of leaks in gas pipelines or cracks in bridges, as well as the volumetric mapping of construction sites or world heritage sites for documentation and further analysis. In search and rescue (SAR) scenarios, MAVs can help to identify people in need, provide aid from above, and build a disaster map for emergency response teams. Recently, MAVs have even been considered useful in production lines of the "Factory of the Future" (Augugliaro et al., 2014). A common requirement for all of the aforementioned applications is a means of visual sensing and a GNSS device used for global localization and navigation.
Finally, MAVs are also being developed for autonomous exploration and mapping of extraterrestrial bodies (Huber, 2016). They are ideally suited for scouting purposes, since they can quickly cover large areas of interest and reach places that are inaccessible for ground-based robots such as rovers. Planetary scientists have high expectations that flying vehicles can be applied to the autonomous exploration and mapping of relevant areas such as lava tubes on Mars (Daga et al., 2009). Depending on whether there is an atmosphere or not, a classical propeller propulsion system or booster engines can be used interchangeably on an MAV platform, with only minor changes to the navigation software.
However, in addition to the specific hardware requirements for MAVs in space applications, there are two major challenges for the navigation software. First, in contrast to most terrestrial applications, no GNSS device can be used, which is why we rely on cameras and an IMU as sensor modalities. Second, the required level of on-board autonomy is much higher. While most commercially available MAVs are only partly autonomous, that is, they are either directly controlled by a human operator or the operator must be able to intervene at any time, unmanned space missions usually have communication round-trip times of more than a minute. For instance, the round-trip time for a signal from Mars to Earth and back is between 8 and 40 min (Mankins, 1987), depending on the planets' constellation. This is too long for any intervention by a human operator and could render the exploring vehicle inoperable. Compared to passive exteroceptive sensors such as cameras, many active exteroceptive sensors, such as Radio Detection and Ranging (RADAR) and Light Detection and Ranging (LIDAR), are characterized by high energy consumption, heavy weight, and more difficult space qualification. Cameras are lightweight and capture an information-rich representation of the environment, which makes them ideally suited for mobile robots with limited payload. In addition to navigation purposes, cameras can be used for higher-level mission tasks, such as scientific inspection of the environment or even taking selfies. This was done by the Curiosity rover team (Maki et al., 2012) and provided a convenient means of inspecting the rover itself. Stereo cameras have been successfully employed on MAVs to obtain depth information in both indoor and outdoor environments (Barry & Tedrake, 2015; Gohl, Honegger, Omari, Achtelik, & Siegwart, 2015; Matthies, Brockers, Kuwata, & Weiss, 2014; M. G. Müller et al., 2018; Tomić et al., 2012). Such cameras have also been space-qualified in the past and used in several planetary robotic systems (Maimone, Johnson, Cheng, Willson, & Matthies, 2006).
To solve complex scientific investigation and exploration tasks in an effective and efficient manner, a team of heterogeneous robots can be used to distribute specific tasks to specialized team members. Moreover, crucial skills can be distributed across multiple members of the team to reduce the danger of a single point of failure. Reusing modular software and hardware components across all systems additionally reduces the complexity and effort in designing a robotic team.
This motivated us to build ARDEA, shown in Figure 1. The MAV supports wide-angle stereo vision, runs all computations on-board, and performs navigation functions autonomously. ARDEA was not built with one specific task in mind; rather, its set of parameterizable skills can be used to assemble complex missions. The human operator or the robotic team can choose skills to perform a specific task and accomplish the overall mission. The skills are defined generically enough to be applicable to other robots in the team. Also, a skill should be intuitive to use and work robustly, so that even inexperienced operators can assemble new missions quickly and efficiently. Together with our lightweight rover unit (LRU; Schuster et al., 2017), which is equipped with a landing platform, the presented MAV forms a heterogeneous robot team.
The core navigation software components, such as visual odometry (VO), local reference filtering and SLAM are designed to operate on both systems, differing only in configuration. We emphasize that the design and development of space-qualified hardware is beyond the scope of this paper. Instead, we focus on the algorithmic design of visual-inertial navigation for MAVs. Furthermore, while the specific platform we present here requires an atmosphere for flying, we note that our navigation system can be used similarly on hovering vehicles with thrusters or even on ground rovers or underwater vehicles.
First, we discuss the current state of planetary exploration using MAVs, existing design concepts and state of the art navigation algorithms in Section 2. In Section 3, we describe general hardware and software design considerations and the resulting setup of ARDEA.
Next, we present low and high level autonomy software components in Section 4. In Section 5 basic MAV skills, the building blocks of missions, are described. After defining the system design and skills, we demonstrate its capabilities in indoor and outdoor field experiments in Section 6. Finally, Section 7 gives concluding remarks and addresses potential future work.

2 | RELATED WORK
A large body of literature is concerned with sensors and autonomy functions of MAVs. We begin with a discussion on research about building flying robots for planetary exploration. Contemporary works dealing with MAVs with a design similar to that of ARDEA are then presented. We then give a brief overview of the literature regarding each of ARDEA's crucial components: visual-inertial navigation, motion planning, and control.

2.1 | Robotic planetary exploration with MAVs
In the robotic exploration community, the idea of using some form of MAV has emerged over the last years. Future rotor-based robotic vehicles will be able to fly on planets and moons with a sufficiently dense atmosphere, such as Mars (Huber, 2016) or Saturn's moon Titan (Lorenz et al., 2017). Such lightweight flying robots are envisioned to be able to travel distances of up to 5 km (Thangavelautham et al., 2014), which is enough to gain an overview and aid the navigation of slower ground rovers, which can carry heavier payloads and manipulators, on accessible terrain. The Jet Propulsion Laboratory plans on sending a small coaxial copter to Mars in the Mars2020 mission (Balaram et al., 2018) for scouting. Also, the Global Exploration Roadmap (ISECG, 2018) states that the exploration of Martian lava tubes is of high scientific interest. This poses challenging requirements on robots capable of mapping these difficult-to-reach geological points of interest (POIs). Propeller- and rotor-based aircraft designs are an obvious choice to meet these requirements; however, other types such as fixed-wing aircraft have also been proposed for planetary missions, as shown in Kuhl (2008).
Although these approaches are well suited to cover vast areas efficiently, they are infeasible for narrow cave exploration. In addition, such a fixed-wing system needs a large takeoff and landing area and is powered by a propulsion system which has to be refueled. Another approach is to send out robotic flapping-wing fliers of bumblebee size (Kang et al., 2019) along with a ground rover unit to perform collaborative exploration. All of the discussed designs are restricted to celestial bodies with an atmosphere. Without an atmosphere, flying robots would need a different propulsion system, such as thruster-based propulsion, which is out of the scope of our research.

2.2 | Autonomous MAV system designs
Compared to other aerial vehicles such as helicopters, multirotor systems have simple mechanical designs which are highly customizable. The most common designs are based on quadrotor platforms (Schmid, Lutz, Tomić, Mair, & Hirschmüller, 2014; Tomić et al., 2012) with different rotor configurations. Depending on the application, for example, industrial inspection, surveillance, SAR, or planetary exploration, different designs might be more suitable. The first design aspect is the MAV size, which is defined by the narrowest traversable operation space. This can be addressed simply by constraining the size of the vehicle or by changing the shape of the frame in flight.
Such morphing designs have one additional degree of freedom (DOF) per frame arm and have been shown either to change the frame shape during flight in an adaptive-morphology fashion (Falanga, Kleber, Mintchev, Floreano, & Scaramuzza, 2019) or to steer the motor thrust vectors such that the MAV can fly and hover in arbitrary orientations (Kamel et al., 2018), and can therefore pass narrow passages.
Another design aspect concerns safety in case of a motor failure. Fail-safe robustness is achieved by adding redundancy to the propulsion system as well as failure detection and handling in the control software. In Michieletto, Ryll, and Franchi (2018) and M. Müller and D'Andrea (2014, 2016), suitable system designs for motor redundancy and the necessary control strategies to handle motor faults are discussed. We discuss this aspect of our system design in Section 4.1.1. Sensor placement is a major design aspect that again results from the properties of the operating environment. In the literature, most MAVs that are suitable for operating in indoor and outdoor environments use cameras as their main sensor. They either use one monocular camera (M. Achtelik, Achtelik, Weiss, & Siegwart, 2011; Ok, Gamage, Drummond, Dellaert, & Roy, 2015; Weiss, Achtelik, Lynen, Chli, & Siegwart, 2012), a stereo setup (Matthies et al., 2014; Schmid, Lutz, et al., 2014; Tomić et al., 2012), or even multiple stereo setups (Schauwecker & Zell, 2014). To further enhance the FOV, Schneider and Förstner (2015) used a wide-angle stereo camera configuration. Some approaches combine several exteroceptive sensors such as cameras and LIDARs. For instance, in Beul, Krombach, Nieuwenhuisen, Droeschel, and Behnke (2017), two 3D laser scanners are combined with three stereo camera pairs to achieve a large FOV for confined spaces such as warehouses. Our hardware design was mainly driven by achieving the widest possible unobstructed stereo camera FOV.

2.3 | Visual-inertial navigation
Visual-inertial navigation has received a great amount of attention in the last decades, and several approaches have been suggested. These navigation approaches are crucial for a flying system that cannot rely on GNSS. They can be roughly categorized into filter- and optimization-based approaches. While optimization-based approaches can have advantages in terms of accuracy (Forster, Carlone, Dellaert, & Scaramuzza, 2017) by relinearizing the estimated state, they must solve challenges arising from the high frequency of inertial measurements.

2.4 | Motion planning

Sampling- and graph-based algorithms such as rapidly-exploring random trees (RRTs; LaValle & Kuffner, 2001) or A* (Hart, Nilsson, & Raphael, 1968) can be modified to efficiently find collision-free paths in unstructured, dynamic environments. While they are advantageous for collision avoidance using discrete, noisy sensor data, optimization-based strategies can deliver optimal and feasible solutions w.r.t. a metric and system constraints. The 4 DOF flat representation of Mellinger and Kumar (2011) is widely used in the MAV community and is convenient for planning techniques, which can make use of locally optimized motion primitives.
The minimum snap trajectories proposed by Mellinger and Kumar were extended in Richter, Bry, and Roy (2016). Recently, combinatorial strategies have been used, which leverage the exploration capabilities of sampling and graph techniques to provide feasible initial guesses for optimal programming or direct trajectory optimization. Nieuwenhuisen and Behnke (2015) showed that they can efficiently plan control-effort-optimal trajectories using an A*-based search in an Octomap (Hornung, Wurm, Bennewitz, Stachniss, & Burgard, 2013) and subsequent smoothing using the CHOMP algorithm. More recently, Usenko, von Stumberg, Pangercic, and Cremers (2017) and Oleynikova et al. (2016) showed that they can use on-board state estimation and mapping data from RGB-D and visual-inertial sensors, respectively, to build potential maps from Octomap occupancy trees and perform fast, online replanning using different optimization techniques.

2.5 | Control
Control of MAVs has been an active field of research; it is essential for executing the planned trajectories. A fundamental overview of multirotor control can be found in Mahony, Kumar, and Corke (2012). In Mellinger and Kumar (2011), aggressive flight maneuvers are realized based on differential flatness and using an external motion capture system. This approach is extended in Faessler, Franchi, and Scaramuzza (2018) to account for first-order drag effects.
Another line of research focuses on nonlinear dynamic inversion, which uses the inverse of the dynamics model to generate a feedforward angular rate command based on a differentiable reference trajectory (M. W. Achtelik, Lynen, Chli, & Siegwart, 2013). The incremental version, incremental nonlinear dynamic inversion, uses stepwise updates of the control input and provides command tracking and disturbance rejection at the linear and angular acceleration level (Smeur, de Croon, & Chu, 2017).
However, angular acceleration measurements are usually only available numerically on an MAV; hence, they are noisy and require filtering. Our control strategy, in contrast to the aforementioned approaches, relies solely on VO. The classical cascade of position and attitude controllers also enables manual flight of the MAV in attitude control mode. We explicitly consider changes in the atmosphere and distinguish between, and compensate for, external contact and wind forces (Tomić, Lutz, Schmid, Mathers, & Haddadin, 2018).

3 | FUNDAMENTAL SYSTEM DESCRIPTION
In this section, we present the fundamental system setup of our MAV. First, in Section 3.1 the general hardware setup is described, including ARDEA's shape and electrical components. Here, we also discuss system design requirements with respect to navigation and exploration tasks for future planetary missions and present our design decisions. In Section 3.2 the various on-board sensors and their respective mechanical and electrical integration are described, and in Section 3.3 the propulsion system design is discussed. Finally, an overview of the low-level software is presented in Section 3.4, including operating systems, middleware and related components.
3.1 | General hardware and system software setup

In previous works, we used commercially available MAV platforms with modifications to make them suitable for our required objectives and sensor equipment (Schmid, Lutz, et al., 2014; Tomić et al., 2012). As mentioned in these publications, adaptation of a commercial platform is a good starting point for autonomous MAV research, but comes with several disadvantages. These include sensor arrangement limitations due to the fixed frame structure, as well as limited access to low-level control interfaces of components such as the propulsion system.

3.1.1 | Requirements
To make this platform suitable for restricted spaces such as caves in planetary exploration scenarios or indoor environments such as partially collapsed man-made structures in SAR missions, the main design objective was to employ two pairs of wide-FOV stereo cameras for visual-inertial navigation, without having propellers or frame components within the camera views. Besides the visual navigation aspect, the platform should also be suitable for control system research (Section 4.1.1), where access to low-level interfaces such as motor speed commands and telemetry is vital. Most commercial platforms do not provide such low-level interfaces, which renders them unsuitable for our control research. Therefore, we designed a custom system.

3.1.2 | Sensor placement considerations
Different system designs were considered in order to give the wide-angle cameras an unobstructed FOV and to place the IMU at the center of gravity (CoG) of the mechanical structure.
Moreover, mechanical decoupling of the sensors and propulsion system was a critical design criterion that motivated the separation of the system into two parts: a frame containing only the propulsion system, and a navigation stack unit, as shown in Figure 2. All sensors, embedded computers, and custom electronics are integrated into a stand-alone navigation stack, as illustrated in Figure 3. Unlike the mechanically decoupled frame and navigation stack, the IMU and stereo cameras must be rigidly mounted to each other to ensure high mechanical stiffness between them. This is critical because the translation and rotation between the sensors are calibrated once and must stay constant during robot operation to ensure accurate visual-inertial navigation.

3.1.3 | Frame
The following shapes were considered as possible frame designs: Y, T, H, △, □, +, and ×, where the frame resembles the shape of the respective letter or symbol and the rotors are situated at its extremities. Due to the requirement of mounting an exchangeable, standalone navigation stack within the frame center, only shapes without edges passing through the center were considered, that is, △ and □. Comparing △ and □, the triangular arrangement gives the largest motor separation distance and, therefore, also the widest unobstructed camera FOV (see the sketch below).
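For intuition, assuming rotors placed at equal angles on a circle of radius R (an idealization of the frame shapes above), the separation between neighboring motors is the chord length 2R sin(π/n):

```python
import numpy as np

# Separation of neighboring rotors evenly spaced on a circle of radius R.
# R = 1 is arbitrary; only the ratio between configurations matters here.
def motor_separation(n_rotors, radius=1.0):
    return 2.0 * radius * np.sin(np.pi / n_rotors)  # chord between neighbors

print(motor_separation(3))  # triangle: ~1.73 R
print(motor_separation(4))  # square:   ~1.41 R
```

The triangular layout thus yields roughly 22% more separation between neighboring motors than the square one for the same frame radius.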
In favor of more stable propulsion, a coaxial motor arrangement with six motors in two planes was chosen, as shown in Figure 2a. This symmetric configuration with an equal number of counter-rotating motors in each plane balances the total angular momentum, as is common for multirotor system designs.
The propulsion system is mounted onto the frame and comprises electronic speed controllers (ESCs), motors, propellers and cabling.
Landing gear and propeller guards (not shown in the pictures) are likewise mounted directly to the frame. The propeller guards are designed to shield not only the propellers but also the exposed navigation cameras. For high stiffness, carbon fiber tubes with a diameter of 18 mm and a wall thickness of 0.55 mm were selected for the frame itself, whereas the landing gear and the propeller guards are attached with 8 mm diameter tubes. All power and data wires going from the navigation stack interface to the individual ESCs are routed inside the tube structure. The carbon fiber tubes of the frame and landing gear are assembled with custom aluminum connection parts, which also serve as motor mounts. After assembly, the carbon fiber and aluminum parts are glued together with epoxy. The mechanical decoupling between the frame and the navigation stack is achieved by rubber dampers on the fixture for the navigation stack.

3.1.4 | Navigation stack
The navigation stack is a self-contained, detachable unit holding all sensors, embedded computers, and miscellaneous electronic components, with the exception of the ESCs. It is a stand-alone unit in the sense that it only needs a supply voltage as input and provides a bidirectional controller area network (CAN-bus) data interface for controlling any actuators, for example, the ESCs. The navigation stack can also be used independently of the frame, either carried or attached to other types of mobile robots, to test navigation algorithms. It comprises the following components:

• Low-level real-time embedded computer: BeagleBone Black (BBB; single-core 1 GHz CPU, 512 MB RAM) embedded single-board computer with a custom cape/breakout printed circuit board (PCB). It contains a watchdog safety circuit, power supplies for 3.3, 5, and 12 V, and a buzzer.
• Analog Devices ADIS16367 IMU. It consists of 3-DOF accelerometers and 3-DOF gyroscopes, which are factory-calibrated and temperature-compensated.
• Four wide-angle cameras for the dual stereo setup, see Section 3.2.
• Xilinx Spartan 6 LX75T FPGA running the semiglobal matching (SGM) stereo algorithm by Hirschmüller (2008).

The BBB custom breakout PCB contains the following components:

• Emergency power switch-off circuit, triggered by the watchdog or by explicit command. When turned on, the n-channel metal-oxide-semiconductor field-effect transistor (MOSFET) circuit has a low on-resistance of 0.5 mΩ for low heat dissipation.
• CAN-bus driver circuitry with switch-off from ESC data signals.
• Soft start with a smooth current ramp-up while power is switched on, to limit the maximum current drawn by the ESCs, which act as a highly capacitive load.
• Trigger functionality for driving external illumination LEDs to aid vision in poorly lit conditions.

3.2 | Vision sensor setup
The stereo camera pairs are the primary sensors on-board ARDEA, and their output is used for flight-critical as well as higher-level tasks. The viewing directions of the lower cameras are inclined at −60° to the horizon and those of the upper cameras at +60° (see Figure 4).
As a result, the complete stereo setup provides approximately 240° of vertical field of view, as illustrated in Figure 5.
In addition to the advantages this camera setup has in indoor scenarios, the arrangement of the cameras is also well suited to the high-dynamic-range situations in outdoor scenes, where the brightness above the horizon is often much higher than below it. As separate cameras cover the FOV below and above the horizon, longer exposure times and higher gains can be used for the lower FOV to cope with the different intensities. The camera base unit triggers the cameras synchronously, captures all images, applies preprocessing, and sends them to the NUC high-level embedded computer via USB. This hardware trigger is also connected to the BBB computer, which saves the current timestamp along with the latest IMU values for every trigger event and sends them in a message to the local navigation filter.

3.3 | Propulsion system
Although the robot presented here is conceptually designed for future planetary missions, its propulsion system is laid out to operate in the Earth's atmosphere in order to test the autonomy software and the overall system. As previously introduced in Section 3.1.3, we chose a symmetrical, coaxial tricopter layout with two times three motors arranged in two planes, to balance the resultant angular momentum of the rotors.

3.3.1 | Aerodynamic design considerations
By applying the thrust equation in Equation (1)
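A common form of the static propeller thrust equation, of which Equation (1) is an instance (the notation below is an assumption, chosen to be consistent with the coefficient conventions in Section 4.1.1), is

$$T = c_T\,\rho\,A\,(\varpi r)^2,$$

where ρ is the air density, A the rotor disk area, r the rotor radius, ϖ the rotor angular speed, and c_T a dimensionless thrust coefficient; as noted in Section 4.1.1, the rotor geometry terms A and r are often lumped into the coefficient.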

3.3.2 | Electrical and communication design
As previously mentioned, the supply voltage is provided by a 4-cell LiPo battery. To incorporate the rotor angular speeds into the feedback loop of the attitude controller, an ESC is required which provides those values over a telemetry link. There are two common ESC interface types:

1. Point-to-point serial interfaces, for example, RS232 or 1-wire.
2. Full- or half-duplex data bus realizations such as RS485 or CAN-bus.

Moreover, the latter communication approach replaces the common PWM servo signal or the newer OneShot, MultiShot, or DShot protocols, which are used to send speed commands to ESCs.
Motor controllers like modern KISS ESCs provide a telemetry link via a 1-wire bus. This means that for n ESCs, one needs n wires for sending commands, one for receiving telemetry over the 1-wire bus, and one for the signal reference (GND), resulting in eight wires for the hexacopter platform design discussed here. This communication scheme is suboptimal in that it uses an excess of cables, which adds design complexity and additional weight to the system. By comparison, bidirectional data buses like RS485 or CAN-bus use a physical layer based on differential signals and, therefore, do not need a common reference voltage (GND). They can be implemented in a half-duplex fashion, requiring only two cables for sending motor commands and receiving telemetry over the same wires. Unlike RS485, the CAN-bus standard specifies not only the physical layer but also the data link layer (according to the ISO/OSI model), which has many favorable properties, such as:

• Multimaster bus: Reduces the need for time-consuming polling in a master–slave scheme. Telemetry messages are sent on the bus without cyclic request messages.
• Multicast reception with time synchronization: Synchronous sending of motor commands to all ESCs in one message reduces overhead.
• Error detection and signaling: A faulty bus state can be detected and recovered automatically.
• Bus access collision avoidance through prioritization of messages: Simplifies the communication protocol design, since arbitration does not have to be handled explicitly.
Those properties reduce the communication design complexity and, therefore, give CAN-bus a clear advantage for our platform design. Galvanic isolation to alleviate the maximum common-mode voltage range limit of CAN-bus transceivers is not necessary, because the range of our selected CAN-bus transceiver (−7 to +12 V) proved to be sufficient for stable communication even in noisy conditions caused by the fast switching of currents of up to 10 A within the ESCs.
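As an illustration of this command/telemetry scheme, here is a sketch using the python-can library; message IDs, payload layout, and rates are invented placeholders, not ARDEA's actual protocol:

```python
import can  # python-can; a generic stand-in for the on-board CAN driver

bus = can.interface.Bus(channel="can0", bustype="socketcan")  # 1 Mbit/s set up on the OS side

# One multicast message carries the speed commands for all six ESCs
# (multicast reception avoids per-ESC polling).
throttles = [153, 153, 150, 155, 153, 151]  # illustrative 8-bit throttle values
bus.send(can.Message(arbitration_id=0x100, data=bytes(throttles), is_extended_id=False))

# ESCs push telemetry on their own (multimaster bus), so we simply receive.
msg = bus.recv(timeout=0.01)
if msg is not None and 0x200 <= msg.arbitration_id <= 0x205:
    esc_id = msg.arbitration_id - 0x200
    rpm = int.from_bytes(msg.data[0:2], "big")
    print(f"ESC {esc_id}: {rpm} rpm")
```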
Although star topologies are not recommended because they cause signal reflections, they offer a more flexible way of routing the data bus cables across the platform frame, and they are unproblematic if the cable lengths are short enough. Our design follows the high-speed ISO 11898 standard specifications, which recommend a maximum unterminated stub length of 0.3 m at a 1 Mbit/s data rate. Figure 8 shows the bus topology and the bus termination strategy with only one resistor (R_L = 120 Ω) on the BBB custom PCB (see Section 3.1.4). This deviates from the standard, but neglecting reflections due to stub lines in a star topology is a convenient design simplification for the short wires in our design. Only one termination resistor is used to match the bus on the PCB to the twisted-pair cable impedance and to attenuate remaining reflections.

3.4 | System software setup
To enable true autonomy, our MAV has to run all processing on-board; this includes sensor processing, mapping, and planning. Similarly to Schmid, Lutz, et al. (2014), we separate the autonomy functionality into low-level, real-time (RT) tasks and high-level, computationally intensive, high-latency tasks without hard real-time constraints, which is discussed in more detail in Section 4. RT-critical software modules are the attitude and position controllers and the respective sensor driver modules responsible for the IMU readout. Because sensor data are acquired on both the BBB and the NUC computer, their system clocks have to be synchronized for consistent data association in the local navigation filter (see Section 4.2.2). The precision time protocol daemon (PTPd) is used for fast and accurate system clock synchronization. It runs at a rate of 4 Hz, allows time stepping upon startup and fast clock slewing during system initialization, and achieves a clock synchronization accuracy of below 1 ms within half a minute. Moreover, camera trigger timestamps are captured on the BBB computer (see Figure 3) for accurate state vector augmentations in the local navigation filter.

4 | AUTONOMY SOFTWARE
Our autonomy software is separated into low-level and high-level components. The former comprises elementary functionality such as system stabilization by the attitude and position controllers for manual piloting. These algorithms have to run on a real-time OS with high priority to satisfy the tight task scheduling requirements of the control loop, which runs at 500 Hz. Because large deviations in task scheduling latency lead to system instability, these software parts cannot run on a remote PC and have to run on-board the system.
The latter software type deals with higher-level tasks such as navigation and mapping. These algorithms are computationally intensive and therefore run on the high-level computer without hard real-time constraints.
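To make the low-level scheduling constraint concrete, here is a minimal sketch of pinning a 500 Hz loop to a real-time scheduling class on Linux (illustrative only, not ARDEA's flight code; requires appropriate privileges):

```python
import os
import time

# Place the current process in the SCHED_FIFO real-time class (Linux only).
os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(80))

PERIOD = 1.0 / 500.0  # 2 ms control period
next_tick = time.monotonic()
while True:
    # ... read IMU, run attitude/position controller, send ESC commands ...
    next_tick += PERIOD
    sleep_time = next_tick - time.monotonic()
    if sleep_time > 0:
        time.sleep(sleep_time)  # large overruns here would destabilize the MAV
```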

4.1 | Low-level autonomy software
In this section, we describe the low-level autonomy software on ARDEA.
All functionalities described here run on-board the MAV under real-time processing constraints. First, the basic control system is described, followed by the external wrench estimation, which enables the vehicle to observe forces and torques caused by contact and wind, improving robustness. The latter is important for missions performed on planetary bodies with an atmosphere.

4.1.1 | Controller
Like any multicopter with parallel thrust vectors, our ARDEA hexacopter is underactuated, that is, it has four control inputs (collective thrust and three torques) but six DOF. However, the system is differentially flat, meaning that the control input can be computed from the flat outputs, position and yaw, and a finite number of their time derivatives.

Dynamics model
The rigid-body model of ARDEA used for control is given by the well-known Newton–Euler equations (Tomić, Ott, & Haddadin, 2017),

$$m\,\ddot{\boldsymbol p} = -m g\,\boldsymbol e_3 + \boldsymbol R \boldsymbol f + \boldsymbol f_{\mathrm{ext}}, \tag{2}$$

$$\boldsymbol I\,\dot{\boldsymbol \omega} = -\boldsymbol \omega \times \boldsymbol I \boldsymbol \omega + \boldsymbol \tau + \boldsymbol \tau_{\mathrm{ext}}, \tag{3}$$

where p is the position in the inertial frame, ω is the angular velocity of the body w.r.t. the inertial frame, R is the rotation matrix from the body frame to the inertial frame, m and I are the vehicle mass and inertia tensor, f and τ are the control force and torques, and f_ext and τ_ext denote external disturbance forces and torques.

Control allocation
The control allocation maps the computed thrust T from the position controller as well as the computed torques τ from the attitude controller to rotor speeds ϖ_i of the six propellers, i ∈ {1, …, 6}, such that

$$\boldsymbol u = \begin{bmatrix} T \\ \boldsymbol\tau \end{bmatrix} = \rho\,\boldsymbol B\,\boldsymbol W \begin{bmatrix} \varpi_1^2 & \cdots & \varpi_6^2 \end{bmatrix}^{\top}, \tag{4}$$

where ρ is the air density, B is the allocation matrix determined by the frame geometry, and W is a diagonal matrix whose elements contain the propeller thrust and torque coefficients. Note that, for convenience, the rotor radius and rotor disk area may be lumped into the thrust and torque coefficients c_u, c_l, k_u, k_l of the upper and lower propellers, respectively. The coefficients of the propellers used on ARDEA were identified based on force-torque sensor experiments (Tomić, Schmid, Lutz, Mathers, & Haddadin, 2016). For the quadratic model in Equation (5), it was found that upper and lower propellers in the coaxial configuration have different coefficients, which is due to the fact that the lower propeller operates in the downstream of the upper propeller. Our control allocation is implemented generically for different multirotor configurations, for example, quadrocopters, hexacopters, or octocopters. Hence, although an analytical solution B^# exists, we compute it numerically once during startup and as soon as reallocation is required, for example, if a motor fails or a propeller is lost. A single rotor is removed from the control allocation by deleting the respective column in B, and the numerical pseudoinverse is then obtained using singular value decomposition (SVD). The detection of motor failures on ARDEA is possible because of the motor telemetry transmitted via CAN-bus (see Section 3.3.2).
This increases fail-safe robustness (Michieletto et al., 2018) in the sense that ARDEA can maintain stable flight with fewer than six propellers, provided that the cumulative thrust of the remaining motors is sufficient for the actual takeoff weight. However, an equilibrium of rotor forces and moments is only obtained for a set of four remaining rotors that satisfies B B^# = I. Otherwise, the vehicle will end up rotating at a defined rate (M. Müller & D'Andrea, 2014).
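As an illustration of this reallocation step, here is a minimal NumPy sketch; the geometry, coefficients, and spin directions are illustrative placeholders, not ARDEA's identified values:

```python
import numpy as np

# Each column maps a squared rotor speed to [T, tau_roll, tau_pitch, tau_yaw].
def allocation_matrix(angles, c_thrust, k_drag, arm, spin):
    cols = []
    for ang, c, k, s in zip(angles, c_thrust, k_drag, spin):
        x, y = arm * np.cos(ang), arm * np.sin(ang)
        cols.append([c, y * c, -x * c, s * k])
    return np.array(cols).T  # 4 x n

angles = np.deg2rad([30, 90, 150, 210, 270, 330])
B = allocation_matrix(angles, [1e-5] * 6, [2e-7] * 6, 0.25, [1, -1, 1, -1, 1, -1])
B_pinv = np.linalg.pinv(B)  # SVD-based pseudoinverse, computed once at startup

# Motor 3 fails: delete its column and recompute the pseudoinverse.
B_fail = np.delete(B, 2, axis=1)
B_fail_pinv = np.linalg.pinv(B_fail)

u = np.array([30.0, 0.0, 0.0, 0.0])  # desired thrust and torques (illustrative)
w_squared = B_fail_pinv @ u          # squared speeds for the remaining five rotors
```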

Parameter identification
Identification of the rigid body parameters is based on a linear regression model. The inertia and center of gravity position were identified using data from an identification flight (Tomić et al., 2017) together with the previously identified propulsion parameters. The full parameter set comprises the rigid body parameters, the thrust and torque coefficients of the coaxial propellers (four parameters), the propeller inertia (one parameter), and the motor torque coefficients (two parameters), resulting in a total of 15 parameters. The parameter estimation is performed in three steps, as shown in Figure 13, to reduce the search space. Parameters are identified by minimizing the ℓ₁ norm of the model residuals using the iteratively reweighted least squares (IRLS) method (Chartrand & Yin, 2008), which is robust against outliers.
Identifying all parameters from flight data is difficult due to the lack of ground truth measurements of the total torque acting on the vehicle. In addition to that, we found that identifying the thrust and torque coefficients from flight data is sensitive to time delay in the measurements (on the order of 20 ms), and can lead to physically meaningless parameters, like negative thrust coefficients. Hence, we split the identification into three parts as depicted in Figure 13. The propulsion model serves as ground truth for the rigid body identification. In the last step, the external wrench is identified based on the previously identified parameters. During our experiments, only the aerodynamic wrench acts on the robot, hence, we use the estimated external wrench to identify the aerodynamic model.
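A minimal sketch of the IRLS scheme for the ℓ₁ objective (generic; the variable names are illustrative, not the paper's notation):

```python
import numpy as np

def irls_l1(Y, u, iters=50, eps=1e-6):
    """Minimize ||Y @ theta - u||_1 via iteratively reweighted least squares."""
    theta = np.linalg.lstsq(Y, u, rcond=None)[0]  # L2 solution as initial guess
    for _ in range(iters):
        r = Y @ theta - u
        w = 1.0 / np.maximum(np.abs(r), eps)      # large residuals get small weights
        A = Y.T @ (w[:, None] * Y)                # weighted normal equations
        b = Y.T @ (w * u)
        theta = np.linalg.solve(A, b)
    return theta
```

Downweighting large residuals at each iteration is what makes the estimate robust against outliers compared to a plain least-squares fit.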

Propulsion system parameters
The propulsion parameters of ARDEA are stacked in a parameter vector, and the regression matrix Y₁ contains the rotor rates and motor currents. The motor torque coefficients of the upper and lower motors (K_{q,u} and K_{q,l}, respectively) differ because of the aerodynamic interaction between the propellers in the coaxial configuration. For this step, we fixed the hexacopter to an ATI Mini85 force-torque sensor, as depicted in Figures 12 and 13. The wrench measured by the sensor is denoted u₁. We logged the pose provided by a motion capture system at 250 Hz, IMU data, motor speeds and currents as measured by the speed controllers, the commanded control input, and the measured force and torque. The on-board attitude controller ran at 500 Hz. We calibrated the relative orientation of the force-torque sensor to the IMU beforehand. The resulting parameter estimates are listed in Table 1. Figure 14 shows a comparison of the identified model to the force-torque sensor measurements; the identified propulsion model closely matches them. Here, using the measured motor speeds to obtain the control wrench yields only a minor improvement over using the commanded speeds.
For the yaw torque, we consider the estimated rotor acceleration ϖ̇ to account for fast transients and thereby improve the accuracy of the estimate (cf. Figure 15a). Note that adding the measured motor current does not increase the accuracy over using only the rotor acceleration. However, in the case of an actuator failure (e.g., partially losing a propeller), the motor current will provide a better estimate of the yaw torque, as the method does not explicitly consider the propeller drag torque. Figure 15a depicts a comparison of the estimation using the different measurements, including the motor current.

Rigid body parameters
Considering a diagonal inertia tensor, the vector of rigid body parameters comprises the mass, the center of gravity position, and the principal moments of inertia. The input u₂ is obtained from the identified propulsion model and the known mass m by subtracting the mass-dependent part, u₂ = u_prop − y_m m, where u_prop is the wrench predicted by the identified propulsion model and y_m is the regression matrix column associated with the mass. Table 2 lists the identified parameters, and the propulsion model torque is compared to the torque predicted by the identified rigid body model in Figure 14. We find that the ℓ₁-identified parameters closely match the ℓ₂ least-squares parameters, which supports the correctness of the identified dynamics model. The identification of the aerodynamic model was done through wind tunnel experiments; this is beyond the scope of this paper but is described in more detail in Tomić et al. (2018).

Attitude controller
For attitude control, we employ a model-based proportional-derivative (PD) controller.

Position controller
The position controller is also a model-based PD controller as in Tomić et al. (2017). It feed-forwards the desired linear acceleration p̈_d of the reference trajectory, which is generated either by the naive polynomial interpolation (Section 5.2) or by the motion planner (Section 4.2.4).

FIGURE 12 FTS setup for identification of propulsion parameters (Tomić et al., 2017, 2018)

FIGURE 13 Parameter identification procedure according to Tomić et al. (2018). The procedure is done in three steps to minimize coupling effects in the high-dimensional parameter space

FIGURE 15 Validation of the dynamics model. The yaw torque can be estimated using the rotor acceleration ϖ̇ and the identified rotor inertia J_r, as well as the current i_a as measured by the ESC and the identified motor torque coefficient k_T. Rigid body parameters are identified using data from an indoor flight at low airspeeds but high accelerations to excite the dynamics parameters. Ideally, the rigid body forces and torques should match the identified propulsion inputs, which follow the near-hover model due to low airspeeds. The right plot compares the commanded propeller torques to the rigid body torques using the identified inertia and center of mass
Based on the identified system parameters described above, the controller gains are derived from the desired poles (in the left complex half-plane) of the closed-loop system. The chosen gains can be found in Table 3.
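As a generic illustration of this pole-placement step (a textbook sketch, not ARDEA's actual gain derivation): for a single double-integrator axis with PD feedback, the closed-loop error dynamics are

$$\ddot e + k_d\,\dot e + k_p\,e = 0,$$

so placing both poles at $s = -\lambda$ with $\lambda > 0$ yields $k_p = \lambda^2$ and $k_d = 2\lambda$; faster poles give stiffer tracking at the cost of amplified noise and actuator effort.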

4.1.2 | External wrench estimation
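As a sketch, one common momentum-based formulation for the translational part of an external wrench estimator, consistent with Equation (2) and following the spirit of Tomić et al. (2017), is

$$\hat{\boldsymbol f}_{\mathrm{ext}}(t) = \boldsymbol K_I \left( m\,\boldsymbol v(t) - \int_0^t \left( \boldsymbol R \boldsymbol f - m g\,\boldsymbol e_3 + \hat{\boldsymbol f}_{\mathrm{ext}} \right) \mathrm{d}\tau \right),$$

where K_I is a positive definite gain matrix; the estimate converges to the true external force with first-order dynamics, without requiring acceleration measurements.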

4.1.3 | Air density estimation
The thrust of a multirotor depends linearly on the air density ρ (see Equations (1) and (5)). A common assumption is that the air density is constant and known, and may therefore be lumped into the rotor thrust and torque coefficients (Mahony et al., 2012). However, the air density changes depending on weather and altitude, that is, on the atmospheric pressure, temperature, and humidity. This can lead to a difference in thrust of more than 10%, which has to be compensated by the flight controller. Simply integrating an additional sensor measurement provides no guarantee for convergence of the thrust estimate. We consider the air density ρ_m used for control allocation to be the real, a priori unknown air density ρ subject to a multiplicative uncertainty (1 + ε), such that ρ_m = (1 + ε)ρ. Inserting this into the translational dynamics, Equation (2), also yields the expression m̃ = (1 + ε)m, which may be interpreted as an effective mass. The estimator for ε is an adaptation law (Equation (14)) in which γ > 0 is a design parameter, λ > 0 is the controller gain, and z and z_d, with z̃ = z − z_d, are the measured and the desired height, respectively. If the estimator is activated, Equation (14) is integrated w.r.t. time and ε is used in the augmented position controller (see Equation (12)). Finally, the air density ρ and the effective mass m̃ follow directly from the estimate of ε.

4.2 | High-level autonomy software

In this section, we describe the high-level autonomy software running on-board the MAV. We describe the local state estimation of the vehicle, consisting of a multi-VO framework and a loosely coupled filter fusing the data of an IMU and the VO output. We then describe our framework for global 6D localization and dense 3D mapping, which builds on top of the local estimation. Finally, we present the motion planner of ARDEA, which uses the processed point clouds for planning feasible and obstacle-free trajectories.

4.2.1 | Visual odometry
The VO is a crucial part of the state estimation and therefore of the robustness of the system. Its task is to give an estimate of the camera motion based on the perceived images. Without pose estimates from the VO, the local navigation filter would only integrate IMU measurements and therefore diverge within seconds, making autonomous use of the MAV impossible. The presented VO estimates the relative transformation from one camera frame to another taken at different timestamps. The algorithm and setup are based on Hirschmüller, Innocent, and Garibaldi (2002) and Stelzer, Hirschmüller, and Görner (2012), to which the reader is referred for in-depth details. We assume that the scene is mainly static, which is a common assumption and also valid for most planetary exploration scenarios. Furthermore, we do not constrain the camera motion; it can thus be arbitrary. In contrast to other approaches that use motion priors or kinematic constraints derived for specific vehicles, we want our method to be as general as possible, so that it can be applied on any robotic platform. We therefore do not make any assumptions about the kinematics of the robot in the VO. Although three noncollinear 3D feature points are sufficient to calculate the translation and rotation of a camera's relative movement, it is advantageous to have more feature points to reduce the effect of noise, thereby increasing the estimation accuracy and improving the rejection of outliers. After feature extraction, we search for correspondences between the current and the previous image. Before a feature can be used for motion estimation, it has to pass two additional outlier rejection steps. Assuming the scene is static, one can expect that the relative distance d_sr between two points s and r remains constant over time (see the sketch below). The subsequent outlier rejection step is based on an upper limit for the remaining error. Equation (17) uses a spherical error model, which is a rough approximation but has the benefit that the transformation can be calculated in closed form. After R and t are estimated, Chauvenet's criterion (Taylor, 1982) is applied to remove further likely false correspondences. Finally, the values for translation and rotation are optimized using an ellipsoid error model as described by Matthies and Shafer (1987), starting from the initial guess provided by the spherical error model and using the reduced set of consistent correspondences.
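As an illustration of the static-scene consistency check, here is a sketch; the threshold and voting rule are assumptions, not the paper's exact test:

```python
import numpy as np

def rigidity_inliers(pts_prev, pts_curr, tol=0.05):
    """pts_prev, pts_curr: (N, 3) matched 3D feature points from two frames.

    In a static scene the distance between any two points must stay constant,
    so matches whose pairwise distances change are likely outliers.
    """
    n = len(pts_prev)
    votes = np.zeros(n, dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            d_prev = np.linalg.norm(pts_prev[i] - pts_prev[j])
            d_curr = np.linalg.norm(pts_curr[i] - pts_curr[j])
            if abs(d_prev - d_curr) < tol:
                votes[i] += 1
                votes[j] += 1
    # Keep features consistent with at least half of the other features.
    return votes >= (n - 1) // 2
```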

4.2.2 | Local navigation filter
As outlined in Section 4.2.1, for each remapped virtual pinhole stereo camera an independent VO estimate is calculated. They are combined with acceleration and angular rate readings from an IMU.
The estimated state includes, among other quantities, the position of the body frame (b-frame) relative to an earth-fixed, inertial frame (n-frame) and the corresponding velocity v.

FIGURE 17 Image processing pipeline: wide-angle images are captured; the images are remapped into eight pinhole images; all left and right images are grouped into one combined left and one combined right image; the combined left and right images are sent to the FPGA for stereo processing, resulting in a depth map; each VO instance receives a pinhole image and the corresponding depth map

FIGURE 18 Keyframe handling of multiple VOs. Red dots illustrate keyframes; arcs indicate which reference frame the estimated, relative camera poses are expressed in. Samples without an arc indicate frames where pose estimation was not possible. Note that each VO selects different keyframes

The VO pose estimates arrive with a delay and would otherwise degrade the stability of the system. Therefore, they have to be compensated.
Hardware triggers are used to define the exact timestamp at which an image was exposed. Additionally, each time an image is triggered, the current state x_k is augmented with a substate x_aug. Figure 19 shows an overview of the filter design and its connection to the VOs and the IMU.
The VO is the only sensor complementing the readings from the IMU. Therefore, outliers in the VO directly have a negative impact on the state estimate (see the sketch below for how delayed VO measurements can be associated with past states).
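The following is an organizational sketch of trigger-time state augmentation for delayed, relative VO measurements (stochastic cloning); the structure and names are assumptions, and the actual error-state EKF update is omitted:

```python
import numpy as np

class AugmentingFilter:
    def __init__(self, x0):
        self.x = np.asarray(x0, dtype=float)  # current direct state
        self.substates = {}                   # trigger timestamp -> stored substate

    def on_camera_trigger(self, stamp):
        # The BBB records the trigger timestamp; the filter stores a copy of
        # the pose substate so a later, delayed VO result can refer to it.
        self.substates[stamp] = self.x.copy()

    def on_vo_result(self, stamp_ref, stamp_cur, measured_rel_pos):
        x_ref = self.substates.get(stamp_ref)
        x_cur = self.substates.get(stamp_cur)
        if x_ref is None or x_cur is None:
            return None  # keyframe unknown; the VO sample is skipped
        predicted = x_cur[:3] - x_ref[:3]     # position part only, for brevity
        innovation = measured_rel_pos - predicted
        # ... an EKF error-state update using this innovation would follow ...
        return innovation
```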

4.2.3 | Global 6D localization and 3D mapping
For global localization and mapping, we employ our 6D SLAM framework introduced in Schuster et al. (2015, 2018). It enables efficient online and on-board single- and multirobot global localization and mapping by building upon the estimates from the local navigation filter and complementing them with additional intra- and inter-robot loop closure constraints.
Combining local and global estimation methods allows us to get the best of both worlds: fast local state estimates from the navigation filter, which are required for stabilization and control of our highly dynamic robot, as well as online global estimates, which are required for consistent mapping, path planning, exploration, and multirobot coordination.
To create dense 3D maps, we employ a submapping technique. It allows us to efficiently handle the high-bandwidth depth data generated via the FPGA-based stereo matching on the images from ARDEA's wide-angle camera system presented in Section 3.2. As a first step, we aggregate the dense stereo data along the trajectory estimated by the local navigation filter (Section 4.2.2). As its estimates are locally stable but globally subject to drift, we partition the aggregated data into partial maps of limited size and uncertainty, so-called submaps. A submap, anchored by the gravity-aligned pose of its origin, contains two different, application-dependent representations of its 3D data, visualized in Figure 20: colored point clouds at a resolution of 5 cm and a probabilistic voxel space at a resolution of 10 cm. Point clouds are fast to aggregate and constitute a suitable 3D model for visualization of the environment; in the future, they can also serve as input for semantic segmentation and geometry-based map matching methods. We employ the freely available OctoMap library for a memory-efficient representation of the 3D voxel space (Hornung et al., 2013). The probabilistic aggregation of data from multiple measurements is computationally more expensive, hence the lower resolution compared to the point cloud representation; however, it allows us to deal with sensor noise and changing parts of the environment. Furthermore, this representation explicitly distinguishes between unknown, occupied, and free space, information that is crucial for obstacle avoidance, path planning, and exploration planning algorithms. In the experiments presented in Section 6.1, new submaps were triggered whenever, within a submap, the standard deviation of the robot's position as estimated by the local navigation filter exceeded 0.1 m or the accumulated traveled distance exceeded 2.0 m (see the sketch below). These thresholds limit the errors within the individual submaps caused by filter drift and restrict their size, which limits their memory and processing time requirements in postprocessing steps and multirobot data exchange. Whenever a new submap is triggered, we switch the frame of reference of the local navigation filter (Section 4.2.2), which is implemented as a local reference filter (Schmid, Ruess, et al., 2014), into the gravity-aligned submap origin. This frame switching allows us to maintain long-term consistency and numerical stability within the filter, as well as a more accurate integration of the filter's estimates into the overlying SLAM graph (Schmid, Ruess, et al., 2014; Schuster et al., 2015), which we optimize using the GTSAM library (Dellaert, 2015).

FIGURE 19 Local navigation filter design. The direct system state x is calculated at a high rate by the SDA using acceleration and gyroscope measurements (a, ω) coming from the IMU. Relative and time-delayed pose measurements (δp, δq) are used in the EKF to calculate the state errors δx at a lower rate, which are immediately used for state correction
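The submap triggering rule can be summarized in a few lines; the thresholds follow the text, while using the largest eigenvalue of the position covariance as the standard deviation measure is an assumption:

```python
import numpy as np

def should_trigger_new_submap(pos_cov, traveled_dist,
                              max_std=0.1, max_dist=2.0):
    """pos_cov: 3x3 position covariance from the local navigation filter,
    accumulated since the current submap origin; traveled_dist in meters."""
    pos_std = float(np.sqrt(np.max(np.linalg.eigvalsh(pos_cov))))
    return pos_std > max_std or traveled_dist > max_dist
```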
Whenever new submaps are created, we add their origins as nodes to the graph and connect them via the filter estimates for the respective switch of reference frames. As illustrated in Figure 21, it is straightforward to include the nodes and edges from other robots to create a joint graph for multirobot estimation. In Section 6.1, we present a demonstration of our multirobot mapping system in a heterogeneous team consisting of our aerial robot ARDEA and the planetary exploration rover LRU. Combining local navigation filters, one per robot, with pose graph optimization leads to a small and sparse graph, allowing fast incremental online optimization steps. The SLAM graph is thereby independent of high-frequency measurements and filter-internal states. This is particularly important for systems like ARDEA with its four keyframe-based visual odometries (Section 4.2.1). As all of their estimates are fused in the local navigation filter, the SLAM graph increases in neither size nor complexity when further high-frequency measurements or estimates, such as, in this case, additional visual odometries, are added. Furthermore, in multirobot systems, high-frequency measurements and high-bandwidth depth data are processed locally on each robot in a distributed fashion and only transferred and combined in their aggregated and compacted forms.
We periodically compose global dense 3D maps from the submaps, as shown exemplarily in Figure 20 for the single-robot case and in Section 6.1 for multirobot experiments. This is done by arranging the submaps according to the latest graph SLAM estimates of their origins and merging their 3D representations.

4.2.4 | Motion planning

Problem formulation
We formulate trajectory generation as a nonlinear program minimizing a cost Γ, which is a function of free parameters p describing the path of the system T(p, t) and its time derivatives (a trajectory). This is an inverse-dynamics-based approach in which the states along a trajectory are independently parameterized and the actuations required to achieve them follow from the system equations of motion, Equation (2). The rotational part is expressed as an angle in radians, with rotations in SO(3). For solving the program, we use the sequential least squares quadratic programming (SLSQP; Kraft, 1988) implementation from the open-source NLopt library (Johnson, 2008). The algorithm is gradient-based, and the numerical gradients of Γ and c_ineq are obtained by perturbing the parameters to p_i + η and computing forward finite differences.
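A minimal sketch of such an NLP setup with NLopt's SLSQP follows; the cost and constraint bodies are illustrative stand-ins for the metrics described in this section:

```python
import nlopt
import numpy as np

n_params = 12                      # free spline parameters p (illustrative)

def cost(p, grad):
    val = float(np.sum(p**2))      # stand-in for the acceleration cost
    if grad.size > 0:
        grad[:] = 2.0 * p          # analytic gradient of the stand-in cost
    return val

def ineq_constraint(p, grad):      # c_ineq(p) <= 0, e.g., motor speed bounds
    if grad.size > 0:
        grad[:] = 0.0
        grad[0] = -1.0
    return float(1.0 - p[0])       # stand-in: require p[0] >= 1

opt = nlopt.opt(nlopt.LD_SLSQP, n_params)
opt.set_min_objective(cost)
opt.add_inequality_constraint(ineq_constraint, 1e-6)
opt.set_xtol_rel(1e-6)
p_opt = opt.optimize(np.ones(n_params))
```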

Cost metric
For the optimization objective Γ we use a weighted, squared sum of the linear and angular accelerations along the discrete trajectory, $\Gamma(\boldsymbol p) = \sum_t w_r\,\|\ddot{\boldsymbol r}(t)\|^2 + w_\psi\,\ddot{\psi}(t)^2$. Flights for planetary exploration would require maximal robustness and image acquisition quality, therefore precluding the need for aggressive flight. Sensor-based quantities such as cinematic effect (Nägeli et al., 2017) or mapping of a point of interest (Yoder & Scherer, 2016) are also of increasing interest. While such planning metrics could be useful for optimal data gathering, they are out of the scope of this study.

Inverse dynamics procedure
The actuation constraints c_act are realized as upper and lower bounds on the motor speeds required to achieve the flat outputs, as opposed to simple limits on the system acceleration. While this approach is more computationally intensive, it ensures that the planned trajectories are feasible w.r.t. the system dynamics as described in Section 4.1.1.

Collision constraints
The trajectories are constrained by the system dynamics and camera properties, and must additionally be collision-free. To ensure that a path traverses only free space, the planning software receives as input a global probabilistic occupancy grid represented as an Octomap (Hornung et al., 2013). Given a trajectory T, at each discrete via point t, the Octomap is queried for the occupancy probability at the position r(t).
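As a sketch of this constraint (the `occupancy_at` query is a hypothetical wrapper around the Octomap, and the threshold is illustrative):

```python
P_OCC_MAX = 0.2  # illustrative threshold on occupancy probability

def collision_constraint(trajectory_positions, occupancy_at):
    """trajectory_positions: iterable of 3D via points r(t);
    occupancy_at: callable returning an occupancy probability in [0, 1],
    with unknown space treated as occupied to stay conservative."""
    worst = max(occupancy_at(r) for r in trajectory_positions)
    return worst - P_OCC_MAX   # <= 0 means the whole path is collision-free
```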

Random search
While gradient-based, nonlinear optimization is a powerful tool for finding smooth, feasible, and locally optimal solutions given the constraints and cost metric, it is highly dependent on the initial guess and susceptible to local minima. Similar to the combinatorial method described in Stoneman and Lampariello (2016), we run an initial, RRT-like coarse search to seed the optimization with a feasible initial guess. The coarse search is a modified RRT algorithm which samples position and velocity of the flat outputs from a uniform distribution. The edges are composed using the point-to-point spline method as described above, where the accelerations of each DOF are minimized using a bounded, Eigen-based quadratic program (Guennebaud, 2017; Guennebaud & Jacob, 2010).

Trajectories and software integration
A schematic of the planner software inputs and outputs is shown in Figure 22. The motion planner runs continuously, and planning actions can be triggered via acyclic service requests. A planning action receives as input one or more desired waypoint states and generates trajectories from the robot's estimated current state to the waypoint(s). When a feasible solution is found, the resulting spline vertices and motion duration are sent to the controller, which then queries the reference at each tick.
FIGURE 22 Schematic of the planning software integration. The planner process runs continuously in an idle state, generating only when a new waypoint is received via an acyclic trigger

⁶Jerk and snap are the third and fourth time derivatives of position, respectively.

An example trajectory from one stationary waypoint to another is shown in Figure 23. The result of the coarse search, which is locally optimal and satisfies the motion constraints at each edge, is provided to the optimization algorithm as an initial guess. These typically jerky, piecewise trajectories are then smoothed according to the cost metric.

5 | SKILLS
Skills are a well-known concept in robotics. Many different definitions of the term "skill" exist in the literature. Some focus on the intellectual capability of problem solving (Sussman, 1973), some on the physical abilities of the motor system (Peters, Kober, Mülling, Nguyen-Tuong, & Kroemer, 2012), and some on all functionalities that may be implemented in a state machine (Steinmetz & Weitschat, 2016).
We define skills as modular sets of perceptual, computational, and dynamical capabilities that the MAV possesses. Skills can be computational operations or actions that the vehicle executes. They may have a number of static parameters or dynamic variables as input data (e.g., a measurement or state estimate). The outcome of a skill can be the result of a computation, an action, or both. We follow the definition of Ogasawara, Kitagaki, Suehiro, Hasegawa, and Takase (1993), where skills are defined as primitives which execute a combination of functions. We adapt this scheme by additionally allowing a skill to have only numeric outputs instead of actions. A minimal interface sketch is given below.
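The sketch illustrates the notion of a skill as a parameterized primitive with numeric outputs; the interface and names are assumptions, not ARDEA's actual implementation:

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class Skill:
    name: str
    static_params: Dict[str, Any] = field(default_factory=dict)

    def execute(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """Run the skill on dynamic inputs (e.g., a state estimate) and
        return numeric outputs, trigger an action, or both."""
        raise NotImplementedError

class AirDensityEstimationSkill(Skill):
    """Purely computational skill: returns a numeric result only."""
    def execute(self, inputs):
        epsilon = inputs["epsilon_estimate"]                  # from the adaptation law
        rho_alloc = self.static_params.get("rho_allocation", 1.225)
        return {"air_density": rho_alloc / (1.0 + epsilon)}   # rho = rho_m / (1 + eps)

skill = AirDensityEstimationSkill("air_density_estimation")
print(skill.execute({"epsilon_estimate": 0.05}))
```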
The introduction of skills enables easy access to basic as well as more complex capabilities of the MAV. The set of available skills allows an operator to define in a structured way how a task should be solved. For clarity, we divide a mission into tasks, which may use a defined set of skills; see Figure 24. This increases modularity and allows reactive changes of the task sequence as well as the easy definition of new missions, which can inherit functionalities (in the form of tasks and skills) from existing missions. Another advantage of the skill concept is that its level of abstraction can be represented well by a state-of-the-art state machine framework, such as RAFCON (Brunner, Steinmetz, Belder, & Dömel, 2015).
In the following, we describe the most important skills of ARDEA. We first treat basic skills inherent to any modern MAV, then introduce more advanced skills which lead to the higher level of autonomy required for a future planetary mission.

| Landing skill
After a successful landing, it is important to notify the state estimation module. Skipping this step could result in unpredictable behavior of the MAV, such as a takeoff in a random direction. In addition, noisy measurements might accumulate in the map due to a drifting pose estimate. Similar to the fly-to-waypoint skill, the landing skill takes the desired landing waypoint and velocity as input parameters.

| Air density and payload estimation skill
An estimate of the air density is especially important for planetary missions. It can be used not only for increased flight performance, but also for meteorological and scientific measurements. The air density estimation skill implements the method presented in detail in Section 4.1.3. It can be activated via a service call, for example during takeoff to estimate the air density or when a payload is collected or dropped, and deactivated afterward to avoid adaptation to external disturbances.
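The sketch below illustrates the underlying idea only, under the simplifying momentum-theory assumption T_i = c_T ρ ω_i²; the actual estimator of Section 4.1.3 may differ. At steady hover, the thrust balance m g = c_T ρ Σ ω_i² can be solved either for the air density (known mass) or for the mass, and hence the payload (known density).

```python
# Hedged sketch of the hover thrust balance, not the method of Section 4.1.3.
# c_t is assumed known from propeller identification.
G = 9.81  # m/s^2

def estimate_air_density(mass_kg, rotor_speeds_rad_s, c_t):
    """Solve m*g = c_T * rho * sum(omega_i^2) for rho at steady hover."""
    return mass_kg * G / (c_t * sum(w * w for w in rotor_speeds_rad_s))

def estimate_mass(rho, rotor_speeds_rad_s, c_t):
    """With rho known, the same balance yields the current vehicle mass,
    and thus the collected or dropped payload by differencing."""
    return c_t * rho * sum(w * w for w in rotor_speeds_rad_s) / G
```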

| Wind estimation
As mentioned in Section 4.1.2, external disturbances such as wind can be modeled in the control software and estimated. This helps to maintain stable flight in missions where strong disturbances are expected, or enables MAVs to act as flying wind sensors to characterize complex airflow scenarios, for example next to wind parks.
Sensors which measure wind velocities, such as pitot tubes or anemometers, have the drawback that the complex airflow around the MAV caused by the propulsion system directly influences the sensor readings and can thus render them useless (Tomić et al., 2016). Isolating a wind sensor from the propulsion airflow makes the MAV design and sensor placement a challenging task. We instead estimate the wind using the aerodynamic properties of the propellers and the motor currents measured by the ESCs. This mitigates the airflow issue and also simplifies the system design by removing the need for an additional sensor.
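A heavily simplified sketch of this principle follows: thrust is predicted per rotor from the measured motor current through an identified steady-state motor/propeller model, and the external (wind) force shows up as the residual between the measured specific force and the thrust prediction. Constants and names are illustrative assumptions, not the identified values from Section 4.1.2.

```python
# Hedged sketch: wind force as the residual between the measured specific
# force and a current-based thrust prediction. Constants are illustrative.
import numpy as np

def thrust_from_current(current, k_q, c_t, c_q):
    """Steady state: motor torque k_q*i balances aerodynamic drag torque
    c_q*omega^2, and thrust is c_t*omega^2, so T = (c_t / c_q) * k_q * i."""
    return (c_t / c_q) * k_q * current

def external_force_world(mass, R_wb, accel_imu_body, currents, k_q, c_t, c_q):
    """The IMU specific force contains thrust and external forces but not
    gravity: m * (R_wb @ a_imu) = F_thrust_world + F_ext_world."""
    thrust = sum(thrust_from_current(i, k_q, c_t, c_q) for i in currents)
    f_thrust_world = R_wb @ np.array([0.0, 0.0, thrust])  # thrust along body z
    # A wind velocity estimate would then follow from an identified drag model.
    return mass * (R_wb @ np.asarray(accel_imu_body)) - f_thrust_world
```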

| Modular VO setup skill
Each VO can be activated and deactivated on the fly and independently of the state estimation module. As such, the skill only takes a VO ID and the respective on/off status flag as parameters. If this skill stops a VO, the local navigation filter no longer receives any information from this particular VO instance. Being able to switch sensors on and off in the state estimation is one advantage of a loosely coupled filter approach. If the filter receives no VO measurements at all, it will propagate its current state using the IMU data. In general, this should be avoided, at least for longer periods of time, since the filter will diverge. However, being able to switch off VO(s) can be of great benefit in specific mission environments.

| Waypoint planning skill
The waypoint planner maintains a graph of waypoints in ARDEA's global map frame and can traverse this graph to find optimal sequences from one waypoint to another. The waypoint planning skill provides the input to either the trajectory planner (Section 4.2.4) or the polynomial interpolator (Section 5.2), as depicted in Figure 9. It can perform three different actions when triggered: construct a graph from a predefined list of waypoints or from waypoints defined dynamically, for example by an exploration task; plan a sequence through the graph from a start to a goal waypoint; or send the next waypoint in a planned sequence. The graph allows each node to have multiple child waypoints, but only one parent waypoint. Optimal sequences of waypoints are calculated using the well-known Dijkstra algorithm (Dijkstra, 1959).
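A minimal sketch of this graph search follows, with hypothetical data structures: each node stores its child waypoints, edge costs are Euclidean distances in the global map frame, and Dijkstra's algorithm extracts the optimal sequence.

```python
# Illustrative waypoint graph search (hypothetical structures):
# children: id -> list of child ids; positions: id -> (x, y, z).
import heapq
import math

def dijkstra(children, positions, start, goal):
    dist = {start: 0.0}
    parent = {start: None}
    queue = [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == goal:
            seq = []  # reconstruct the optimal waypoint sequence
            while node is not None:
                seq.append(node)
                node = parent[node]
            return list(reversed(seq))
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        for child in children.get(node, []):
            step = math.dist(positions[node], positions[child])
            if d + step < dist.get(child, math.inf):
                dist[child] = d + step
                parent[child] = node
                heapq.heappush(queue, (d + step, child))
    return None  # goal not reachable in the graph
```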

| Depth estimation skill
The depth estimation skill can also be used to estimate the current height of the vehicle. To do this, the reference frame of the polygon should be the IMU frame, so that it moves with the vehicle. The polygon can then be defined as an area below the MAV. Since the polygon is defined in 3D space, the height of the vehicle should be roughly known beforehand; a first estimate can be obtained from previous height estimations or by averaging all depth values in the depth image of the downward-looking camera. The measured depth d_pc is based on a pinhole camera model, so the measured values are distances projected onto the principal axis of the camera. To measure the distance between the camera and a 3D point, the angle ϕ between the principal axis and the ray to the 3D point has to be taken into account: with pixel coordinates (u, v) and intrinsics f, c_u, c_v, the angle follows from tan ϕ = sqrt((u − c_u)² + (v − c_v)²)/f, and the distance to the point is d = d_pc/cos ϕ. Points which do not have a valid depth estimate are discarded, either because no depth was measured or because the point was too close or too far away and was thus rejected for accuracy reasons. The maximum and minimum distance values are parameters of the skill.
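The following sketch summarizes this computation (illustrative parameter names; the intrinsics f, c_u, c_v are assumed known from calibration): invalid or out-of-range depths are discarded and the remaining distances are averaged to obtain a height estimate.

```python
# Sketch of the depth correction: a pinhole depth d_pc is the distance
# projected onto the principal axis, so the true camera-to-point distance
# is d = d_pc / cos(phi).
import math

def point_distance(d_pc, u, v, f, cu, cv, d_min=0.3, d_max=20.0):
    """Metric distance to the 3D point, or None if the depth is invalid or
    outside the accuracy bounds (skill parameters)."""
    if d_pc is None or not (d_min <= d_pc <= d_max):
        return None  # no depth measured, or too close/far for good accuracy
    r = math.hypot(u - cu, v - cv)   # pixel offset from the principal point
    phi = math.atan2(r, f)           # angle between principal axis and ray
    return d_pc / math.cos(phi)

def height_estimate(depth_image_points, f, cu, cv):
    """Average the valid distances inside the polygon below the MAV."""
    dists = [point_distance(d, u, v, f, cu, cv)
             for (u, v, d) in depth_image_points]
    dists = [d for d in dists if d is not None]
    return sum(dists) / len(dists) if dists else None
```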

| Exploration turn skill
ARDEA can perform a simple exploration movement to map its environment. Because of the large vertical FOV, rotating around the yaw axis makes mapping of the nearby surroundings very efficient: rotating roughly 280∘ is enough to map the MAV's entire surroundings. In contrast to the fly-to-waypoint skill from Section 5.2, this skill is performed on the spot but allows rotations greater than 180∘; otherwise it shares the same functionality and parameters. A simple waypoint flight cannot provide such a movement, since it always calculates the shortest path between two waypoints, which results in the smallest relative rotation angle, never larger than 180∘. To monitor whether the desired rotation has been executed, the state machine cannot simply check the current orientation of the MAV, but has to integrate the performed rotation over time, as sketched below.
The parameters of the skill are the amount of rotation and the desired rotational velocity. This skill is usually executed with a low angular velocity to obtain a high-quality map. With this skill, the task of mapping an unknown environment can be solved elegantly and time-efficiently.
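The rotation monitoring can be sketched as follows: yaw increments are wrapped to (−π, π] and accumulated each tick, so that total turns beyond 180∘ remain observable. This is our illustration of the idea, not the state machine's actual implementation.

```python
# Sketch: accumulate wrapped yaw increments so rotations > 180 degrees
# remain observable, which a plain orientation check cannot provide.
import math

def wrap_angle(a):
    """Map an angle to (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))

class TurnMonitor:
    def __init__(self, target_rad):
        self.target = target_rad
        self.accumulated = 0.0
        self.last_yaw = None

    def update(self, yaw):
        """Feed the current yaw estimate each tick; True when the turn is done."""
        if self.last_yaw is not None:
            # Per-tick increments stay well below pi at low angular velocity,
            # so wrapping recovers the true step across the +/-pi boundary.
            self.accumulated += wrap_angle(yaw - self.last_yaw)
        self.last_yaw = yaw
        return abs(self.accumulated) >= abs(self.target)
```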

| FIELD EXPERIMENTS
In this section, we demonstrate the capabilities of our MAV ARDEA in field experiments. We show the robustness of the system and apply the presented skill approach to real missions. The modularity of this approach proves useful for defining, monitoring, and performing complex missions.
The first experiment is a live demonstration of our flying system at the International Astronautical Congress (IAC). Over five consecutive days, ARDEA performed more than 40 autonomous flights in front of a public audience and experts in the field. In the second experiment, we demonstrate the flight and navigation capabilities of our system on Mount Etna in Italy, which provides a textural and geological environment similar to the Moon.

| IAC mission setup
We performed a public demonstration of our system at the IAC 2018.
The IAC is the world's largest annual gathering of space professionals, with more than 6,500 participants.

Figure 26: Impressions of our cooperative multirobot mapping demonstration at the IAC 2018 with a heterogeneous team of rover and hexacopter. ARDEA observing LRU from its first waypoint after takeoff from the rover's transport platform (top left); ARDEA on its return flight while LRU is navigating to the location explored by ARDEA, with one laser terminal visible on the right side (top right); ARDEA landed next to its start position while LRU reached its target destination.

Figure 29 illustrates the 3D voxel map and pointcloud jointly created by both systems as a time series, and Figure 30 shows the final jointly created RGB pointcloud. The upper image of Figure 31 shows the black-and-white converted image of one of the left pinhole cameras. Depending on the flight trajectory, this camera sometimes also captures the surroundings at higher elevations. Although this far-distance camera view is not useful for ego-motion estimation in the VO, it can still be used for other mission-relevant tasks. The VO of pinhole camera 1 covers a good range of far-distance and close-up features, which makes it robust against fast movements and still accurate during slow maneuvers. Therefore, VO-1 is used as the main VO, but the VO setup can be altered depending on the situation.

| ROBEX mission
The lower image of Figure 31 illustrates the corresponding depth image. Note that the depth is dense in the fore- and middle ground, while the depth for the box, the far-away hills, and the sky is invalid: the box is too close to the camera and the hills are too far away to obtain a sufficiently good depth estimate. The sky does not provide any useful gradients and hence no good features; even if clouds were present, they would be too far away for a valid depth estimate.
| LESSONS LEARNED
The dampers mounting the navigation stack function as a mechanical low-pass filter for vibration damping, see the design concept in Section 3.1.3. In the past, we had seen unstable control responses due to positive feedback loops caused by vibrations when we were not yet using vibration damping. We also learned that it is valuable that the navigation stack can easily be attached and detached without decalibrating the stereo camera configuration or changing the camera-to-IMU transformations.
Finally, the navigation stack can be attached to other mobile robots or used as a hand-held device, which makes it a reusable, self-contained unit for camera-based navigation and mapping, requiring only a power supply as input and providing a CAN-bus interface to control actuators.
Using a wide-angle FOV stereo camera system has proven to increase the level of robustness in two ways. First, it improves the VO ego-motion estimation (Section 4.2.1), which can use structures on the ground as well as on the ceiling in indoor scenarios.
Second, it detects obstacles in a wide range, which is useful for selecting safe landing sites and gives the trajectory planner a larger planning space in which obstacles are observable. Our previous system (Schmid, Lutz, et al., 2014) did not have this wide FOV and once crashed into a structure hanging down from the ceiling, as the structure was not observable by the narrow-FOV stereo cameras and hence not considered an obstacle by the path planner. A wide FOV also makes mapping more efficient: with our system, a turn of less than 300∘ is enough to obtain a 3D map of the entire surrounding environment. However, using four cameras with a large FOV comes at the price of a difficult camera calibration procedure. For the calibration of intrinsic and extrinsic parameters, we use a classical checkerboard. While taking calibration images, it is important that the pattern frequently appears in both the lower and upper cameras. To obtain a good extrinsic calibration, images of the pattern should also cover the complete camera field of view, which is 240∘ vertically and 80∘ horizontally. This is not always easy to accomplish and requires a big checkerboard.

| Autonomy software architecture
Since MAVs are inherently unstable, it is essential that the controller software runs robustly. One way to achieve this is to outsource this task to a dedicated microcontroller instead of a computer with an operating system that runs several tasks in parallel. This, however, comes with the disadvantages of high debugging complexity and heterogeneous software development tools and APIs. On the proposed flying platform, we followed the established approach of separating the autonomy software into high-level and low-level real-time (RT) tasks and distributing them over different computers, so that they can influence each other only through predefined middleware interfaces. This separation improves the robustness of the system because high-level tasks such as navigation and mapping, which require high data throughput and processing time, do not interfere with low-level, high-frequency tasks such as the attitude and position controllers. Moreover, the dynamic memory (re)allocations common in computer vision software might interfere with real-time scheduling and can even fail by running out of memory. In our approach, unlike other microcontroller-driven platforms such as PX4 8 , we use an RT Linux OS for the deployment of the attitude and position controllers and demonstrated stable operation across both presented field experiments, see Section 6.
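The two measures named above can be illustrated with a small sketch for an RT Linux system (not ARDEA's flight code): locking the process memory prevents page faults from stalling the control loop, and a fixed FIFO real-time priority keeps it from being preempted by non-RT tasks. Both calls require appropriate privileges.

```python
# Illustrative RT Linux setup for a control process (not the flight code).
import ctypes
import os

MCL_CURRENT, MCL_FUTURE = 1, 2  # flag values on Linux

def enter_realtime(priority=80):
    # Pin all current and future pages into RAM so that later allocations
    # or paging cannot trigger a page fault inside the control loop.
    libc = ctypes.CDLL("libc.so.6", use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        raise OSError(ctypes.get_errno(), "mlockall failed")
    # FIFO scheduling: the controller preempts all non-RT tasks, giving
    # deterministic control-loop timing.
    os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
```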
Using visual-inertial state estimation with multiple odometries has proven helpful in the IAC mission, see Section 6.1. With this setup it was easy to tailor the navigation system to a specific mission. During the development phase, we could also quickly locate bugs and problems in software modules using this approach.

| Future work
To be able to use ARDEA for future planetary missions, open challenges regarding hardware and autonomy software still have to be solved. This section summarizes work in progress and visionary ideas for accomplishing such a mission.

| Space qualified hardware
A major step for raising the technology readiness level (TRL) towards a final space design is re-designing the system with space-qualified hardware components which can handle high radiation and temperature levels. Another problem is the rather low computational power of current state-of-the-art space-qualified computers compared to consumer-grade computers. Using only cameras as the main exteroceptive sensors has the advantage of straightforward space qualification, unlike current mechanically driven LIDARs; however, solid-state LIDARs without moving parts seem to be a promising alternative for the future. Although increasing the TRL is an important step towards a real space mission, we have mainly been focusing on design concepts and algorithms using terrestrial hardware, which allows for fast development iterations and proofs of research concepts.

| Skills based on semantic environment interpretation
Because a high level of autonomy can make planetary exploration missions safer and more efficient, we are planning to add skills based on the semantic interpretation of camera images to our system. These can be used for terrain classification or the detection of known and unknown structures; that is, they can help to autonomously identify scientifically interesting geological features such as soils or rocks without constant supervision by planetary scientists. Due to long round-trip times, or even offline phases in some missions, constant supervision is not always possible. Such algorithms are commonly referred to as novelty detection (Pimentel, Clifton, Clifton, & Tarassenko, 2014) within the statistics and machine learning community. Moreover, semantic image segmentation can also help to improve the accuracy of the VO and mapping algorithms in dynamic environments, for example by distinguishing between static and dynamic objects and disregarding the dynamic objects for ego-motion estimation.
Since the MAV is limited in computational power, mainly due to weight constraints, this opens up challenging research questions on how to obtain and process semantically labeled data. During a terrain classification scenario, images are acquired and a novelty detection algorithm detects and clusters geological features. In an offline phase, the classification results can be assessed by the mission researchers and custom labels can be assigned to interesting features. To robustly recognize the labeled features in new data, an incremental learning approach (Bruzzone & Prieto, 1999) can be used. Because the on-board computers run near their load limit during flight, training phases can only be carried out while the MAV is not flying. In missions without the option to outsource offline computations to other robots or the lander, dedicated sleep phases of flight inactivity might be a solution for realizing the learning steps. The inference step, however, has to be implemented efficiently enough to run online on the MAV during the mission.
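As a purely illustrative sketch of such a workflow, assuming precomputed feature vectors, a centroid-based detector can flag features far from all known classes as novel during flight and fold operator-labeled samples into its model incrementally between flights:

```python
# Hedged sketch of the workflow above (not the deployed algorithm).
import numpy as np

class NoveltyDetector:
    def __init__(self, threshold):
        self.threshold = threshold
        self.centroids = {}  # label -> (mean feature vector, sample count)

    def is_novel(self, feature):
        """Online inference: cheap enough to run on-board during flight."""
        if not self.centroids:
            return True
        dists = [np.linalg.norm(feature - mean)
                 for mean, _ in self.centroids.values()]
        return min(dists) > self.threshold

    def learn(self, label, features):
        """Offline/sleep-phase step: fold operator-labeled samples into the
        model incrementally, without retraining from scratch."""
        mean, n = self.centroids.get(
            label, (np.zeros_like(features[0], dtype=float), 0))
        for f in features:
            n += 1
            mean = mean + (f - mean) / n  # running mean update
        self.centroids[label] = (mean, n)
```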
Another important new functionality would be the skill to land on complex terrain such as boulders or on structured objects such as another robot. Semantic interpretation of the ground can help to assess how difficult it is to land on certain terrain and to choose the landing strategy accordingly.
For the mapping framework, we plan to match submaps created by ARDEA to gain further intra- and inter-robot loop closure constraints for relocalization based on the 3D geometry of the environment, indicated by the dashed line in Figure 21. We have already implemented and demonstrated such a method for teams of planetary exploration rover prototypes in Schuster et al. (2018). In future work, we could also reduce the computational effort by restricting map compositions to application-dependent requests and limiting them to regions of interest. An example would be a path planning algorithm requesting a global map estimate of the areas to be traversed next, as an active navigation approach.
To be able to operate ARDEA even more intuitively in the future, it would be interesting to deploy innovative human machine interfaces. Since our MAV provides a wide FOV, methods and hardware developed for the VR and AR community might be suitable.

| Robot teams
Most MAVs still have limited computational power and flight time, as well as a lack of manipulation capabilities, and can therefore only be used for scouting purposes. To overcome these limitations, a team of heterogeneous robots can be employed to solve complex missions (Wedler et al., 2018). One team member, such as an MAV, might be used for scouting, whereas another member is specifically designed for manipulation tasks. The MAV, for instance, can search for interesting objects and communicate their positions to another team member, which can then pick them up for further analysis. A computationally more powerful robotic team member can also help to tackle the limited computational resources of MAVs by taking over noncritical computations, which might also increase flight time. Examples are the scientific evaluation of captured image data or the computationally intensive learning phase in the terrain classification/novelty detection use case. In all of these cases the skill setup presented here is of great help, since it can be used across robotic team members. In the future we therefore want to focus further on developing a team of robots which can share skills and tasks.