Visual model‐predictive localization for computationally efficient autonomous racing of a 72‐g drone

Drone racing is becoming a popular e‐sport all over the world, and beating the best human drone race pilots has quickly become a new major challenge for artificial intelligence and robotics. In this paper, we propose a novel sensor fusion method called visual model‐predictive localization (VML). Within a small time window, VML approximates the error between the model-predicted position and the visual measurements as a linear function. Once the parameters of the function are estimated by the RANSAC algorithm, this error model can be used to compensate the prediction in the future. In this way, outliers can be handled efficiently and the vision delay can be compensated as well. Theoretical analysis and simulation results show a clear advantage over Kalman filtering when dealing with the occasional large outliers and vision delays that occur in fast drone racing. Flight tests are performed on a tiny racing quadrotor named "Trashcan," equipped with a Jevois smart camera, for a total weight of 72 g. An average speed of 2 m/s is achieved, with a maximum speed of 2.6 m/s. To the best of our knowledge, this flying platform is currently the smallest autonomous racing drone in the world, while still being one of the fastest autonomous racing drones.

be relatively larger for them. Moreover, a cheap, light-weight solution to drone racing would allow many people to use autonomous drones for training their racing skills. When the autonomous racing drone becomes small enough, people may even practice with such drones in their own home.
Autonomous drone racing is indebted to earlier work on agile flight.
Initially, quadrotors made agile maneuvers with the help of external motion capture systems (Mellinger & Kumar, 2011; Mellinger, Michael, & Kumar, 2012). The most impressive feats involved passing at high speeds through gaps and circles. More recently, various researchers have focused on bringing the necessary state estimation for these maneuvers onboard. Loianno, Brunner, McGrath, and Kumar (2017) plan an optimal trajectory through a narrow gap with difficult angles while using visual-inertial odometry (VIO) for navigation. Their drone achieves an average maximum speed of 4.5 m/s. However, the position of the gap is known accurately a priori, so no gap detection module is included in their research. Falanga, Mueggler, Faessler, and Scaramuzza (2017) study aggressive flight through a gap, detecting the gap with fully onboard resources. They fuse the pose estimation from the detected gap and onboard sensors to estimate the state. In their experiment, the platform with a forward-facing fish-eye camera can fly through the gap at 3 m/s. Sanket, Singh, Ganguly, Fermüller, and Aloimonos (2018) develop a solution for a drone to fly through arbitrarily shaped gaps without building an explicit three-dimensional model of a scene, using only a monocular camera.
Drone racing represents a larger, even more challenging problem than performing short agile flight maneuvers. The reasons for this are that (a) all sensing and computing has to happen on board; (b) passing one gate is not enough: drone races can contain complex trajectories through many gates, requiring good estimation and (optimal) control also over the longer term; and (c) depending on the race, gate positions can change, obstacles other than gates can be present, and the environment is much less controlled than an indoor motion tracking arena.
One category of strategies for autonomous drone racing is to have an accurate map of the track, where the gates have to be in the same place. One of the participants of the IROS 2017 autonomous drone race, the Robotics and Perception Group, reached gate 8 in 35 s. In their approach, waypoints were set using the pre-defined map and VIO was used for navigation. A depth sensor was used for aligning the track reference system with the odometry reference system. NASA's JPL reports that their drone can finish a race track in a similar amount of time as a professional pilot. In their research, a visual-inertial localization and mapping system is used for navigation, and an aggressive trajectory connecting waypoints is generated to finish the track (Morrell et al., 2018). Gao et al. (2019) propose a teach-and-repeat solution for drone racing. In the teaching phase, the surrounding environment is reconstructed and a flight corridor is found.
Then, the trajectory can be optimized within the corridor and be tracked during the repeating phase. In their research, VIO is employed for pose estimation and the speed can reach 3 m/s. However, this approach is sensitive to changing environments. When the position of the gate is changed, the drone has to learn the environment again.
The other category of strategies for the autonomous drone race employs coarser maps and is more oriented on gate detection, which makes it more robust to displacements of gates. The winner of the IROS 2016 autonomous drone race, the Unmanned Systems Research Group, uses a stereo camera for detecting the gates (Jung, Cho, Lee, Lee, & Shim, 2018). When a gate is detected, a waypoint is placed in the center of the gate and a velocity command is generated to align the drone with the gate. The winner of the IROS 2017 autonomous drone race, the INAOE team, uses metric monocular SLAM for navigation. In their approach, relative waypoints are set and the detection of the gates is used to correct the drift of the drone (Moon et al., 2019). S. Li, Ozo, De Wagter, and de Croon (2018) combine gate detection with onboard IMU readings and a simplified drag model for navigation. With their approach, a Parrot Bebop 1 (420 g) can use its native onboard camera and processor to fly through 15 gates at 1.5 m/s along a narrow track in a basement full of exhibits. Kaufmann, Loquercio, et al. (2018) use a trained convolutional neural network (CNN) to map the input images to a desired waypoint and a desired speed to approach it. With the generated waypoint, a trajectory through the gate can be determined and executed while VIO is used for navigation.

FIGURE 1 The IROS autonomous drone race track over the years 2016-2018 (a-c). The rules have always been the same. Flight is to be fully autonomous, so there can be no human intervention. The drone that passes through the most subsequent gates in the track wins the race. When the number of passed gates is the same, or the track is fully completed, the fastest drone wins the race. (a) IROS 2016 drone race track; (b) IROS 2017 drone race track; (c) IROS 2018 drone race track [Color figure can be viewed at wileyonlinelibrary.com]
The winner of the IROS 2018 autonomous drone race, the Robotics and Perception Group, finished the track at 2 m/s (Kaufmann, Gehrig, et al., 2018). During the flight, the relative position of the gates and a corresponding uncertainty measure are predicted by a CNN. With the estimated position of the gate, waypoints are generated, and a model-predictive controller (MPC) is used to steer the drone through the waypoints while VIO is used for navigation.
From the research mentioned above, it can be seen that many of the strategies for autonomous drone racing are based on generic, but computationally relatively expensive navigation methods such as VIO or SLAM. These methods require heavier and more expensive processors and sensors, which leads to heavier and more expensive drone platforms. Forgoing these methods could lead to a considerable gain in computational effort, but raises the challenge of still obtaining fast and robust flight.
In this paper, we present a solution to this challenge. In particular, we propose a visual model-predictive localization (VML) approach to autonomous drone racing. The approach does not use generic vision methods such as VIO and SLAM and is still robust to gate changes, while reaching speeds competitive with the currently fastest autonomous racing drones. The main idea is to rely as much as possible on a predictive model of the drone dynamics, while correcting the model and localizing the drone visually based on the detected gates and their supposed positions in the global map. To demonstrate the efficiency of our approach, we implement the proposed algorithms on a cheap, commercially available smart camera called "Jevois" and mount it on the "Trashcan" racing drone. The modified Trashcan weighs only 72 g and is able to fly the race track at high speed (up to 2.6 m/s). The vision-based navigation and high-level controller run on the Jevois camera while the low-level controller, provided by the open source Paparazzi autopilot (Gati, 2013; Hattenberger, Bronz, & Gorraz, 2014), runs on the Trashcan. To the best of our knowledge, the presented drone is the smallest and one of the fastest autonomous racing drones in the world. Figure 2 shows the weight and the speed of our drone in comparison to the drones of the winners of the IROS autonomous drone races.

| Problem formulation
In this study, we develop a hardware and software system with which a flying platform can fly through a drone race track fully autonomously at high speed, using only onboard resources. The racing track setup can be changed, and the system should adapt to such changes autonomously.
For visual navigation, instead of using SLAM or VIO, we directly use a computationally efficient vision algorithm for the detection of the racing gate to provide position information. However, implementing such a vision algorithm on low-grade vision and processing hardware results in low-frequency, noisy detections with occasional outliers. Thus, a filter should be employed to still provide high-frequency and accurate state estimation. In Section 3, we first briefly introduce the "Snake Gate Detection" method and a pose estimation method used to provide position measurements. Then, we propose and analyze the novel VML technique that estimates the drone's states within a time window. It fuses the low-frequency onboard gate detections and high-frequency onboard sensor readings to estimate the position and the velocity of the drone. The control strategy to steer the drone through the racing track is also discussed. The simulation results in Section 4 compare the proposed filter with the Kalman filter in different scenarios with outliers and delay. In Section 5, we present flight experiments in which the drone flies through a racing track with displaced gates, gates at different altitudes, and a gate that moves during the flight. In Section 6, the generalization and limitations of the proposed method are discussed. Section 7 concludes the article.

| System overview
To illustrate the efficiency of our approach, we use a small racing drone called Trashcan (Figure 3). This racing drone is designed for FPV racing with the Betaflight flight controller software. To fly the Trashcan autonomously, we replaced Betaflight with the Paparazzi open source autopilot, for its flexibility in adding custom code, its stable communication with the ground station for testing code, and its active maintenance by the research community. In this article, Paparazzi only provides the low-level controller. The main loop frequency is 2 kHz. We employ a basic complementary filter for attitude estimation, and the attitude control loop is a cascade controller consisting of a rate loop and an attitude loop, each using a P-controller. The details of Trashcan's hardware can be found in Table 1.

FIGURE 2 The weight and the speed of the approach proposed in this article and those of the winners of the IROS autonomous drone races. All weights are either directly from the articles or estimated from online specs of the used processors [Color figure can be viewed at wileyonlinelibrary.com]

For the high-level vision, flight planning and control tasks, we use a light-weight (17 g) smart camera called Jevois, which is equipped with a quad-core ARM Cortex A7 processor and a dual-core Mali-400 GPU. In our experiment, two threads run on the Jevois: one for vision detection and one for filtering and control (Figure 4a). In our case, the frequency of gate detection ranges from 10 to 30 Hz and the frequency of filtering and control is set to 512 Hz. The gate detection thread processes the images in sequence. When it detects a gate, it signals the other thread. The control and filtering thread keeps predicting the states and calculating control commands at high frequency.
It uses a novel filtering method, explained in Section 3, for estimating the state based on the IMU and the gate detections. In Figure 4b, the Gate detection and Pose estimation module first detects the gate and estimates the relative position between the drone and the gate. Next, the relative position is sent to the Gate assignment module, where it is transformed into a global position. With the global position measurements and the onboard AHRS reading, the proposed VML filter fuses them to obtain accurate position and velocity estimates. Then, the Flight plan and high-level controller calculate the desired attitude commands to steer the drone through the whole track. These attitude commands are sent to the drone via the MAVLink protocol. On the Trashcan drone, Paparazzi provides the low-level controller to stabilize the drone.

| ROBUST VML AND CONTROL
State estimation is an essential part of a drone's autonomous navigation. For outdoor flight, fusing a GPS signal with onboard inertial sensors is a common way to estimate the pose of the drone (Santana, Brandao, & Sarcinelli-Filho, 2015). However, for indoor flight, a GPS signal is no longer available. Thus, off-board cameras (Lupashin et al., 2014), Ultra Wide Band range beacons (Mueller, Hamer, & D'Andrea, 2015), or onboard vision are used instead, commonly fused with inertial measurements by means of a Kalman filter. However, the racing scenario has properties that make it challenging for a Kalman filter. Position measurements from gate detections are often subject to outliers, have non-Gaussian noise, and can arrive at a low frequency. This makes the typical Kalman filter approach unsuitable, because it is sensitive to outliers, is optimal only for Gaussian noise, and can converge slowly when few measurements arrive. In this section, we propose a VML technique that is robust to low-frequency measurements with significant numbers of outliers. Subsequently, we also present the control strategy for the autonomous drone race.

| Gate assignment
In this article, we use the "snake gate detection" and pose estimation technique as in S. Li et al. (2018). The basic idea of snake gate detection is to search for contiguous pixels with the target color to find the four corners of the gate. Subsequently, a perspective-n-point (PnP) problem is solved, using the position of the four corners in the image plane, the camera's intrinsic parameters, and the attitude estimate, to obtain the relative position between the drone and the ith gate. Figure 5 shows this procedure, which is explained in more detail in S. Li et al. (2018).

FIGURE 5 The Snake gate detection method and pose estimation method (S. Li et al., 2018). (a) Snake gate detection. From one point P0 on the gate, the Snake gate detection method first searches up and down, then left and right, to find all four corners of the gate. (b) When the four corners of the gate are found, the relative position between the drone and the gate is calculated from the corners' positions, the camera's intrinsic parameters, and the current attitude estimate [Color figure can be viewed at wileyonlinelibrary.com]

In most cases, when the light is even and the camera's auto exposure works properly, the gate in the image is continuous and the Snake gate detection method works reliably. Despite checks to reject false positive detections, there is still a small chance that a false positive occurs. The negative effect is that outliers may appear, which poses a challenge for the filter and the controller.
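The corner-based pose recovery can be sketched with a planar homography in place of the paper's full PnP-with-attitude solution. Everything below is illustrative: the intrinsics (fx, fy, cx, cy), the gate half-size, and the function name are assumptions, not values from the paper.

```python
import numpy as np

def gate_relative_position(img_pts, gate_half=0.7,
                           fx=300.0, fy=300.0, cx=160.0, cy=120.0):
    """Estimate the camera-to-gate translation for a square gate of
    half-size gate_half (m) from its four corner pixels, via a planar
    homography (DLT) -- a simplified stand-in for full PnP."""
    K = np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]])
    # Gate corners in the gate plane (Z = 0); order must match img_pts.
    obj = np.array([[-gate_half,  gate_half],
                    [ gate_half,  gate_half],
                    [ gate_half, -gate_half],
                    [-gate_half, -gate_half]])
    A = []
    for (X, Y), (u, v) in zip(obj, img_pts):
        A.append([X, Y, 1, 0, 0, 0, -u * X, -u * Y, -u])
        A.append([0, 0, 0, X, Y, 1, -v * X, -v * Y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)            # homography, up to scale
    B = np.linalg.inv(K) @ H            # ~ [r1 r2 t], up to scale
    lam = 1.0 / np.linalg.norm(B[:, 0])  # recover the scale from r1
    t = lam * B[:, 2]
    return t if t[2] > 0 else -t        # the gate must be in front
```

A library such as OpenCV (`cv2.solvePnP`) would normally be used for this step; the sketch only shows the underlying geometry.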
Since for any race a coarse map of the gates is given a priori (cf. Figure 6), we assume that the positions of the gates are fixed. Any error experienced in the observations is then attributed to estimation drift on the part of the drone; without generic VIO, it is difficult to distinguish between drone drift and gate displacements. If the displacements of the gates are moderate, this approach works: after passing a displaced gate, the drone will see the next gate and correct its position again. We only need a very rough map with the supposed global positions of the gates (Figure 6).
Gate displacements only become problematic if, after passing gate i, gate i+1 is not visible when following the path from the expected position of gate i to that of gate i+1.
At the IROS drone race, gates are identical, so for our position to be estimated well, we need to assign a detection to the right gate. For this, we rely on our current estimated global position x̂_k = [x̂_k, ŷ_k]. When a gate is detected, we go through all the gates on the map and use Equation (1) to calculate the predicted drone position x̄_k^i = [x̄_k^i, ȳ_k^i] that would follow from the detection if it stemmed from gate i. Then, we calculate the distance between this predicted position x̄_k^i and the estimated position x̂_k at time t_k. After going through all the gates, the gate whose predicted position is closest to the estimated drone position is considered to be the detected gate, and the measured position at time t_k is set to the corresponding x̄_k^i. The gate assignment technique (Figure 7) helps us obtain as much information on the drone's position as possible when a gate is detected: it can also use detections of gates other than the next gate, and allows multiple simultaneous gate detections to improve the estimation. Still, this procedure will always output a global coordinate for any detection. Hence, false positive or inaccurate detections can occur and have to be dealt with by the state estimation filter.
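The nearest-gate assignment step can be sketched as follows, assuming the relative observation has already been rotated into the global frame; the gate coordinates and function name are illustrative.

```python
import numpy as np

def assign_gate(rel_pos, est_pos, gate_map):
    """Turn a relative gate observation into a global position measurement.
    rel_pos:  drone position relative to the (unknown) observed gate,
              expressed in the global frame.
    est_pos:  current estimated global position of the drone.
    gate_map: list of global gate positions from the coarse map."""
    best, best_d = None, np.inf
    for g in gate_map:
        # drone position if the detection stemmed from this gate
        pred = np.asarray(g) + np.asarray(rel_pos)
        d = np.linalg.norm(pred - np.asarray(est_pos))
        if d < best_d:
            best, best_d = pred, d
    return best  # global position measurement fed to the filter
```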

| VML
The racing drone envisaged in this article has a forward-looking camera and an IMU. As explained in the previous section, the camera is used for localization in the environment, with the help of gate detections. Using a typical, cheap CMOS camera will result in relatively slow position updates from the gate detection, with occasional outliers. The IMU can provide high-frequency and quite accurate attitude estimates by means of an AHRS. The accelerations could also be used to predict the change in the translational velocities of the drone. In traditional inertial approaches, the accelerations would be integrated. However, for smaller drones the accelerometer readings become increasingly noisy, due to less possible damping of the autopilot. Integrating accelerometers is "acceleration stable," meaning that a bias in the accelerometers that is not accounted for can lead to unbounded velocity estimates. Another option is to use the accelerometers to measure the drag on the frame, which, assuming no wind, can be easily mapped to the drone's translational velocity (cf. S. Li et al., 2018). Such a setup is "velocity stable," meaning that an accelerometer offset or drag model error leads to a proportional velocity offset, which is bounded. On really small vehicles like the one we use in the experiments, the accelerometers are even too noisy for reliably measuring the drag. Hence, the proposed approach uses a prediction model that relies only on the attitude estimated by the AHRS, which is an indirect way of using the accelerometer. It uses the attitude and a constant-altitude assumption to predict the forward acceleration, and subsequently the velocity, of the drone. The model is corrected from time to time by means of the visual localization. So, although the IMU is used for estimating the attitude, it is not used as an inertial measurement for updating translational velocities. This leads to the name of the method, VML, which is explained in detail in this subsection.

FIGURE 6 The gates are displaced. The drone uses the gates' positions on the map to navigate. After passing through the first gate, it will use the second gate's position on the map for navigation. After seeing the second gate, the position of the drone will be corrected [Color figure can be viewed at wileyonlinelibrary.com]

| Prediction error model
As mentioned above, the attitude estimated by the AHRS is used in the prediction of the drone's velocity and position. However, due to the AHRS bias and model inaccuracy, the prediction diverges from the ground truth over time. Fortunately, the visual gate detections provide position information that does not accumulate error over time, albeit at a low frequency. Figure 8 illustrates the prediction and the vision measurements within a time window [t_{k−q}, t_k]. At the beginning of this time window, the differences between the ground truth and the prediction are Δx_{k−q} and Δv_{k−q}.

FIGURE 8 The prediction can be done at high frequency with Attitude and Heading Reference System (AHRS) estimates. The vision algorithm outputs low-frequency unbiased measurements. The prediction curve deviates more and more from the ground truth curve over time because of the AHRS bias and model inaccuracy [Color figure can be viewed at wileyonlinelibrary.com]

Assuming that there is no wind, and knowing the attitude, we can predict the acceleration along the x and y axes. Figure 9 shows the forces the drone experiences. T denotes the acceleration caused by the thrust of the drone; together with the pitch angle θ, it provides the forward acceleration. D denotes the acceleration caused by the drag, which is simplified as a linear function of the body velocity (Faessler, Franchi, & Scaramuzza, 2017): D = c v^B, where c is the drag coefficient.
According to Newton's second law in the x–o–z plane,

ẍ(t) = −T sin θ(t) − c ẋ(t),
z̈(t) = g − T cos θ(t) − c ż(t),

where T is the thrust acceleration and c is the drag coefficient. If the altitude is kept constant, as in the IROS drone race, then z̈(t) = ż(t) = 0 and hence T = g/cos θ(t), so that

ẍ(t) = −g tan θ(t) − c ẋ(t).

Since the model in the y axis has the same form as in the x axis, the dynamic model of the quadrotor can be simplified to

ẍ(t) = −g tan θ(t) − c ẋ(t),
ÿ(t) = g tan ϕ(t) − c ẏ(t),  (8)

where x(t) and y(t) are the position of the drone, θ is the pitch angle, and ϕ is the roll angle. In Equation (8), the movement in the x and y axes is decoupled.
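The decoupled x-axis model can be integrated step by step; the following is a minimal Euler sketch with an assumed drag coefficient, not the paper's implementation.

```python
import math

G = 9.81  # gravity, m/s^2
C = 0.5   # drag coefficient, 1/s (illustrative value, not from the paper)

def step(x, vx, theta, dt):
    """One Euler step of the simplified x-axis model
    x'' = -g*tan(theta) - c*x'. In the NED frame, theta < 0 (pitch
    down) yields a positive forward acceleration."""
    ax = -G * math.tan(theta) - C * vx
    return x + vx * dt, vx + ax * dt
```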
Thus, we only analyze the movement in the x axis; the result can be directly generalized to the y axis. The nominal model of the drone in the x axis can be written as

ẋⁿ(t) = vⁿ(t),
v̇ⁿ(t) = −g tan θ(t) − c vⁿ(t),

where the superscript n denotes the nominal model. Similarly, under the assumption that the drag factor is accurate, the prediction model can be written as

ẋᵖ(t) = vᵖ(t),
v̇ᵖ(t) = −g tan(θ(t) + θ_b) − c vᵖ(t),

where the superscript p denotes the prediction model and θ_b is the AHRS pitch bias, assumed to be constant over a short time. Consider a time window [t_{k−q}, t_k], discretized with sampling time T_s, so that qT_s = t_k − t_{k−q} is the time span of the window. Propagating both models over this window and subtracting them, the prediction error at each step accumulates the initial position error Δx_{k−q}, the initial velocity error Δv_{k−q}, and an input bias term caused by θ_b, which can be considered constant over a short time. Since the sampling time T_s is small (T_s = 0.002 s in our case), terms of order T_s² can be neglected. The prediction error at time t_k then simplifies to the linear model

Δx_k ≈ Δx_{k−q} + Δv_{k−q}(t_k − t_{k−q}).  (19)

Thus, within a time window, the state estimation problem can be transformed into a linear regression problem with model (19), in which β = [Δx_{k−q}, Δv_{k−q}]ᵀ are the parameters to be estimated.

FIGURE 9 Free body diagram of the drone. v(t) is the velocity of the drone. The superscript E denotes the north-east-down (NED) earth frame while B denotes the body frame. T is the acceleration caused by the thrust and D is the acceleration caused by the drag, which is a linear function of the body velocity. g is the gravity factor and c is the drag factor, which is positive. θ(t) is the pitch angle of the drone. It should be noted that since we use the NED frame, θ < 0 when the drone pitches down [Color figure can be viewed at wileyonlinelibrary.com]

In this simplified linear prediction error model, we use the constant-altitude assumption to approximate the thrust T_z^B on the drone, which may introduce model inaccuracy. During the flight, this assumption may be violated by aggressive maneuvers along the z axis.
However, if the maneuver in the z axis is not very aggressive and the time window is small (in our case less than 2 s), the prediction error model's inaccuracy can be kept in an acceptable range. In the simulation and the real-world experiment shown later, we will show that although the altitude of the drone changes by 1 m in 2 s, the proposed filter still has very high accuracy under this assumption. Another way to improve the model accuracy is to estimate the thrust by fusing the accelerometer readings and rotor speeds, which requires a model of the rotors. It should also be noted that we neglect the T_s² term in Equation (18) to obtain a linear model. To increase the model accuracy, the prediction error model could be made quadratic. In our case, since the time window is small, the linear model is accurate enough.

| Parameter estimation method
The classic way to solve the linear regression problem based on Equation (19) is to use the least square method (LS Method) on all data within the time window to estimate the parameters β:

β̂ = (AᵀA)⁻¹AᵀY,  (20)

where A contains the regressors [1, t_j − t_{k−q}] of the vision measurements within the window and Y the corresponding prediction errors. The LS Method in Equation (20) gives an optimal unbiased estimate. However, if there are outliers in the time window [t_{k−q}, t_k], they are weighted equally in the estimation process and can significantly affect the estimation result.
Thus, to exclude the outliers, we employ random sample consensus (RANSAC; Fischler & Bolles, 1981) to increase the performance (Figure 10). In each iteration i, a random subset of the data in the time window is sampled and used to estimate parameters β_i with the LS Method. Then, β_i is used to calculate the total prediction error ε_i over all the data in the time window,

ε_i = Σ_j min(|ϵ_j|, σ_th),  (21)

where ϵ_j is the prediction error of data point j; if |ϵ_j| is larger than the threshold σ_th, the threshold itself is counted as the error. After all iterations, the parameters β_i with the least total prediction error are selected as the estimated parameters for this time window. We refer to this as the basic RANSAC fitting (BRF) method.

With the BRF method, the influence of the outliers is reduced, but there is no mechanism to handle over-fitting. For example, when the time window contains only a few detections, the estimated parameters can take on unrealistically large values. To counter this, we add a penalty term to the loss function,

J(β̂) = ‖Y − Aβ̂‖² + β̂ᵀPβ̂,

where P is the penalty factor/prior matrix. To minimize the loss function, we take the derivative of J(β̂) with respect to β̂ and set it to 0. This yields the estimated parameters

β̂ = (AᵀA + P)⁻¹AᵀY.  (26)

We call the use of Equation (26) within the RANSAC iterations the prior RANSAC fitting (PRF) method.

To conclude, in this part we proposed three methods for estimating the parameters β. The first is the LS Method, which weighs all the data in a time window equally. The second is BRF, which has a mechanism to exclude outliers. The third is PRF, which not only excludes outliers but also takes prior knowledge into account to avoid over-fitting. In the next section, we will compare these three methods in simulation to see which is the most suitable for our drone race scenario.
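A compact sketch of the PRF idea: RANSAC sampling with a capped residual cost on top of the regularized least-squares solve of Equation (26). The sample size, threshold, and prior values are illustrative assumptions; errors are taken as measurement minus prediction.

```python
import numpy as np

def fit_error_model(ts, errs, t0, n_iter=5, sigma_th=0.5,
                    P=np.diag([0.0, 0.3]), rng=None):
    """Estimate beta = [dx, dv] of the linear prediction-error model
    err(t) ~ dx + dv*(t - t0), where errs are vision measurements
    minus model predictions at times ts inside the window."""
    rng = rng or np.random.default_rng(0)
    A = np.column_stack([np.ones(len(ts)), np.asarray(ts) - t0])
    Y = np.asarray(errs)
    best_beta, best_cost = None, np.inf
    for _ in range(n_iter):
        idx = rng.choice(len(ts), size=max(2, len(ts) // 2), replace=False)
        Ai, Yi = A[idx], Y[idx]
        # regularized least squares: beta = (A'A + P)^-1 A'Y
        beta = np.linalg.solve(Ai.T @ Ai + P, Ai.T @ Yi)
        resid = np.abs(A @ beta - Y)
        cost = np.minimum(resid, sigma_th).sum()  # cap outlier influence
        if cost < best_cost:
            best_beta, best_cost = beta, cost
    return best_beta
```

Setting P to zero and sampling all data recovers the plain LS Method; keeping the capped cost but dropping P gives BRF.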

| Prediction compensation
After the error model (Equation 19) has been estimated in time window k, it can be used to compensate the prediction:

x̂(t) = xᵖ(t) + Δx̂_{k−q} + Δv̂_{k−q}(t − t_{k−q}),
v̂(t) = vᵖ(t) + Δv̂_{k−q}.

Also, at each prediction step, the length ΔT = t_k − t_{k−q} of the time window is checked, since the simplified model (19) is based on the assumption that the time span ΔT is small. If ΔT is larger than the maximum allowed window size ΔT_max, the filter deletes the oldest elements until ΔT < ΔT_max. The pseudo-code of the proposed VML with the LS Method can be found in Algorithms 3 and 4.
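The window trimming and prediction compensation can be sketched as follows; the buffer class, its names, and the 2-s bound are assumptions for illustration (error convention: measurement minus prediction, so corrections are added).

```python
from collections import deque

T_MAX = 2.0  # maximum time-window span, s ("less than 2 s" in the text)

class ErrorWindow:
    """Sliding buffer of (time, prediction error) pairs; old entries
    are dropped so the linear error model of Equation (19) stays valid."""
    def __init__(self):
        self.buf = deque()

    def push(self, t, err):
        self.buf.append((t, err))
        # delete the oldest elements until the window span is below T_MAX
        while self.buf and t - self.buf[0][0] > T_MAX:
            self.buf.popleft()

    def compensate(self, x_pred, v_pred, t, beta):
        """Correct the model prediction with a fitted error model
        beta = (dx, dv), assumed anchored at the current window start."""
        t0 = self.buf[0][0]
        dx, dv = beta
        return x_pred + dx + dv * (t - t0), v_pred + dv
```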

| Comparison with Kalman filter
When it comes to state estimation or filtering techniques, it is inevitable to mention the Kalman filter, the most commonly used state estimation method. The basic idea of the extended Kalman filter (EKF) is that at time t_{k−1}, it first predicts the states at time t_k together with their error covariance P_{k|k−1}, to obtain prior knowledge of the states at t_k. When an observation arrives, the Kalman filter uses an optimal gain K_k, computed from the prior error covariance P_{k|k−1} and the observation covariance R_k, to correct the prediction, which leads to the minimum posterior error covariance P_k.
According to Diderrich (1985), a Kalman filter is a least square estimation made into a recursive process by combining prior data with incoming measurement data. The most obvious difference between the Kalman filter and the proposed VML is that VML is not a recursive method: it does not estimate the states at t_k based only on the previous states x̂_{k−1}, but considers the predictions and observations over a whole time window.
In the VML approach, we use the least square method within a time window, which looks similar to classical least square estimation.

FIGURE 10 In the ith iteration, the data in the time window t ∈ [t_1, t_9] are randomly sampled and used to estimate the parameters β_i [Color figure can be viewed at wileyonlinelibrary.com]

However, there are two major differences between the two methods. The first is that in the proposed VML, the prediction information is fused into the estimate. Secondly, and most importantly, we estimate the prediction error model β instead of estimating all the states in the time window, as the least square method would. Thus, VML retains the advantages of handling outliers and delays through its time-window mechanism, while being computationally more efficient than least square estimation. In Section 4, we will introduce the Kalman filter's variants for handling outliers and delays and compare them with VML in estimation accuracy and computational load in detail.

| Flight plan and high-level control
With the state estimation method explained above, to fly a racing track, we employ a flight plan module which sets the waypoints that guide the drone through the track and a two-loop cascade P-controller to execute the reference trajectory ( Figure 11).
Usually, the waypoint is just behind the gate. When the distance between the drone and the waypoint is less than a threshold D_turn, the gate can no longer be detected by our method, and we set the heading of the drone toward the next waypoint. This way, the drone starts turning toward the next gate before arriving at the current waypoint. When the distance between the drone and the waypoint falls within another threshold D_switch_wp, the waypoint switches to the next one. With this strategy, the drone does not stop at a waypoint but already starts accelerating toward the next one, which saves time. The workflow of the flight plan module can be found in Algorithm 5.
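The waypoint logic above can be sketched as a single update function; the threshold values and names (D_TURN, D_SWITCH_WP) are illustrative, not the tuned values from the experiments.

```python
import math

D_TURN = 1.5       # start yawing to the next waypoint (assumed value, m)
D_SWITCH_WP = 0.5  # switch to the next waypoint (assumed value, m)

def flight_plan_step(pos, waypoints, wp_idx):
    """One iteration of the waypoint logic: returns the (possibly
    advanced) waypoint index and the heading setpoint."""
    wp = waypoints[wp_idx]
    d = math.hypot(wp[0] - pos[0], wp[1] - pos[1])
    if d < D_SWITCH_WP and wp_idx + 1 < len(waypoints):
        wp_idx += 1  # start accelerating toward the next gate early
    # once too close to detect the gate, aim the heading at the next waypoint
    target = waypoints[min(wp_idx + 1, len(waypoints) - 1)] if d < D_TURN else wp
    psi_ref = math.atan2(target[1] - pos[1], target[0] - pos[0])
    return wp_idx, psi_ref
```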
We employ a two-loop cascade P-controller (Equation 30) to steer the drone to the waypoints and follow the heading reference generated by the flight plan module. The outer loop converts the position error into a velocity setpoint, and the inner loop converts the velocity error, rotated into the heading frame by [cos ψ, sin ψ; −sin ψ, cos ψ], into pitch and roll setpoints. The altitude and attitude controllers are provided by the Paparazzi autopilot and are both two-loop cascade controllers.
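A hedged sketch of the horizontal part of such a two-loop cascade P-controller; the gains and the sign conventions (NED, pitch down for forward acceleration) are assumptions for illustration.

```python
import math

KP_POS = 1.0  # outer-loop gain, 1/s (assumed)
KP_VEL = 0.3  # inner-loop gain, rad per (m/s) (assumed)

def cascade_p(pos_ref, pos, vel, psi):
    """Position error -> velocity setpoint -> pitch/roll setpoints,
    with the velocity error rotated into the heading frame."""
    vx_ref = KP_POS * (pos_ref[0] - pos[0])
    vy_ref = KP_POS * (pos_ref[1] - pos[1])
    ex, ey = vx_ref - vel[0], vy_ref - vel[1]
    # rotate the earth-frame velocity error by the heading psi
    exb =  math.cos(psi) * ex + math.sin(psi) * ey
    eyb = -math.sin(psi) * ex + math.cos(psi) * ey
    theta_ref = -KP_VEL * exb  # NED: negative pitch accelerates forward
    phi_ref   =  KP_VEL * eyb
    return theta_ref, phi_ref
```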

| Simulation setup
To verify the performance of VML in the drone race scenario, we first test it in simulation, using an EKF as a benchmark to compare both filters and see which is more suitable at different operating points. We first introduce the drone's dynamics model used in the simulation.
where (x, y, z) is the position of the drone in the Earth frame and the remaining term is the acceleration caused by other aerodynamic effects. The last four equations are a simplified first-order model of the attitude dynamics, where k_r is a positive constant that adjusts the speed of the drone's yawing to the setpoint.

FIGURE 11 The flight plan module generates the waypoints for the drone to fly the track. When the distance d between the drone and the current waypoint satisfies d < D_turn, the drone starts to turn to the next waypoint while still approaching the current one. When d < D_switch_wp, the drone switches the current waypoint to the next one. The cascade P-controller is used for executing the reference trajectory from the flight plan module. The attitude and rate controllers are provided by the Paparazzi autopilot [Color figure can be viewed at wileyonlinelibrary.com]

In the real-world experiments and the simulation, the AHRS reading is modeled as the true attitude plus biases ϕ_b and θ_b on ϕ and θ. B_N and B_E, the north and east acceleration biases caused by the accelerometer bias, can be considered constant over a short time; from real-world experiments, the attitude biases are less than 3°. For the vision measurements, with detection frequency f_v, we randomly select n_v points between u and v to be vision points. For these points, we generate detection measurements by adding detection noise with σ = 0.1 m to the ground truth (Equation 34). Among these n_v vision points, we also randomly select a few points to be outlier points, which follow the same model as Equation (34) but with a much larger deviation. The simulation results are shown in Figure 13.
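The vision measurement generation with occasional outliers can be sketched as follows; only σ = 0.1 m comes from the text, while the outlier probability and magnitude are assumed values.

```python
import random

SIGMA_V = 0.1      # vision noise std, m (from the text)
P_OUTLIER = 0.1    # outlier probability (assumed value)
OUTLIER_MAG = 3.0  # outlier offset magnitude, m (assumed value)

def simulate_detection(x_true, rng):
    """One simulated gate-detection measurement: Gaussian noise around
    the ground truth, occasionally shifted by a large outlier offset."""
    z = x_true + rng.gauss(0.0, SIGMA_V)
    if rng.random() < P_OUTLIER:
        z += rng.choice([-1.0, 1.0]) * OUTLIER_MAG
    return z
```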
When there are no outliers, all three filters converge to the ground-truth value. However, the EKF has a longer startup period, and BRF overfits after turning, leading to improbably high velocity offsets (the peaks in Figure 13b). This is because, after the turn, the RANSAC buffer is empty; when the first few detections enter the buffer, RANSAC has a larger chance of estimating inaccurate parameters. In PRF, however, we add a prior matrix \(\mathbf{P} = \begin{bmatrix} 0 & 0 \\ 0 & 0.3 \end{bmatrix}\) to limit the value of Δv, and the number of peaks in the velocity estimation is significantly decreased. At the same time, the velocity estimation is closer to the ground-truth value.
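The idea of fitting a linear prediction-error model with RANSAC and then refining on the inliers can be sketched as follows (a minimal sketch; the iteration count, inlier threshold, and the way the prior weight pulls the slope toward a preferred value are illustrative assumptions, not the paper's exact formulation):

```python
import random

def ransac_fit_error(samples, n_iter=5, inlier_thresh=0.3,
                     prior_b=0.0, prior_w=0.0):
    """Fit e(t) ~ a + b*t to (t, e) error samples with a tiny RANSAC loop.

    prior_b / prior_w sketch the PRF idea: the slope estimate (the
    velocity correction) is pulled toward prior_b with weight prior_w.
    """
    best_inliers = []
    for _ in range(n_iter):
        # Hypothesize a line from two random samples
        (t1, e1), (t2, e2) = random.sample(samples, 2)
        if t1 == t2:
            continue
        b = (e2 - e1) / (t2 - t1)
        a = e1 - b * t1
        inliers = [(t, e) for t, e in samples if abs(a + b * t - e) < inlier_thresh]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        best_inliers = samples
    # Least-squares refit on the inliers, with a quadratic prior on slope b
    n = len(best_inliers)
    st = sum(t for t, _ in best_inliers)
    se = sum(e for _, e in best_inliers)
    stt = sum(t * t for t, _ in best_inliers) + prior_w
    ste = sum(t * e for t, e in best_inliers) + prior_w * prior_b
    b = (n * ste - st * se) / (n * stt - st * st)
    a = (se - b * st) / n
    return a, b
```

With `prior_w = 0` this reduces to a plain BRF-style refit; a positive `prior_w` damps the slope when the buffer holds only a few detections.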
To evaluate the estimation accuracy of each filter, we first introduce a variable, the average estimation error γ, as an index of a filter's performance:

\[
\gamma = \frac{1}{N}\sum_{i=1}^{N} \sqrt{(\hat{x}_i - x_i)^2 + (\hat{y}_i - y_i)^2},
\]

where N is the number of sample points on the whole trajectory, \(\hat{x}\) and \(\hat{y}\) are the states estimated by the filter, and x and y are the ground-truth positions generated by the simulation. γ captures how much the estimated states deviate from the ground-truth states; a smaller γ indicates a better filtering result.
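Assuming γ is the mean Euclidean position error over the trajectory (consistent with the symbols listed above), it could be computed as:

```python
import math

def average_estimation_error(est, truth):
    """Mean Euclidean distance between estimated and ground-truth
    (x, y) positions over all N sample points (a sketch of gamma,
    assuming a Euclidean error metric)."""
    n = len(est)
    return sum(math.dist(e, t) for e, t in zip(est, truth)) / n
```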
We use running time to evaluate the computational efficiency of each filter. It should be noted that since we need to store all the simulation data for visualization and MATLAB has no mechanism for passing pointers, data access can take considerable computation time. Thus, we only count the running time of the core parts of the filters, namely the prediction and the correction.
The results are shown in Figure 14. In the simulation, the time window in BRF and PRF is set to 1 s and five iterations are performed in the RANSAC procedure. For each frequency, the filters are run 10 times separately and their average γ and running time are calculated. It can be seen in Figure 14a that when the detection frequency is larger than 30 Hz, BRF and PRF perform close to the EKF. In terms of calculation time, the EKF is heavier than BRF and PRF when the frequency is lower than 40 Hz. This is because, during the prediction phase, the EKF not only predicts the states but also computes the Jacobian matrix and the prior error covariance \(\mathbf{P}_{k|k-1}\) at high frequency, while BRF and PRF only perform the state prediction. When a detection arrives, however, the EKF performs the correction with a few matrix operations, whereas BRF and PRF run the RANSAC procedure, which is much heavier. This explains why the EKF's computation load is only slightly affected by the detection frequency, while the computation load of BRF and PRF increases significantly with higher detection frequencies.

| Comparison between EKF, BRF, and PRF with outliers
When outliers appear, the regular EKF can be affected significantly.
Thus, outlier rejection strategies are often used within an EKF to increase its robustness. A commonly used method takes the Mahalanobis distance between an observation and its mean as an index to determine whether the observation is an outlier (Chang, 2014; Z. Li, Chang, Gao, Wang, & Hernandez, 2016). In this section, we therefore implement an EKF with such an outlier rejection mechanism (EKF-OR) for comparison.

FIGURE 13 When there are no outliers, the estimates of the EKF, BRF, and PRF all converge to the ground-truth value. In velocity estimation, however, the EKF has a longer startup period than VML, and BRF shows peaks caused by overfitting. To limit this overfitting, PRF adds a prior matrix.

FIGURE 14 The simulation results of the filters. When the detection frequency is below 20 Hz, the EKF performs better than Basic RANSAC Fitting (BRF) and Prior RANSAC Fitting (PRF); above 20 Hz, BRF and PRF start performing better than the EKF. In terms of computation time, the EKF is only slightly affected by the detection frequency, while the computation load of BRF and PRF increases significantly at higher detection frequencies.

FIGURE 15 In most cases, the EKF with outlier rejection (EKF-OR), Basic RANSAC Fitting (BRF), and Prior RANSAC Fitting (PRF) can all reject the outliers. After a long period of pure prediction, however, EKF-OR is very vulnerable to outliers while BRF and PRF still perform well. (a) When outliers appear, EKF-OR, BRF, and PRF reject them. (b) After a long period of pure prediction, EKF-OR has a large error covariance. Once it meets an outlier, it has a high chance of jumping to it; as a consequence, the subsequent true positive detections fall beyond the threshold χ_α and EKF-OR treats them as outliers [Color figure can be viewed at wileyonlinelibrary.com]

Two examples of the filters rejecting outliers are shown in Figure 15. Figure 15a shows the common case in which all three filters reject the outliers successfully. In some special cases, however, EKF-OR is vulnerable to the outliers. In Figure 15b, for instance, after a long period of pure prediction the error covariance \(\mathbf{P}_{k|k-1}\) becomes large. Once EKF-OR meets an outlier, it has a high chance of jumping to it. The subsequent true positive detections are then treated as outliers and EKF-OR starts diverging. Meanwhile, BRF and PRF remain robust to the outliers. The essential reason is that EKF-OR depends on its current state estimate (mean and error covariance) to identify outliers. When the current state estimate is not accurate enough, as after the long prediction period in our case, EKF-OR loses its ability to identify outliers; in other words, it tends to trust whatever measurement it meets. Worse still, after jumping to the outlier its error covariance becomes smaller, which in turn leads to the rejection of the following true positive detections. For BRF and PRF, in contrast, outliers are determined within a time window that includes history. After a long period of prediction, when BRF and PRF meet an outlier they judge it against the detections in the past; if there are no other detections in the time window, they wait until enough detections have arrived to make a decision.
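The Mahalanobis-distance gating used by EKF-OR can be sketched as a chi-square test on the innovation (a minimal sketch; the 2-DOF, 95% chi-square threshold is an illustrative choice for the χ_α gate):

```python
def mahalanobis_gate(innovation, S_inv, chi2_thresh=5.99):
    """Chi-square gating test on the innovation (sketch).

    innovation: measurement residual z - H x, as a 2-vector.
    S_inv: inverse innovation covariance, 2x2 nested list.
    chi2_thresh: 95% chi-square threshold for 2 DOF (illustrative).
    Returns True if the measurement is accepted as an inlier.
    """
    dx, dy = innovation
    # Squared Mahalanobis distance d^2 = r^T S^-1 r
    d2 = (dx * (S_inv[0][0] * dx + S_inv[0][1] * dy)
          + dy * (S_inv[1][0] * dx + S_inv[1][1] * dy))
    return d2 < chi2_thresh
```

Note the failure mode described above: when the covariance S is inflated after long pure prediction, even a distant outlier yields a small d², so the gate accepts it.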
With this mechanism, BRF and PRF are more robust than EKF-OR, especially when EKF-OR's estimate is inaccurate. Figure 16 shows the estimation error and the calculation time of the three filters. As stated before, although EKF-OR has a mechanism for dealing with outliers, it can still diverge because of the outliers in some special cases. Thus, in Figure 16a, EKF-OR has large estimation errors at both low and high detection frequencies.
In terms of calculation time, there is no significant difference from the non-outlier case.

| Filtering result with delayed detection
Image processing and visual algorithms can be very computationally expensive to run onboard a drone, which can lead to significant delay (van Horssen, van Hooijdonk, Antunes, & Heemels, 2019; Weiss et al., 2012). Many visual navigation approaches ignore this delay and directly fuse the visual measurements with the onboard sensors, which sacrifices the accuracy of the state estimation. A commonly used approach for compensating this vision delay is a modified Kalman filter proposed by Weiss et al. (2012). The main idea of this approach, called the EKF delay handler (EKF-DH), is to keep a buffer storing all sensor measurements within a certain time. At time t_k, a vision measurement corresponding to the states at an earlier time t_s arrives. It is used to correct the states at time t_s, after which the states are propagated again from t_s to t_k (Figure 17a). Although updating the covariance matrix is not needed according to Weiss et al. (2012), this approach still requires updating the history of states whenever a measurement arrives, which can be computationally expensive, especially when the delay and the measurement frequency grow. In our case, since we need the error covariance for outlier rejection, it is also necessary to update the history of error covariance matrices, which increases the computation load further. For VML, in contrast, when the measurement arrives it is first pushed into the buffer. Then, the error model is estimated within the buffer/time window. With the estimated parameter β, the prediction at t_k can be corrected directly, without the need to correct all the states between t_s and t_k (Figure 17b). Thus, the computational burden does not increase when delay exists. Figure 18 shows an example of the simulation results of the three filters when both outliers and delay exist. In this simulation, the vision delay is set to 0.1 s.
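The contrast between the two mechanisms can be sketched as follows (a minimal, one-dimensional sketch of the VML idea under stated assumptions: a dictionary of stored predictions, a plain least-squares fit standing in for the full RANSAC procedure, and an illustrative 1 s window):

```python
class VMLBuffer:
    """Minimal sketch of VML's delay handling: store (time, prediction)
    pairs; when a delayed vision fix for time t_s arrives, add an error
    sample against the stored prediction; correct the *current*
    prediction by evaluating the fitted linear error model at t_now,
    with no re-propagation of states from t_s to t_now."""

    def __init__(self, window=1.0):
        self.window = window
        self.preds = {}     # t -> predicted position (1D for brevity)
        self.errors = []    # (t, prediction error) samples

    def store_prediction(self, t, pred):
        self.preds[t] = pred

    def add_measurement(self, t_s, z):
        # Error between the delayed measurement and the stored prediction
        self.errors.append((t_s, z - self.preds[t_s]))
        self.errors = [(t, e) for t, e in self.errors
                       if t > t_s - self.window]

    def corrected(self, t_now, pred_now):
        if len(self.errors) < 2:
            return pred_now
        # Least-squares fit of e(t) = a + b*t over the window
        n = len(self.errors)
        st = sum(t for t, _ in self.errors)
        se = sum(e for _, e in self.errors)
        stt = sum(t * t for t, _ in self.errors)
        ste = sum(t * e for t, e in self.errors)
        b = (n * ste - st * se) / (n * stt - st * st)
        a = (se - b * st) / n
        # Compensate the current prediction directly
        return pred_now + a + b * t_now
```

Because the correction is a single model evaluation, the cost per delayed measurement stays constant regardless of how large the delay is.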
It can be seen that although there is a lag between the vision measurements and the ground truth, all the filters can estimate accurate states. However, EKF-DH requires much more computation effort. Figure 19 shows the estimation error and the computation time of the three filters.
In Figure 19, we can see that the computation load of EKF-DH increases significantly due to its mechanism for handling delay. Unsurprisingly, EKF-DH is still sensitive to some outliers, while BRF and PRF can handle them.

The measured computation times of each component contain some outliers, which can be caused by system interrupts. Thus, we first exclude these outliers with the Interquartile Range Method (Upton & Cook, 1996) and then provide the statistics for each component. The results can be found in Figure 22 and Table 3.
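The interquartile range rule used to exclude the timing outliers can be sketched as follows (a minimal sketch; the linear-interpolation quantile and the 1.5×IQR fences are the conventional choices):

```python
def iqr_filter(samples):
    """Exclude outliers with the interquartile range rule: keep only
    values inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    xs = sorted(samples)
    n = len(xs)

    def quantile(q):
        # Simple linear-interpolation quantile on the sorted data
        idx = q * (n - 1)
        lo = int(idx)
        hi = min(lo + 1, n - 1)
        return xs[lo] + (idx - lo) * (xs[hi] - xs[lo])

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    lo_b, hi_b = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in samples if lo_b <= x <= hi_b]
```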
From Table 3, it can be seen that vision takes much more time than the other three parts. Note, though, that the snake gate detection algorithm is already a very efficient computer vision algorithm for gate detection; it has tunable parameters, such as the number of samples.

FIGURE 17 The sketches of the EKF delay handler (EKF-DH) and visual model-predictive localization's (VML) delay-handling mechanisms. (a) The sketch of EKF-DH, proposed in Weiss et al. (2012). When the measurement arrives at t_k, EKF-DH first corrects the corresponding states at t_s and then updates the states until t_k. (b) The sketch of VML's mechanism for handling delay. When the measurement arrives, it is pushed into the buffer with the corresponding states. Then, the error model is estimated by the RANSAC approach. Finally, the estimated model is used to compensate the prediction at t_k. There is no need to update all the states between t_s and t_k.

An advantage of the approach presented in this article is that we do not employ VIO or SLAM, which would take substantially more processing. However, as the snake gate detection provides relatively low-frequency and noisy position measurements, VML needs to run at high frequency and cope with the detection noise to still provide accurate estimates for the controller.

| Flying experiment without gate displacement

The real gate positions and their positions on the map are listed in Table 4. In Table 4, x_g and y_g are the positions of the gates in the real world and x̃_g and ỹ_g are their positions on the map; in this situation, they are the same. The aim of this experiment is to test the filter's performance with sufficient detections. Thus, the velocity is set to 1.5 m/s to give the drone more time to detect the gate. In Figure 23, the blue curve is the ground-truth data from the OptiTrack motion capture system and the yellow curves are the filtering results. From the flight results, it can be seen that the filtered results are smooth and coincide well with the ground-truth position. During the periods when detections are not available, the state prediction is still accurate enough to navigate the drone to the next gate. When the drone detects the next gate, the filter corrects the prediction. In this situation, the divergence of the states is only caused by prediction drift. It should also be noted that when outliers appear at 84 s, the filter is not affected by them, thanks to the RANSAC technique in the filter.

| Flying experiment with gate displacement
In this section, we test our strategy under a difficult condition in which the drone flies faster, the gates are displaced, and the detection frequency is low. The real gate positions and their positions on the map are listed in Table 5. The pose estimation is based on the gates' positions on the map. When the gates are displaced, the drone still believes they are at the positions the map indicates. After the turn, when the drone sees the next gate, which is displaced, it attributes the misalignment to prediction error and corrects the prediction by means of the new detections.
With this strategy, our algorithm is robust to the displacement of the gates.

| Flying experiment with different altitude and moving gate
We also show a more challenging race track, where the height of the gates varies from 0.5 to 2.5 m. In addition, during the flight, the position of the second gate (2.5 m) is changed after the drone passes through it.
In the next lap, the drone can adapt to the changing position of the gate (Figure 26).
The flight result is shown in Figure 27. In this flight, the waypoints are not changed and the gates are initially deployed without displacement. It is still demonstrated that this light-weight flying platform has the ability to finish the drone race task autonomously. Compared with a regular-size racing drone, the Trashcan has more complex aerodynamics and is more sensitive to disturbances. On the other hand, it has faster dynamics, which allow for more agile maneuvers. More importantly, it is much safer than a regular-size racing drone, which may even allow for flying at home. In any case, the present approach represents another direction for autonomous drone racing, one that does not need high-performance, heavy onboard computers. Also, without computationally expensive navigation methods such as SLAM and VIO, the proposed approach still enables the drone to navigate autonomously at relatively high speed.
However, the proposed approach still has its limitations. First of all, we do not estimate the thrust. Instead, we use a constant-altitude assumption to approximate the thrust when deriving the prediction error model. The simulation and real-world experiments have shown that violating this assumption still results in accurate estimation. However, when a racing track contains more considerable height changes, it may become desirable to estimate the thrust with a model, to obtain a more accurate error model and increase the estimation accuracy, especially in more aggressive flight. Secondly, gate detection is currently a major bottleneck for increasing the flight speed. In the future, we will design a gate detection method based on deep learning to detect the gate in more complex environments. This deep net can then run on the GPU of the Jevois, and higher speeds could become attainable.
Thirdly, in this paper we mainly focus on the navigation part of the drone. The guidance is only a waypoint-based method and the controller is a PID controller. To make the drone fly faster, optimal guidance and control methods are needed (S. Li, Ozturk, De Wagter, de Croon, & Izzo, 2019; Tailor & Izzo, 2019; Tang, Sun, & Hauser, 2018). Another direction is to explore joint estimation for navigation.
This will become very useful when one assumes that gates are mostly not displaced. Then, over multiple laps, the drone can get a better idea of where the gates are.
In the future, with the rapid development of computational capacity, when more reliable gate detection and online optimal control can be implemented onboard, the speed of this autonomous racing drone can certainly be increased significantly. Compared with regularly sized drones, this tiny flying platform will be able to perform faster and more agile flight. At that point, the proposed VML approach will still be suitable for providing stable state estimation for the drone.

| CONCLUSION
In this paper, we presented an efficient VML approach to autonomous drone racing. The approach employs a velocity-stable model that predicts lateral accelerations based on attitude estimates from the AHRS. Vision is used for detecting gates in the image and, by means of the gates' supposed locations in the map, for localizing the drone in the coarse global map. Simulation and real-world flight experiments show that VML can provide robust estimates with sparse visual measurements and large outliers. This robust and computationally very efficient approach was tested on an extremely lightweight flying platform, a Trashcan racing drone with a Jevois camera. In the flight experiments, the Trashcan flew a track of three laps with an average speed of 2 m/s and a maximum speed of 2.6 m/s.
To the best of our knowledge, it is the world's smallest autonomous racing drone, weighing six times less than the currently lightest autonomous racing drone setup, while its velocity is on par with the fastest autonomously flying racing drones seen at the latest IROS autonomous drone race.