A Self-learning Framework for Statistical Ground Classification using Radar and Monocular Vision

Reliable terrain analysis is a key requirement for a mobile robot to operate safely in challenging environments, such as natural outdoor settings. In these contexts, conventional navigation systems that assume a priori knowledge of the terrain's geometric properties, appearance properties, or both would most likely fail, owing to the high variability of terrain characteristics and environmental conditions. In this paper, a self-learning framework for ground detection and classification is introduced, in which the terrain model is automatically initialized at the beginning of the vehicle's operation and progressively updated online. The proposed approach is generally applicable to robot perception, and it can be implemented using a single sensor or by combining different sensor modalities. Two ground classification modules are presented here: one based on radar data, and one based on monocular vision and supervised by the radar classifier. Both rely on online learning strategies to build a statistical feature-based model of the ground, and both use Mahalanobis distance classification to segment the ground in their respective fields of view. Specifically, the radar classifier analyzes radar observations to estimate the location of the ground surface from a set of radar features. The output of the radar classifier also provides training labels for the visual classification module. Once trained, the vision-based classifier can discriminate between ground and nonground regions across the camera's entire field of view, and it can also detect multiple terrain components within the broad ground class. Experimental results obtained with an unmanned ground vehicle operating in a rural environment are presented to validate the system. They show that the proposed approach is effective in detecting drivable surfaces, reaching an average classification accuracy of about 80% over the entire video frame, with the additional advantage of requiring neither human intervention for training nor a priori assumptions about ground appearance.
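The abstract does not give the feature set, the update rule, or the decision threshold. The following is therefore only a minimal sketch of the general technique it names: a Gaussian ground model whose mean and covariance are updated online, one sample at a time, plus a Mahalanobis distance test for ground segmentation. The class name `OnlineGroundModel`, the two-dimensional features, the ridge term, and the threshold of 3.0 are all hypothetical choices made for illustration. The usage part imitates radar supervision by feeding the model synthetic feature vectors that are assumed to have been labelled as ground by a radar classifier.

```python
import numpy as np


class OnlineGroundModel:
    """Hypothetical Gaussian ground model, updated online (multivariate Welford)."""

    def __init__(self, dim, threshold=3.0):
        self.n = 0                          # number of ground samples seen so far
        self.mean = np.zeros(dim)           # running mean of the ground features
        self.m2 = np.zeros((dim, dim))      # running scatter matrix of deviations
        self.threshold = threshold          # assumed Mahalanobis cut-off

    def update(self, x):
        # Incorporate one new ground-labelled feature vector.
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += np.outer(delta, x - self.mean)

    def covariance(self):
        # Sample covariance plus a small ridge so it stays invertible early on.
        cov = self.m2 / max(self.n - 1, 1)
        return cov + 1e-6 * np.eye(len(self.mean))

    def mahalanobis(self, x):
        # Distance of a feature vector from the current ground model.
        d = np.asarray(x, dtype=float) - self.mean
        return float(np.sqrt(d @ np.linalg.solve(self.covariance(), d)))

    def is_ground(self, x):
        # Classify as ground if the sample lies within the threshold.
        return self.mahalanobis(x) <= self.threshold


# Usage (synthetic data): features of patches that a radar classifier is
# assumed to have labelled as ground are used to train the model online.
rng = np.random.default_rng(0)
model = OnlineGroundModel(dim=2, threshold=3.0)
for f in rng.normal(loc=[0.5, 0.2], scale=[0.1, 0.05], size=(200, 2)):
    model.update(f)

print(model.is_ground([0.5, 0.2]))  # True: close to the learned ground mean
print(model.is_ground([2.0, 1.0]))  # False: many standard deviations away
```

The online update avoids storing past samples. This suits a model that is initialized when the vehicle starts operating and then refined as new labels arrive. A real system might also add forgetting or windowing so the model can track changing terrain, but the source does not say whether it does so.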