Spacecraft autonomy is crucial to increase the science return of optical remote sensing observations at distant primitive bodies. To date, most small body exploration has involved short-timescale flybys that execute prescripted data collection sequences. Light time delay means that the spacecraft must operate completely autonomously without direct control from the ground, yet in most cases the physical properties and morphologies of prospective targets are unknown before the flyby. Surface features of interest are highly localized, and successful observations must account for geometry and illumination constraints. Under these circumstances onboard computer vision can improve science yield by responding immediately to collected imagery. It can trigger reacquisition of bad data or identify features of opportunity for additional targeted measurements. We present a comprehensive framework for onboard computer vision for flyby missions at small bodies. We introduce novel algorithms for target tracking, target segmentation, surface feature detection, and anomaly detection. Their performance and generalization power are evaluated in detail using expert annotations on data sets from previous encounters with primitive bodies.
Small bodies—the asteroids, comets, and other primitive objects in the solar system—are highly valuable targets for scientific exploration. These objects have undergone little modification since their formation, so they uniquely reveal the processes that shaped our solar system in both the early nebula and later as a consequence of large-scale dynamical events. They tell us about our own solar system's history and inform our interpretation of exoplanet systems and their potential for life. The ones we have visited exhibit striking diversity, suggesting that we have only just begun to characterize their populations. Some of these primitive bodies (Near-Earth objects, Phobos, and Deimos) have also been identified as possible targets for the extension of humanity in space and are the focus of reconnaissance missions. All this has given them a high priority in future mission plans by NASA, ESA, and other space agencies.
These targets pose difficulties commensurate with their great value. Many lie in remote and challenging orbits that can only be reached by expensive high delta-V maneuvers. Consequently, most encounters with small bodies to date have been flybys that provide just a few minutes or hours to collect data during closest approach. A few extended encounters, such as the Dawn encounters at Vesta and Ceres, Near-Earth Asteroid Rendezvous (NEAR) at Eros, Hayabusa at Itokawa, and the Rosetta mission at 67P/Churyumov-Gerasimenko, allow multiple command cycles and opportunities for repeat imaging. But for the most part, primitive body data collection has occurred autonomously with no direct supervision from the ground. Light time delays range from tens of minutes for main belt objects to multiple hours for trans-Neptunian objects. The quality and quantity of information obtained from these prescripted flyby sequences so far has been limited. In addition, space weathering and regolith processes often conceal features of scientific significance. This makes it difficult to capture the diversity of these bodies in the course of flybys. Nevertheless, flybys provide a low-cost means to characterize many bodies in a single mission and therefore to develop a critical population-level understanding of these objects. They are prominent in proposed missions such as a Trojan Tour and Rendezvous [Squyres et al., 2011] and Main Belt Asteroid Flyby missions [Britt et al., 2010].
This paper discusses ways that autonomy and onboard intelligence can benefit flyby science. Specifically, we demonstrate autonomous onboard data analysis and response to help close the gap in science value between flybys and extended encounters. Operators can command the spacecraft to adapt targeting decisions in real time, migrating rudimentary decisions across the light time gap for more responsive data collection. Many of these capabilities can be formulated as computer vision tasks. We will therefore begin by surveying previous encounters and the computer vision approaches used in exploration spacecraft. We will discuss science objectives of future missions and the roles for autonomous data collection. We will focus in depth on a handful of key enabling technologies: data quality assurance, target tracking, plume detection, and surface feature detection. A series of experiments on archival flyby sequences demonstrates techniques that can reliably achieve these objectives.
1.1 Primitive Bodies Exploration: Science and Challenges
The exploration of primitive bodies addresses a variety of key science themes: origin science (e.g., Solar nebula processes and dynamical evolution), understanding of planetary processes (e.g., outgassing), and astrobiology (primitive bodies as reservoirs of organics, etc.). Composition measurements are instrumental to many of these objectives. For example, identifying and characterizing the nature of migrated bodies can test models of early solar system history and dynamical evolution, which can in turn quantify the degree of mixing of original planetesimal reservoirs. Of the many instruments that could assist this goal, few are suited for flyby exploration. Elemental composition measurements (e.g., X-ray or gamma ray spectroscopy) require integration times of hours or days that are not compatible with the short observation span offered by flybys. Alternatively, ultraviolet and infrared spectroscopy can reveal signatures of a body's origin through observations of origin markers including organics and volatiles.
Optical remote sensing observations are particularly difficult to exploit at primitive bodies because space weathering alters surface composition. The weathered layer develops over a thickness greater than the depth probed by optical remote sensing instruments (i.e., millimeters versus microns). However, fresh material may be exposed as a consequence of impact or mass wasting processes (e.g., Figures 1a–1c) or seismic shaking (Figure 1d). As a result, opportunistic remote observations of material excavated at craters or as a consequence of surface overturn present the best opportunities to understand these targets' composition during a flyby.
Outgassing is expected at cometary bodies, activated asteroids, and water-rich asteroids [Sonnett et al., 2011] and can provide insight into the deep interior composition of these objects. However, outgassing is a transient and somewhat unpredictable feature. In asteroids (Trojan, main belt) it is faint with low contrast (e.g., 24 Themis [Jewitt and Guilbert-Lepoutre, 2012]). In the case of comets, there is the added difficulty of changing properties as the body approaches the Sun, so that the landscape evolves as a function of time.
An increasing number of astronomical observations have also revealed that systems with multiple asteroids are frequent [Marchis et al., 2005]. Ida's companion Dactyl (∼1.4 km diameter) was discovered in images returned by Galileo (Figure 1e), long after the opportunity for follow-up observations had passed. While detailed observations of binary asteroid orbital properties provide an avenue for density measurements, in this particular case the unexpected Dactyl was not subject to extensive imaging.
Acquisition of image data—or any data—is complicated in the early stage of an encounter by the need to accommodate errors in relative position knowledge. In the case of NEAR's target Mathilde, the uncertainty ellipse in the target-relative position was large [Veverka et al., 1997] (Figure 2), even though Mathilde's orbit was well characterized at the time of the encounter. When targeting a population like the Trojan asteroids or giant planet irregular satellites, observations of specific targets prior to encounters are scarce. Furthermore, the low albedos (<0.1) and small sizes (<100 km) of these bodies make it difficult to track them for extensive periods of time prior to the encounter.
In summary, primitive body flybys face numerous challenges, in particular uncertainty over the target and over the trajectory, both of which preclude the targeted measurements that are important for bypassing the regolith to reveal the target's composition and history. For missions specifically identified in the planetary science decadal survey [Squyres et al., 2011], it is not clear to what extent the observations returned during single flyby opportunities can meet the science objectives.
1.2 Computer Vision Applications and Prior Work
This paper presents onboard image analysis as a solution to the challenges of flyby science. We will describe each of the required capabilities in turn, detailing previous work on each topic.
1.2.1 Target Tracking and Surface Segmentation
Perhaps the most basic function is simply to identify the object centroid in order to keep the spacecraft instruments pointed at the target. This is a challenging problem due to position uncertainty; missions including Giotto and Deep Impact have incurred data loss due to problems with target tracking. A common approach is to acquire extra images in a raster pattern surrounding the predicted target location. This approach was used in early encounters including the Galileo encounters with asteroids Gaspra and Ida and the later NEAR encounter with Mathilde [Bhaskaran et al., 2004]. Similar strategies are planned for the Rosetta encounter at 67P/Churyumov-Gerasimenko. This is effective, but it precludes fine pointing and requires excess spacecraft resources in terms of time, power, and onboard storage to collect the raster.
Missions can reduce redundant data collection by using onboard image analysis to identify the target centroid for trajectory and instrument pointing updates. One example is the Autonav system in use during the Deep Impact encounters and the EPOXI (a combination of two missions, the Extrasolar Planet Observation and Characterization (EPOCh) and the Deep Impact Extended Investigation (DIXI)) encounter of Hartley 2. Autonav relies on two alternative strategies depending on the distance to the target. It uses a Center of Brightness algorithm during closest approach [Mastrodemos et al., 2005], computing a weighted centroid of pixel brightness values within an envelope of three standard deviations around the predicted asteroid location. This provides a robust response to multilobed objects with more than one brightness peak. For more distant images, it uses the Blobber approach of Russ, which takes the centroid of the largest connected region with intensity above a predetermined threshold. A similar strategy was used in the flyby encounters of Deep Space 1 at Borrelly and the Stardust flyby of Annefrank [Bhaskaran et al., 1998], successfully tracking both objects. Neither algorithm attempts to delineate the target body, which would be important for more fine-grained instrument targeting at the target center, limb, or plume. The additional ability to segment these distinct image areas would provide flyby science capabilities beyond wide-angle imaging, providing a new palette of feature-relative targeting commands.
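As a concrete illustration, the two Autonav strategies can be sketched as follows. This is our illustrative reimplementation, not flight code: the square window, crude background removal, and thresholds are assumptions.

```python
import numpy as np
from scipy import ndimage

def center_of_brightness(img, predicted_rc, sigma_px):
    """Brightness-weighted centroid inside a window of three standard
    deviations around the predicted target location (Center of
    Brightness sketch; window shape and background handling assumed)."""
    r0, c0 = predicted_rc
    half = int(3 * sigma_px)
    r_lo, r_hi = max(0, r0 - half), min(img.shape[0], r0 + half + 1)
    c_lo, c_hi = max(0, c0 - half), min(img.shape[1], c0 + half + 1)
    win = img[r_lo:r_hi, c_lo:c_hi].astype(float)
    w = win - win.min()                 # crude background removal
    if w.sum() == 0:
        return float(r0), float(c0)
    rs, cs = np.mgrid[r_lo:r_hi, c_lo:c_hi]
    return (rs * w).sum() / w.sum(), (cs * w).sum() / w.sum()

def blobber_centroid(img, threshold):
    """Centroid of the largest connected region brighter than a
    predetermined threshold (the 'Blobber' strategy)."""
    labels, n = ndimage.label(img > threshold)
    if n == 0:
        return None
    sizes = ndimage.sum(img > threshold, labels, index=range(1, n + 1))
    ys, xs = np.nonzero(labels == 1 + int(np.argmax(sizes)))
    return ys.mean(), xs.mean()
```

Both return subpixel (row, column) estimates; neither delineates the target body, as noted above.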
1.2.2 Data Quality Assurance
Another way that autonomous image analysis could assist flybys is through data quality assurance. Predicting good camera gain and exposure settings is difficult due to poorly constrained surface properties and brightness, particularly for faint, small objects with limited terrestrial observations. Image artifacts and poor exposure settings occasionally impact mission science yield. For example, in the Deep Space 1 encounter of asteroid Braille, the target brightness was almost 40 times less than predicted, perhaps due to a combination of camera sensitivity, albedo, and unanticipated morphology. This led to the failure of onboard tracking, which could not resolve the object [Bhaskaran et al., 2000]. Other imaging artifacts include cosmic rays and compression artifacts. The ability to recognize bad data for reacquisition, or to adjust instrument settings dynamically for optimal contrast, could significantly reduce this risk for one-time flybys. It would also reduce requirements for collecting redundant data and help achieve baseline mission goals more quickly.
1.2.3 Detection of Surface Targets of Opportunity
Finally, onboard analysis could be used to identify specific surface features of interest for high-resolution imaging. These include disrupted surface features such as crater rims and lineaments. This is highly valuable for asteroids, since every example obtained to date is covered by a thick regolith that conceals deep interior material. In most cases, absorption features, diagnostic of the asteroidal composition, have been erased as a consequence of space weathering. A more promising way to learn about objects' interior compositions and histories is to focus on the rare localized, faint features that reveal deep material excavated through cratering or morphological features that could provide information on the overall strength (i.e., porosity) of the deep interior. Unfortunately, untargeted open-loop flybys provide few if any opportunities to capture this level of detail. By recognizing morphological and albedo anomalies that indicate fresh material, autonomous data analysis could direct targeted imaging and spectroscopic measurements that could significantly improve overall science return. While many studies have treated the topic of automatic geological feature detection [Burl et al., 2001; Argialas and Mavrantza, 2004; Thompson et al., 2005; Bue and Stepinski, 2006; Farrand et al., 2008; Estlin et al., 2012; Bekker et al., 2014b], we are not aware of any previous study to detect specific surface features on primitive bodies.
One final challenge facing all onboard data processing is that of highly constrained computing resources. Most spacecraft computers are far slower than contemporary consumer-grade processors due to the expense and time required to certify components for the radiation of space. At the time of this writing, a typical processor is the RAD750, which performs approximately 266 Million Instructions Per Second. FPGA-based (field-programmable gate array) processing is occasionally used for parallelizable operations, and multicore processors may eventually be available. However, resources for most image processing are likely to remain constrained, so that algorithms will require one or more orders of magnitude longer to run in space than on typical desktop or laptop platforms.
2 Data Sets
In order to evaluate different automated processing methods we assembled several flyby image sequences representing primitive bodies targets. This section describes the data sets, the preprocessing that was applied, and the ground truth classes used in the study.
All the data in our evaluations are from framing cameras. Traditionally, these spacecraft cameras are monochrome devices that use a mechanical wheel to switch between different color bandpass and polarization filters. Most flybys include some combination of navigation (or navcam) imagery and science imagery. The former is characterized by regular acquisitions by a monochrome camera with a wide field of view. Navcam images provide visual context, spatial awareness, and optical target-relative navigation. In contrast, science cameras typically acquire more targeted measurements for specific investigations. They commonly use narrow fields of view with specific filter combinations designed to emphasize different mineralogical or chemical features. We will focus on the navcam images in this work. These are particularly useful for selective targeting because they have wide fields of view, are activated at regular intervals, and are designed with exposures that resolve surface features of the target. They are also more pervasive and comparable across different flyby missions. Navcam images typically have resolutions of approximately megapixel size (e.g., 1024 × 1024 pixels).
Despite the occasional presence of color filter data, we will focus entirely on analysis of single-band images; multiband images may not be available onboard in a form that would permit automatic analysis. The reason is the time delay between exposures in different color channels due to the need for mechanical filter wheel motion. During this short interval the target-relative motion and the target's own rotation can cause shifts in the scene content, manifesting as spectral artifacts. These artifacts persist after coregistration by rigid or affine transformations, and there is currently no simple automated approach to remove them. Fortunately, all of the science features we are pursuing are also evident in monochrome images, and these single-band scenes are sufficient for our study.
We selected three sequences for detailed investigation (Figure 3). The first is a flyby of Tempel 1 by the Stardust-NExT mission. Tempel 1 is a comet that had been visited previously by the Deep Impact spacecraft. The return mission sought to characterize the changes that had occurred on the surface in the intervening years, to characterize the coma and nucleus/coma interaction, and to image key surficial features such as bright cryoflows and scarps. A sequence of over 70 navcam images was acquired at regular intervals during the encounter [Klaasen et al., 2013]. They reveal a wide range of features of interest including high-albedo spots, craters, and differences in surface texture and albedo. These features range in scale from a few meters up to tens of meters. The best spatial resolution was 2 m per pixel, and more generally 2–12 m per pixel for the close proximity sequence; earlier on approach, resolution ranged up to hundreds of meters per pixel.
The second sequence is a flyby of Phoebe by the Cassini spacecraft. The Cassini flyby of Phoebe occurred in 2004 and resulted in over 300 images from the Imaging Science Subsystem [Porco et al., 2004]. While technically a moon of Saturn, Phoebe has an uncertain history and may be a captured planetesimal from the Kuiper Belt. The Cassini images show a cratered surface covered in thick regolith, with several bright crater wall erosion features enriched in ice that pose unique science targets of opportunity.
Finally, we considered a comet flyby of a target object with prominent plume jets. The EPOXI mission flew by comet Hartley 2 in 2010; its cameras returned images of a concave, bilobate object with remarkably heterogeneous surface features and activity levels. Prominent targets of opportunity include high-albedo spots, plumes, and the plume/surface interface. Over 200 images were returned from the flyby sequence, of which we selected the middle 45 for which the target had a reasonable apparent size in the navcam images.
We exhaustively labeled each pixel in every image with one of five categories: outer space; plume, for those pixels that were noticeably brighter than the background; surface, for the illuminated visible portion of the primitive body; high-albedo spots, which are localized bright patches indicating fresh material or differential composition; and unclassified. We used the unclassified category for ambiguous parts of the image, including areas that could contain invisible or nonilluminated parts of the target and the ambiguous border around plumes.
For fidelity to onboard data, we used raw images supplied by NASA's Planetary Data System archive. The images were all subjected to initial preprocessing to a level that could feasibly be performed onboard. This includes basic dark subtraction and flat-fielding operations. We also performed a simple cosmic ray removal step that excises streaks and bright connected components smaller than a size threshold. A range of cosmic ray removal approaches exist, but we found that later processing was relatively insensitive to the precise method used. Deterministic, consistent image artifacts such as edge effects were removed manually prior to processing. While no absolute radiometric calibration was performed, a correction for exposure time was used to normalize all the image values in the sequence to a constant factor of the true radiance.
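A minimal sketch of this preprocessing chain is shown below. The five-sigma brightness cut and four-pixel artifact size are our illustrative assumptions, not mission values.

```python
import numpy as np
from scipy import ndimage

def preprocess(raw, dark, flat, exposure_s, cosmic_max_px=4):
    """Onboard-feasible preprocessing sketch: dark subtraction,
    flat-fielding, removal of small bright connected components (a
    stand-in for cosmic ray hits), and exposure-time normalization so
    all frames are a constant factor of the true radiance."""
    img = (raw.astype(float) - dark) / np.where(flat > 0, flat, 1.0)
    # Flag small, very bright connected components as cosmic rays.
    bright = img > np.median(img) + 5.0 * img.std()
    labels, n = ndimage.label(bright)
    for i in range(1, n + 1):
        component = labels == i
        if component.sum() <= cosmic_max_px:
            img[component] = np.median(img)   # replace with background
    return img / exposure_s
```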
This section describes the approaches we have developed to address each of these small bodies computer vision tasks. We formalize the problems of surface tracking and surface feature detection as machine learning problems, amenable to efficient solutions by a combination of image processing and supervised classification. We address the challenge of imaging artifacts with a separate probabilistic target appearance model.
3.1 Surface Segmentation and Tracking
We compared several different methods for target tracking. The first was the traditional brightness centroid method, which calculates a weighted mean of all pixels comfortably above the background signal level. This has been a common choice for optical navigation due to its reliability and computational simplicity.
We also evaluated a new approach, first mentioned in Thompson et al. [2012a] and expanded further in Thompson et al. [2012b] and by Wagstaff et al. This method has the advantage of additional robustness to plumes and strong intensity gradients. It uses edge detection to identify an envelope enclosing the body and then finds the true geometric centroid. The first step in this approach is an edge detection stage using an edge detector such as the Canny algorithm [Canny, 1986]. This identifies points on the horizon and surface of the target. We find the result is insensitive to edge detection parameters; the horizon is generally a stronger contour than any other location, yielding good results for a wide range of thresholds. The most probable failure mode is caused by cosmic rays and linear artifacts, which must be completely excised before edge detection.
After edge detection, a second step finds the target surface using the 2-D convex hull of the edge points (i.e., the smallest enclosing convex polygon). This effectively filters spurious edge points caused by linear features such as cast shadows or microtextures on the surface. The polygon segments the image into surface and nonsurface regions. The centroid of this surface is based on exterior geometry, which is relatively stable under changes in the direction of illumination. Consequently, we find that the result is usually closer to the actual geometric center of the target than for the brightness centroid approach. It is also reliable in the presence of cometary plumes or outgassing; plumes are diffuse, so the edge detector ignores them.
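The hull-based tracker can be sketched as follows. A simple gradient-magnitude threshold stands in for the Canny detector here, which seems reasonable given the method's reported insensitivity to edge detection parameters; the threshold value is an assumption.

```python
import numpy as np
from scipy.spatial import ConvexHull

def hull_centroid(img, grad_thresh):
    """Geometric centroid of the convex hull of edge points.

    Steps: (1) detect edge points with a gradient-magnitude threshold,
    (2) take their 2-D convex hull, (3) return the area-weighted
    centroid of the hull polygon (shoelace formula)."""
    gy, gx = np.gradient(img.astype(float))
    edges = np.hypot(gx, gy) > grad_thresh
    pts = np.column_stack(np.nonzero(edges))      # (row, col) edge points
    if len(pts) < 3:
        return None
    hull = ConvexHull(pts)
    v = pts[hull.vertices].astype(float)          # ordered hull vertices
    r, c = v[:, 0], v[:, 1]
    r2, c2 = np.roll(r, -1), np.roll(c, -1)
    cross = r * c2 - r2 * c
    area = cross.sum() / 2.0
    cr = ((r + r2) * cross).sum() / (6.0 * area)
    cc = ((c + c2) * cross).sum() / (6.0 * area)
    return cr, cc
```

Because only exterior geometry enters the centroid, the estimate is largely unchanged when one side of the body is shaded.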
Figure 4 shows the process for a Cassini image of Phoebe. Figure 4a shows the original image, and Figure 4b shows the result of the edge detection operation. The convex hull closes off minor concavities due to surface texture and cratering (Figure 4c). This is helpful for bilobate objects and other irregular shapes which could be partly self-shaded. Some narrow field of view follow-up instruments require specific illumination conditions such as a desirable solar phase angle. For these instruments, additional rules can be applied to find the most suitable candidate target point.
3.2 Surface Feature Detection
We formulate the surface feature detection task as a classification problem as summarized in Figure 5. Our approach is based on simple image filtering operations and statistical classification. This exploits the strength of statistical object recognition while permitting very fast execution times. Figure 6 depicts a schematic of the proposed system in operation including training, model uplink, and autonomous operation during a flyby.
3.2.1 Preprocessing
The aim of preprocessing is to mitigate illumination variations and emphasize high-albedo surface features. To this end, we apply a cross median filter to the current frame and subtract the result from the original gray scale image. After renormalizing, the difference image highlights high-albedo regions independent of local image exposure. As a result, surface features in dark areas and well-lit areas give similar responses. In addition, median filters have the desirable properties of linear run time [Perreault and Hebert, 2007] and edge preservation, which suppresses spurious responses at the bright edges along the borders of small bodies. Figure 5b shows the renormalized difference image for a frame from Hartley 2 depicted in Figure 5a. High-albedo features are clearly discernible in the illuminated part of the comet as well as in shadow.
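This preprocessing step can be sketched as follows; the cross arm length is an illustrative assumption.

```python
import numpy as np
from scipy import ndimage

def highlight_albedo(img, arm=7):
    """Subtract a cross (plus-shaped) median-filtered background from
    the frame and renormalize the difference to [0, 1], so that
    high-albedo features respond similarly in dark and well-lit areas."""
    size = 2 * arm + 1
    cross = np.zeros((size, size), dtype=bool)
    cross[arm, :] = True
    cross[:, arm] = True
    background = ndimage.median_filter(img.astype(float), footprint=cross)
    diff = img - background
    lo, hi = diff.min(), diff.max()
    return (diff - lo) / (hi - lo) if hi > lo else np.zeros_like(diff)
```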
3.2.2 Candidate Detection
The next step is the actual detection and localization of surface feature candidates. We use intensity-weighted mean shift clustering [Comaniciu and Meer, 2002] on the difference image for mode detection. Specifically, we use a circular box kernel weighted by the gray scale intensity of the pixels in the difference image. Due to the discrete nature of raster images, this step can result in multiple adjacent detections. These clusters are reduced to single locations by running mean shift again, this time on the binary image of detections from the first round. The resulting detections represent the locations of candidate surface features. While this detection procedure covers nearly all true positive samples, it yields a large number of false positive detections, which have to be filtered out by a classification algorithm. Figure 5c shows a large number of surface feature candidates depicted as red crosses on a frame from Hartley 2. The ground truth annotations from a domain expert are shown as green circles.
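A toy version of the intensity-weighted mean shift step is sketched below, using a flat circular kernel; the kernel radius, seed weight floor, and mode-merging rule are illustrative assumptions, and the second deduplication pass on the binary detection image is folded into the merge step for brevity.

```python
import numpy as np

def mean_shift_modes(diff, radius=5, min_weight=1.0, n_iter=30):
    """Intensity-weighted mean shift on a difference image.

    Each seed (a sufficiently bright pixel) is iteratively moved to the
    intensity-weighted mean of pixels inside a circular kernel;
    converged positions within one radius of an existing mode are
    merged, leaving one candidate location per cluster."""
    ys, xs = np.nonzero(diff > 0)
    w = diff[ys, xs].astype(float)
    seeds = [(float(y), float(x)) for y, x, wt in zip(ys, xs, w)
             if wt >= min_weight]
    modes = []
    for y, x in seeds:
        for _ in range(n_iter):
            inside = (ys - y) ** 2 + (xs - x) ** 2 <= radius ** 2
            wt = w[inside]
            if wt.sum() == 0:
                break
            y = (ys[inside] * wt).sum() / wt.sum()
            x = (xs[inside] * wt).sum() / wt.sum()
        for m in modes:                       # merge nearby modes
            if (m[0] - y) ** 2 + (m[1] - x) ** 2 <= radius ** 2:
                break
        else:
            modes.append((y, x))
    return modes
```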
3.2.3 Training Procedure
In order to train a machine learning algorithm to differentiate between true detections and false positives, we need a set of positive and negative examples. To this end, a planetary scientist annotated all frames from the close encounters with comets Hartley 2 and Tempel 1 by labeling surface features of interest. Figure 7 presents a sequence of 12 frames from Hartley 2 including the expert's annotations shown as red circles. Detected candidates within a predefined distance of 10 pixels from a ground truth annotation are labeled as positive, while all other detections are labeled as negative. This set of positive and negative training samples can then be used to learn and validate a classifier. Figure 5d depicts image patches of positive examples from comet Hartley 2. We also force the classifier to be rotation invariant by augmenting the training set with rotated and flipped copies of patches from the positive class [Fuchs et al., 2008].
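The labeling rule and the rotation/flip augmentation can be sketched as follows (eight variants per positive patch is our reading of "rotated and flipped copies"; the exact variant set used in the original study is not specified):

```python
import numpy as np

def label_candidates(candidates, annotations, match_px=10):
    """Label each (row, col) candidate 1 if it lies within match_px
    pixels of any expert annotation, else 0 (the 10-pixel rule)."""
    labels = []
    for cy, cx in candidates:
        hit = any((cy - ay) ** 2 + (cx - ax) ** 2 <= match_px ** 2
                  for ay, ax in annotations)
        labels.append(1 if hit else 0)
    return labels

def augment_patch(patch):
    """Rotation/flip augmentation of a positive patch: the 4 rotations
    of the patch and of its mirror image (8 variants in total)."""
    out = []
    for p in (patch, np.fliplr(patch)):
        for k in range(4):
            out.append(np.rot90(p, k))
    return out
```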
The training set described in the previous section can be used in a machine learning framework to construct a classifier that is able to discern true surface features from false detections. First, we extract image patches of size 11 × 11 pixels at the locations of surface feature candidate detections from the original gray scale frames, as shown in Figure 5c. To achieve robust results, each patch is normalized by shifting the median to 0.5 and scaling the intensity values to the range [0,1] (cf. Figure 5e).
The image patches are described by a set of numerical attributes which are subsequently used for classification. These attributes comprise the raw intensity values; general image statistics such as the mean, median, and standard deviation; and local gray value and gradient histograms. To this end, the patches are partitioned spatially as shown in Figure 8, and the gray scale intensities are histogrammed per partition and for the whole patch. Finally, an attribute vector is constructed containing the raw pixel intensities, the image statistics, and the local and global histograms.
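A simplified version of this attribute extraction is shown below. The bin count and grid size are illustrative assumptions, the gradient histograms are omitted for brevity, and the clip-based normalization is a stand-in for the median-shift-and-scale step described above.

```python
import numpy as np

def patch_attributes(patch, n_bins=8, grid=2):
    """Attribute vector for a patch: raw intensities, simple
    statistics, a global intensity histogram, and per-cell histograms
    over a spatial grid."""
    p = patch.astype(float)
    p = np.clip(p - np.median(p) + 0.5, 0.0, 1.0)   # median shifted to 0.5
    feats = [p.ravel(),
             np.array([p.mean(), np.median(p), p.std()]),
             np.histogram(p, bins=n_bins, range=(0, 1))[0].astype(float)]
    rows = np.array_split(np.arange(p.shape[0]), grid)
    cols = np.array_split(np.arange(p.shape[1]), grid)
    for rr in rows:
        for cc in cols:
            cell = p[np.ix_(rr, cc)]
            feats.append(np.histogram(cell, bins=n_bins,
                                      range=(0, 1))[0].astype(float))
    return np.concatenate(feats)
```

For an 11 × 11 patch this yields 121 raw values, 3 statistics, one 8-bin global histogram, and four 8-bin cell histograms, for 164 attributes in total.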
Based on the extracted attributes, we train a random forest classifier [Amit and Geman, 1997; Breiman, 2001] to differentiate actual surface features from false positive candidates. In recent years random forests (or decision forests) have been extended for clustering [Breiman, 2001], online learning [Saffari et al., 2009], interactive learning [Fuchs and Buhmann, 2009], and density estimation [Criminisi et al., 2011], to mention just a few. The applications range from medical imaging [Fuchs et al., 2008; Fuchs and Buhmann, 2011] and gaming [Shotton et al., 2011] to space exploration [Thompson et al., 2012c]. Random forests have a number of properties which make them a suitable choice for autonomous computer vision during flyby missions:
They can infer nonlinear interactions between attributes and hence are able to construct the complex models necessary for high accuracy in computer vision.
Random forests implicitly perform attribute selection and thus can deal with a large number of attributes while being robust against noisy or noninformative variables.
The ensemble structure favors parallel training of the decision trees in a distributed manner which allows handling of large amounts of training data in a reasonable time frame. When the proposed system is trained with data from several missions, the amount of imagery demands the use of parallel distributed learning.
Random forests can be not only trained but also evaluated in parallel, which results in fast execution speeds. Besides implementations for GPUs (graphics processing units) [Sharp, 2008], FPGA implementations are already available for space exploration [Bekker et al., 2014a], making random forests an ideal choice for onboard computer vision.
Specifically, we train a random forest with 100 trees by bootstrapping the training data for every tree and optimizing over 10 randomly chosen attributes and their thresholds at every split node. The trees are grown until completion without pruning, and the final prediction is achieved by taking the majority vote over all trees in the ensemble. The ratio of trees voting for a true surface feature versus those voting for a false positive can be interpreted as the confidence of the classifier in its prediction. We use this confidence estimate to generate the precision/recall plots shown in section 4.2 for estimating the generalization power of the classifier.
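Using scikit-learn as a stand-in implementation, the forest configuration described above reads as follows. Note that scikit-learn averages per-tree class probabilities rather than counting raw votes; with trees grown to purity this closely approximates the vote ratio.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_forest(X, y, n_trees=100, n_split_attrs=10, seed=0):
    """100 bootstrapped trees, 10 attributes tried per split, grown to
    completion (sklearn does not prune by default)."""
    forest = RandomForestClassifier(
        n_estimators=n_trees,
        max_features=min(n_split_attrs, X.shape[1]),
        bootstrap=True,
        random_state=seed)
    forest.fit(X, y)
    return forest

def confidence(forest, X):
    """Approximate fraction of trees voting 'surface feature' (class 1),
    usable as the confidence score for precision/recall curves."""
    return forest.predict_proba(X)[:, list(forest.classes_).index(1)]
```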
For comparison we also train a support vector machine (SVM) with a Gaussian radial basis function kernel. The cost parameter c = 1.5 and the kernel width γ = 0.001 are optimized in a grid search with tenfold cross validation on the training data. For both models, all experiments are conducted twice: once with the original training set and once with the augmented training set containing additional flipped and rotated patches of the surface feature class.
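The SVM baseline can be sketched with scikit-learn's grid search; the grid bounds below are illustrative assumptions chosen to bracket the selected values reported above (c = 1.5, γ = 0.001).

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def train_svm(X, y):
    """RBF-kernel SVM tuned by grid search over the cost and
    kernel-width parameters with tenfold cross validation."""
    grid = {"C": [0.5, 1.0, 1.5, 2.0],
            "gamma": [1e-4, 1e-3, 1e-2]}
    search = GridSearchCV(SVC(kernel="rbf"), grid, cv=10)
    search.fit(X, y)
    return search.best_estimator_
```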
3.3 Target Appearance Model
Occasionally, errors due to cosmic rays, imaging artifacts, or poor exposure will cause problems in target tracking, surface segmentation, or detection operations. The onboard system can excise these cases with the help of a secondary target appearance model. This independent system acts as a check on target detection, so that the spacecraft can abstain from autonomous action if the data do not match the envelope of prior expectations. We consider formulating this problem both as anomaly detection (finding segmentations that do not match prior expectations) and as binary classification (categorizing segmentations as either legitimate or problematic).
We treat the visible attributes of properly segmented objects as independent, identically distributed realizations of a common appearance distribution and estimate its parameters in advance on representative archival data with a range of possible target appearances. We use features drawn from the shape of the segmented surface: specifically, the kurtosis f_k of the intensity of target surface pixels; the roundness f_rnd of the contour; the contrast f_cnt of the target, represented as a signal-to-noise ratio (SNR) above the background; the eccentricity f_ecc of the best fitting ellipse, defined by the ratio of major and minor axes; and the background kurtosis f_bk, which indicates the noise properties outside the segmented surface.
For the anomaly detection method we use a simple isotropic Gaussian density function. The log likelihood L(S) of a surface region S written
Here the sum is taken over all of the features described above. The model parameters μ_i and ϕ_i are the mean and standard deviation of each attribute computed from training data. For the binary classification model we generate an accept/reject decision by applying the same set of features with a secondary random forest classifier.
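A minimal sketch of this likelihood score, assuming per-feature means and standard deviations estimated in advance from training data (variable names are ours):

```python
import numpy as np

def log_likelihood(features, mu, phi):
    """Per-feature Gaussian log likelihood summed over all appearance
    features; mu and phi are training-set means and standard deviations."""
    features, mu, phi = map(np.asarray, (features, mu, phi))
    return float(np.sum(
        -np.log(np.sqrt(2 * np.pi) * phi)
        - (features - mu) ** 2 / (2 * phi ** 2)
    ))
```

In practice the spacecraft would compare L(S) against a threshold and abstain from autonomous action when the score falls below it.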
4.1 Surface Segmentation and Tracking
This section compares performance for target tracking and feature detection on the archival images. We first evaluate the basic ability to find and track the target object. Figures 9-10 show tracking results from the Tempel 1, Phoebe, and Hartley 2 flybys. We plot the pixel error between the estimated and actual object centroids based on the manual labels. Note that since these manual labels do not extend to the invisible, nonilluminated portions of the object, there is some unavoidable ambiguity about the actual centroid. However, in the absence of formal 3-D geometric modeling, the manual labels provide a useful proxy estimate. We compare results using the status quo “center of brightness” method as well as convex hull segmentations, and also compare segmentations with and without the appearance model. We also consider the random forest result, estimating a centroid using pixels labeled as surface. The filtered convex hull and random forest outperform the status quo method in all three cases, particularly during close approaches, where the illuminated horizon of the object biases the center of brightness result. This bias grows as the object expands in size and exceeds 100 pixels for some images.
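The source of this bias can be illustrated with a sketch (function names are ours): the center of brightness is pulled toward a bright illuminated limb, while the centroid of a correct segmentation is not.

```python
import numpy as np

def center_of_brightness(image):
    """Status quo centroid: intensity-weighted mean pixel position."""
    ys, xs = np.indices(image.shape)
    total = image.sum()
    return np.array([(ys * image).sum(), (xs * image).sum()]) / total

def segmentation_centroid(mask):
    """Centroid of the pixels labeled as target surface."""
    ys, xs = np.nonzero(mask)
    return np.array([ys.mean(), xs.mean()])
```

On an object whose right limb is strongly illuminated, the first estimate shifts toward the bright side while the second stays at the geometric center of the mask.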
The convex hull approach is insensitive to minor changes in illumination, though it fails in the presence of catastrophic image quality problems such as the compression artifacts evident in the first half of the Phoebe sequence as depicted in Figure 11. These compression artifacts have prominent intensity gradients and appear as edges in the initial edge detection step. The appearance model effectively filters these cases, and the combination of the two provides better performance than any other method. Figure 12 shows the performance across all flybys, with the boxes indicating the data quartiles and median and the whiskers indicating the extrema. This plot demonstrates the utility of appearance model filtering for improving the robustness of the tracking solution.
4.2 Surface Feature Detection
The proposed framework is evaluated by conducting cross-validation experiments on public archival data of small bodies. Sequences of navcam images were acquired at regular intervals during the encounters [Klaasen et al., 2013]. A domain expert labeled all surface features of interest as described in section 3.2.3. In total this amounts to 47 frames from the encounter of Deep Impact with 103P/Hartley and 72 frames from 9P/Tempel (cf. Figure 13).
The textbook evaluation procedure for such scenarios in machine learning would be to perform cross validation at the sample level, i.e., training on a subset of surface features from all images and testing on the holdout set. In our setting this approach leads to nearly perfect classification accuracy and hence vastly overstates the predictive power of such a model. This is mainly because the samples (surface feature candidates) are not independent and identically distributed but highly correlated. Specifically, surface features on one small body look very much alike but can differ dramatically from those on other small bodies. Similarly, performing cross validation per frame or using the out-of-bag error estimate from the random forest classifier [Breiman, 2001] leads to excellent performance, overestimating the power of the model to generalize to a new target body.
To overcome this problem, we resorted to a very strict validation scheme: leave-one-out cross validation at the per-body level. In particular, we train on all samples from one comet and test the performance on the samples of a completely different, unseen body. This procedure also more closely resembles the scenario we will encounter on board in actual flyby missions. We can train a model on all previously seen and labeled asteroids and comets, but we will not have any knowledge about a small body that we have never encountered before. For missions with multiple encounters of the same object one could envision updating the classification model with data from previous flybys. This approach would significantly boost classification accuracy but is not the focus of this work.
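The validation scheme can be sketched as follows; the data, grouping labels, and classifier settings are illustrative, and scikit-learn's `LeaveOneGroupOut` provides an equivalent built-in splitter.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def leave_one_body_out(X, y, body_ids):
    """Leave-one-body-out cross validation: train on all samples from
    the other bodies, test on the held-out body."""
    scores = {}
    for body in np.unique(body_ids):
        test = body_ids == body
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(X[~test], y[~test])          # train on every other body
        scores[body] = clf.score(X[test], y[test])  # accuracy on held-out body
    return scores
```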
Overall detection performance on whole flyby sequences is reported in terms of precision and recall as depicted in Figures 14 and 15. The model is trained on all samples from either Hartley 2 or Tempel 1, the training error is calculated on the same data, and the test error is estimated on the samples from the unseen small body. We vary the threshold on the confidence estimate of the classifier as described in section 3.2.4 to generate the precision/recall plots shown in Figures 14 and 15. Precision is the fraction of detected samples that are true surface features labeled by a planetary scientist, while recall is the fraction of true surface features that are detected. Specifically, precision is defined as TP/(TP + FP) and recall as TP/(TP + FN), where TP refers to true positive detections, FP to false positives, and FN to false negatives.
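The threshold sweep that generates such precision/recall points can be sketched as follows (the score and label arrays are illustrative):

```python
import numpy as np

def precision_recall(scores, labels, thresholds):
    """Precision = TP/(TP+FP) and recall = TP/(TP+FN) as the confidence
    threshold is varied over `thresholds`."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, bool)
    points = []
    for t in thresholds:
        pred = scores >= t
        tp = np.sum(pred & labels)
        fp = np.sum(pred & ~labels)
        fn = np.sum(~pred & labels)
        prec = tp / (tp + fp) if tp + fp else 1.0
        rec = tp / (tp + fn) if tp + fn else 1.0
        points.append((prec, rec))
    return points
```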
The near-perfect performance on the training set (not shown) demonstrates that the numerical attributes are expressive enough to accurately model surface features and to train a classifier that differentiates true features from false positives. This contrasts with the rather poor performance on the test set of samples from previously unseen objects, shown in Figures 14 and 15.
In practice we are interested in the high-precision regime of the performance curve. During a flyby mission the proposed framework can be used to point a specialized instrument with a narrow field of view at a surface feature of interest. In flyby scenarios the autonomous system will only have time to point once at an interesting feature, making a high-recall classifier that covers all viable features a lower priority. In that respect the generalization from Tempel 1 to Hartley 2 as shown in Figure 15 would be completely feasible, since the spacecraft would target just the features in which the classifier has the highest confidence. The application of the Hartley 2 model in a flyby at Tempel 1, on the other hand, would likely fail, since we have approximately 50% false positive surface features in the high-precision regime as illustrated in Figure 14. In this light, the classifier strategy seems most appropriate for encounters involving targets that are well characterized (i.e., for which representative examples exist) or for which at least one prior flyby has been performed.
Random forest models consistently outperform support vector machines in both experiments. Random forests generalize better to surface features on new bodies, while SVMs seem prone to overfitting due to the high number of support vectors resulting from the cross-validation grid search procedure.
Augmenting the training set improves the generalization performance significantly for both algorithms when training on Hartley 2 but has no effect when training on Tempel 1. This might be because the few subtle surface features on Tempel 1 appear almost circular and are not a good representation of surface features on other bodies in general. The major reason the model generalizes better from Hartley 2 to Tempel 1 than the other way around is most likely that Hartley 2 has significantly more surface features to learn from than Tempel 1. Hence, the Hartley 2 model encompasses a wider range of surface feature appearances and can also better discriminate them from the large set of false positive detections. The much lower number of samples from Tempel 1 is clearly not enough to learn a representative model of surface features for use at other bodies. These results illustrate the necessity of training models on data representing the variety of comets and asteroids available to guarantee the best possible generalization performance for future flyby missions.
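One plausible reading of the augmentation, generating the eight flipped and 90-degree-rotated variants of each surface feature patch (the exact transformations used are not specified, so this is a sketch):

```python
import numpy as np

def augment(patch):
    """Return the eight dihedral variants of a 2-D patch:
    rotations by 0/90/180/270 degrees, each with its mirror image."""
    variants = []
    for k in range(4):
        rotated = np.rot90(patch, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))  # mirrored counterpart
    return variants
```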
4.3 Target Appearance Models
Finally, we evaluate target appearance models using multiple training regimes. The first evaluation is a simplified case where images from a similar instrument and flyby are available in advance. This scenario uses a “leave one image out” training strategy, cycling through the entire data set, holding out each image in turn, training on the remainder, and then evaluating the system's decision on the held-out image. We also consider a more challenging case where an entire flyby sequence is held out from training. This would be the case for a completely new object and a poorly characterized instrument, where the system must extrapolate from a library of previous flybys. To warrant a reject decision, an image segmentation must differ from the intended result so that the intersection of the two areas covers less than 50% of their union. This generally occurred during pathological errors such as compression artifacts, image column striping, or the occasional image that did not contain the target object.
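The 50% overlap criterion can be sketched as an intersection-over-union test on boolean masks (function names are ours):

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection over union of two boolean segmentation masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 1.0

def deserves_reject(estimated, intended, threshold=0.5):
    """Flag a segmentation whose overlap with the intended result
    falls below the 50% intersection-over-union criterion."""
    return iou(estimated, intended) < threshold
```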
We next consider the appearance model's performance for excising erroneous segmentations. The Tempel 1 and Hartley 2 image data sets have very few artifacts, but both single-class and multiclass solutions correctly identify the few imaging artifacts as the most important images to be excised. The artifacts include two column striping errors in the case of Tempel 1 and a blank image during the Hartley 2 flyby (Figure 16a). Five more segmentations fell slightly under the 50% overlap threshold but received lower scores, suggesting they would not have been excised. However, on manual inspection, each of these segmentation results appeared defensible: they were caused by subtle double exposures (Tempel 1) or misestimation of the surface during low-contrast plume images (Hartley 2, illustrated in Figure 16c).
Phoebe provides a larger and more comprehensive data set. This data set contains many erroneous frames due to compression artifacts and other suboptimal imaging conditions. Figure 17 shows a receiver operating characteristic (ROC) curve of the independent decision to reject or keep the segmentation for each frame.
4.4 Runtime and Memory Considerations
On commercial off-the-shelf Linux hardware one frame is analyzed in 200 ms on average. In addition, we ported the framework to VxWorks and tested it in VxSim assuming a RAD750 system. Based on past experience in translating machine learning algorithms to flight hardware, the increased processing time will be less than 20 s, depending on the processor load assumption for a specific spacecraft and mission. The amount of memory necessary for running the surface feature detection algorithm, which is the most expensive step in the pipeline, is two times the frame size plus the random forest model size, which fits easily in memory for all appropriate missions. One of the most technically similar projects in the past was the automatic detection of dust devils and clouds on Mars [Castano et al., 2008]. That framework was tested on JPL's Surface System Test Bed, a rover functionally identical to the Mars Exploration Rovers Spirit and Opportunity, with a 20 MHz RAD6000 CPU. The results show that, excluding image acquisition, the runtime of the cloud detector is under 20 s per image and that of the dust devil detector is approximately 15 s per image in all-in-one mode. In feed mode, after the overhead needed to set the running average, the analysis of each new image takes 10 s. Another example is the Autonomous Exploration for Gathering Increased Science target detection system [Estlin et al., 2012], which was optimized for the RAD6000 and effectively deployed on Opportunity. In particular, the algorithm was modified to run more efficiently and to require less memory. The original version of the target detection algorithm consumed 64 MB of memory but was ultimately reduced to 4 MB to fit within the allotted onboard memory budget. These memory considerations are very similar to the constraints one would encounter when implementing our system on a spacecraft like Deep Impact on its EPOXI mission.
This work describes a novel and comprehensive framework for onboard computer vision for flyby missions at small bodies. It comprises algorithms for target tracking and segmentation as well as anomaly detection. To the best of our knowledge, this is also the first demonstration that autonomous surface feature detection is feasible. To quantitatively evaluate the performance of the proposed methods, we developed a data set containing imagery from NASA's Planetary Data System that was independently and manually annotated by planetary scientists and domain experts. This data set is released together with this publication to provide the community with a reference data set for future research and algorithm comparison.
The accuracy of the framework was tested in cross-validation experiments on data from comets 103P/Hartley and 9P/Tempel, which emphasized the need for more and diverse training data to learn robust models with good generalization performance. Future flyby missions to comets and asteroids could benefit from such a framework by allowing not only robust tracking but also precise pointing of narrow field of view instruments like spectrometers, and hence significantly increase science return. These capabilities are also useful for extended or multiple flyby encounters, in which they can precisely target specific features of interest despite positional uncertainty of both spacecraft and target. This can greatly expand the palette of spacecraft commanding options available to the operations team. Future missions visiting more distant targets will introduce increasing light time gaps as well as high encounter speeds, resulting in only a few frames for decision making. In this light, autonomous onboard computer vision will become increasingly important for obtaining observations critical for understanding the composition and history of the outer solar system.
All raw image data are available publicly through NASA's Planetary Data System at pds.jpl.nasa.gov. In addition, the expert labels as well as the specific flyby sequences used for training and testing are provided through flyby.jpl.nasa.gov. This work was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. Government sponsorship acknowledged. Copyright 2015, California Institute of Technology.