Smart, texture-sensitive instrument classification for in situ rock and layer analysis


  • K. L. Wagstaff, Machine Learning and Instrument Autonomy, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, USA
  • D. R. Thompson, Machine Learning and Instrument Autonomy, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, USA
  • W. Abbey, Planetary Chemistry and Astrobiology, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, USA
  • A. Allwood, Planetary Chemistry and Astrobiology, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, USA
  • D. L. Bekker, Instrument Flight Software and Ground Support Equipment, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, USA
  • N. A. Cabrol, Space Science Division, NASA Ames Research Center/SETI Institute, Moffett Field, California, USA
  • T. Fuchs, Mobility and Robotic Systems, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, USA
  • K. Ortega, Distributed and Real-Time Systems, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, USA

Corresponding author: K. L. Wagstaff, Machine Learning and Instrument Autonomy, Jet Propulsion Laboratory, California Institute of Technology, 4800 Oak Grove Drive, Pasadena, CA 91109, USA.


[1] Science missions have limited lifetimes, necessitating an efficient investigation of the field site. Onboard cameras are critical for planning, but their utility is limited by the need to downlink images to Earth before most decisions can be made. Recent advances have enabled rovers to take follow-up actions without waiting hours or days for new instructions. We propose using built-in processing by the instrument itself for adaptive data collection, faster reconnaissance, and increased mission science yield. We have developed a machine learning pixel classifier that is sensitive to texture differences in surface materials, enabling more sophisticated onboard classification than was previously possible. This classifier can be implemented in a Field Programmable Gate Array (FPGA) for maximal efficiency and minimal impact on the rest of the system's functions. In this paper, we report on initial results from applying the texture-sensitive classifier to three example analysis tasks using data from the Mars Exploration Rovers.

1 Introduction and Motivation

[2] Landers and rovers equipped with cameras have been pivotal in developing a detailed understanding of surface morphology and geologic processes on solar system bodies including Venus [Garvin et al., 1984], Mars [Adams et al., 1986; Squyres et al., 2006], and Titan [Soderblom et al., 2007]. However, currently available flight cameras are passive data collectors requiring ground-based teams to analyze and make decisions on almost every maneuver, a process that consumes precious mission lifetime and therefore limits the ultimate science return of the mission. This “ground in the loop” process is constrained by the speed of light and by operational constraints such as the availability of power for radio transmission, the location of relay orbiters, the availability of Deep Space Network receivers, and the time available from mission personnel and scientists for reviewing the data and formulating the next set of commands.

[3] There is great interest in reliable onboard image processing that can identify scientifically relevant surface features [Gilmore et al., 2000; Gulick et al., 2001; Castano et al., 2007; Smith et al., 2007]. Cameras equipped with smart image analysis capabilities would enable missions to accomplish more science in the available time by reducing the need to send data to ground-based teams for some decisions. The Mars Exploration Rover Opportunity has taken steps in this direction by conducting some onboard, in situ analysis of images as they are collected. The Autonomous Exploration for Gathering Increased Science (AEGIS) system permits the rover to detect scientific features of interest (e.g., rocks of particular size, shape, or angularity) and automatically target such features for follow-up observations at higher resolution [Estlin et al., 2012]. However, because AEGIS uses the rover's main CPU, other activities must be suspended until the onboard image processing is complete. Necessarily, only a subset of the images collected can be analyzed onboard. More importantly, AEGIS can only extract simple albedo and shape features and therefore is unable to detect patterns such as rock layers or textured terrain.

[4] We have formulated a new instrument concept that employs state-of-the-art machine learning methods to accomplish scientific objectives onboard. TextureCam integrates the imager and analyzer, going beyond simply recording pixels to classifying and interpreting the surface and rock textures present. Further, the algorithm can be implemented using a highly efficient Field Programmable Gate Array (FPGA) independent of the rover's main CPU. The results can inform onboard decisions such as which targets in a panoramic scene to analyze in more detail, how to analyze close-up targets effectively, or simply how to prioritize data and images for transmission back to Earth. By eliminating the command-loop delay, a remote spacecraft can respond to targets of opportunity or dynamic events in seconds rather than hours or days.

[5] In this paper, we report on three kinds of scientific investigations that are enabled by TextureCam and that are commonly cited as important components of onboard image analysis [Gulick et al., 2001; Castano et al., 2007]. In order of increasing complexity, they are as follows: (1) find all rock targets within a scene (to characterize rock size distribution or inform further sampling decisions), (2) identify good sampling targets (more subtle than just finding rocks, this scenario seeks rocks with flat surfaces), and (3) find layered rocks (in support of astrobiology and sedimentary geology investigations). The results demonstrate the value of smart instruments that can conduct their own onboard analysis, with potential benefits for future rover, lander, and orbiter missions.

2 Methods: Random Forest Classifier

[6] We aim to detect and characterize different kinds of texture that appear in images collected by the camera. Unlike the geologic concept of rock texture, which refers to an intrinsic property of the rock, here “texture” refers to discriminative, statistical patterns of pixels in an entire image (which may include some rocks with detectable geologic textures). We posit that these numerical features can capture enough information to characterize geologically relevant aspects of a site such as surface roughness, pavement coatings, unconsolidated regolith, sedimentary fabrics, and differential outcrop weathering.

[7] We use machine learning methods to analyze training data with known properties (and textures) and construct a model that can later be used to classify textures in new images. Classifiers such as neural networks or support vector machines (SVMs) have been used to tackle geophysical analysis problems such as identifying boundaries between geologic facies [Tartakovsky and Wohlberg, 2004]. We instead employ a random forest classifier, a state-of-the-art technique that often outperforms neural networks and SVMs in terms of speed, accuracy, and robustness to noise in the data [Breiman, 2001]. A random forest consists of several decision trees, each of which is trained on a different subsample of the data. The trees vote collectively on each classification decision, which yields a more reliable result than any individual decision. The random forest is also well suited for an efficient FPGA implementation since it is highly parallelizable: each pixel can be classified independently of all others in the image.

[8] Figure 1 shows the TextureCam classification architecture. We represent each pixel in the image with a d-dimensional attribute vector, where d is the number of attributes (e.g., intensity, high-pass filtered value, range, height). We train the random forest classifier using pixel vectors from training images in which each pixel was manually labeled with a class of interest (e.g., “rock,” “sand,” “sky”). Finally, the classifier outputs a class probability map. The example in Figure 1 shows the output probability for “rock,” ranging from blue (low) to red (high).
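
To make the architecture concrete, the sketch below builds per-pixel attribute vectors and produces a rock probability map using an off-the-shelf random forest. It is an illustration only, not the flight FPGA implementation: the use of scikit-learn, the high-pass filter choice, the channel layout, and the class indexing (class 1 = "rock") are our assumptions.

```python
# Illustrative sketch of the per-pixel classification pipeline (not flight code).
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.ensemble import RandomForestClassifier

def pixel_attributes(intensity, rng_map, height):
    """Stack per-pixel attributes: raw intensity, high-pass filtered intensity,
    stereo range, and height above the ground plane (H x W x d)."""
    high_pass = intensity - gaussian_filter(intensity, sigma=5)
    return np.stack([intensity, high_pass, rng_map, height], axis=-1)

def train_pixel_classifier(attribute_stacks, label_maps, n_trees=32, n_samples=1_000_000):
    """Train a forest on labeled pixels pooled from the training images.
    label_maps hold integer classes per pixel (e.g., 0 = terrain, 1 = rock)."""
    X = np.concatenate([a.reshape(-1, a.shape[-1]) for a in attribute_stacks])
    y = np.concatenate([lb.ravel() for lb in label_maps])
    idx = np.random.choice(len(y), size=min(n_samples, len(y)), replace=False)
    forest = RandomForestClassifier(n_estimators=n_trees, n_jobs=-1)
    forest.fit(X[idx], y[idx])
    return forest

def class_probability_map(forest, attributes):
    """Classify every pixel independently; return P(class = rock) per pixel."""
    h, w, d = attributes.shape
    proba = forest.predict_proba(attributes.reshape(-1, d))
    return proba[:, 1].reshape(h, w)  # assumes class index 1 is "rock"
```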

Figure 1.

TextureCam system architecture. To classify a newly acquired image, four input channels are computed: raw pixel values, high-pass filtered value, range (distance), and height. Each pixel is independently classified by a model trained on previously labeled images. The output is a probability map for each class; an example for the “rock” class is shown here.

[9] Each tree in the random forest is trained on a different subset of the labeled pixels. The tree begins as a root node and progressively grows branches to distinguish between different subpopulations in the data. Each node in the tree is either “pure” (contains pixels that are all of the same class) or “mixed.” For mixed nodes, the algorithm searches for a test that can optimally split the pixels into distinct groups. Each test takes a simple arithmetic form such as p1 > τ, |p1 − p2| > τ, p1 − p2 > τ, or p1 + p2 > τ, where p1 and p2 are the attribute values for two randomly selected pixels within a local window around the pixel to be classified. The threshold value τ is selected to maximize the information gain (class discrimination) that can be achieved by splitting the pixels at that node using τ [Shotton et al., 2008]. The algorithm evaluates hundreds of candidate tests and uses the best scoring test/threshold pair as the final splitting criterion for that node. This generates two new child nodes that contain the pixels that passed or failed the test, respectively. The process continues recursively for each child, terminating at pure nodes.
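
The node-splitting search can be sketched as follows. Each training sample is assumed here to be a flattened window of attribute values around a pixel; the candidate count and the way thresholds are sampled are illustrative choices, not the exact published procedure.

```python
# Simplified sketch of selecting the best test/threshold pair at a mixed node.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain(labels, mask):
    """Entropy reduction from splitting `labels` by the boolean `mask`."""
    n = len(labels)
    if mask.sum() in (0, n):
        return 0.0
    return (entropy(labels)
            - (mask.sum() / n) * entropy(labels[mask])
            - ((~mask).sum() / n) * entropy(labels[~mask]))

def best_split(windows, labels, n_candidates=500, rng=None):
    """Search candidate tests of the forms p1, p1 + p2, p1 - p2, |p1 - p2|
    (each compared against a threshold tau) and return the best one."""
    rng = rng or np.random.default_rng()
    forms = {
        "single":   lambda a, b: a,
        "sum":      lambda a, b: a + b,
        "diff":     lambda a, b: a - b,
        "abs_diff": lambda a, b: np.abs(a - b),
    }
    best_test, best_gain = None, -1.0
    for _ in range(n_candidates):
        i, j = rng.integers(0, windows.shape[1], size=2)   # two window positions
        name = rng.choice(list(forms))
        values = forms[name](windows[:, i], windows[:, j])
        tau = rng.uniform(values.min(), values.max())       # candidate threshold
        gain = info_gain(labels, values > tau)
        if gain > best_gain:
            best_test, best_gain = (name, int(i), int(j), float(tau)), gain
    return best_test, best_gain
```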

[10] New pixels are assigned to one of the known classes as follows. For each tree in the random forest, beginning with the root node of the tree, the appropriate test is applied to the new pixel p. The outcome of that test tells the algorithm which child of the node to visit next. This process continues until the algorithm reaches a pure node, where it outputs the probability that p belongs to each class, using the distribution of labeled training pixels that reached the same node during training. The final classification output from the forest is the product of class probabilities from each tree.
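
A minimal sketch of this combination step is shown below, with placeholder leaf distributions standing in for the per-tree outputs; the renormalization is our addition to keep the result a valid probability distribution.

```python
# Combine per-tree leaf class distributions for one pixel by taking their product.
import numpy as np

def forest_posterior(leaf_distributions):
    """leaf_distributions: one class-probability vector per tree for a single pixel."""
    combined = np.prod(np.asarray(leaf_distributions), axis=0)
    return combined / combined.sum()  # renormalize to sum to 1

# Example: three trees voting over two classes ("terrain", "rock").
print(forest_posterior([[0.2, 0.8], [0.4, 0.6], [0.1, 0.9]]))
```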

3 Experimental Results

[11] We conducted several experiments using random forests to train texture-based classifiers and then classify previously unobserved images into categories of interest. These demonstrations were conducted on images collected by the Mars Exploration Rovers.

3.1 Finding Rocks

[12] Upon reaching a new location, an important first step is the identification and characterization of exposed rock surfaces. These rocks provide candidates for subsequent targeted instrument deployment and contact sensing. Further, direct analysis of identified rocks enables the characterization of the local environment by its distribution of rock sizes, colors, and compositions. A large amount of research has already been invested in methods for automatically identifying rocks in such images [Gulick et al., 2001; Castano et al., 2007; Thompson et al., 2011; Gong and Liu, 2012].

[13] We trained a random forest using 23 images from the “Mission Success” panorama collected by the Spirit rover during its first week of operations. These were manually labeled by analysts to include all rocks of at least 10 pixels in size, yielding thousands of rocks at ranges of 2–10 m [Golombek et al., 2005]. We assigned a “terrain” label to pixels within the labeling region that were not assigned to the “rock” class. We then tested the trained classifier on a separate set of 23 images from the “Legacy” panorama, collected beginning on sol 59. Both panoramas were acquired over a span of several days and exhibit a range of different illumination conditions.

[14] The random forest was trained with information about pixel intensity (raw and high-pass filtered), range (distance), and height (see Figure 1). The height channel was computed by applying a broad median filter to each pixel's altitude to estimate the ground plane and then subtracting this estimate from the original altitude. The forest consisted of 32 trees trained on 1,000,000 pixels sampled from the labeled training data. The analysis window size was 127 × 127 pixels. The number of trees is specified by the user and controls the diversity of the learned forest. Figure 2 shows classification results for one of the Legacy images. Rocks are clearly identified in red.
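
One way to derive such a height channel is sketched below; this is an illustration rather than the flight code, and the median-filter window size is an assumption.

```python
# Estimate height above the local ground plane from a per-pixel altitude map.
import numpy as np
from scipy.ndimage import median_filter

def height_above_ground(altitude, window=101):
    """altitude: H x W map of per-pixel elevation derived from stereo."""
    ground_plane = median_filter(altitude, size=window)  # broad median filter
    return altitude - ground_plane
```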

Figure 2.

(a) Example subimage from the Legacy panorama and (b) the rock probability map output from TextureCam. Red (blue) regions indicate high (low) probability.

[15] The rock detection output can guide subsequent sampling or data acquisition with a contact instrument. Performance can be evaluated in terms of precision (reliability) and/or recall (completeness). Since follow-up targeting can only be applied to a subset of potential targets, we focused on precision. We calculated the probability, over all images, that the classifier was correct in its identification of the pixel most likely to be part of a rock (see Figure 3). Using intensity information alone, the classifier was 91% correct, while incorporating stereo information increased performance to 97%. A blind selection of targets within the image (“random”) selected rock targets only 27% of the time.
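
This evaluation can be sketched as follows, assuming one probability map and one manual label map per image; the data layout and the numeric value of the rock label are assumptions.

```python
# Fraction of images in which the single most-likely "rock" pixel is truly a rock.
import numpy as np

def top_pixel_precision(probability_maps, label_maps, rock_label=1):
    hits = 0
    for prob, labels in zip(probability_maps, label_maps):
        r, c = np.unravel_index(np.argmax(prob), prob.shape)  # most likely rock pixel
        hits += int(labels[r, c] == rock_label)
    return hits / len(probability_maps)
```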

Figure 3.

Rock classification performance (precision) for the images in the Legacy panorama. Boxes show the range from the first to third quartiles of the results, and the horizontal band shows the median.

3.2 Estimating Surface Condition

[16] We seek to enable rovers to autonomously find new targets and collect data directly from them during long traverses. However, arm-mounted instruments such as Raman spectrometers are sensitive to the condition of the surface, and factors such as dust deposition or fracturing can thwart automatic data collection. These hazards may not be evident in the coarse stereo data used to place instruments. Image texture analysis provides an additional means to assess surface condition for sampling.

[17] Figure 4 shows a typical result. For this task, stereo information is less relevant, so we trained a classifier using only the raw pixel intensities. We used examples from a previous image of a Meridiani outcrop, labeling as “good” several candidate sample sites that are fracture- and dust-free. These generally correspond to bright, contiguous surfaces. We also labeled image regions corresponding to broken rock, dust, or other sediment as “poor.” We trained the system on 100,000 pixels sampled from a single panorama and several rotated versions of it to improve rotation invariance. The trained random forest consisted of 16 trees, with a window size of 127 × 127 pixels. Figure 4 shows that the system identified several contiguous clean surfaces as high-probability targets.
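
A sketch of this rotation augmentation is shown below, under the assumption that rotated copies of the labeled image are generated with a small fixed set of angles; nearest-neighbor interpolation is used for the label map so class indices are not blended.

```python
# Add rotated copies of an image and its label map to the training pool.
from scipy.ndimage import rotate

def augment_with_rotations(image, labels, angles=(90, 180, 270)):
    images, label_maps = [image], [labels]
    for angle in angles:
        images.append(rotate(image, angle, reshape=False, order=1))
        label_maps.append(rotate(labels, angle, reshape=False, order=0))  # nearest for labels
    return images, label_maps
```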

Figure 4.

Random forest classifier results for detecting sampling surfaces that are dust-free and in good condition. (a) The original image (Planetary Photojournal PIA014132) showing a young crater on Meridiani Planum. (b) Heat map (red = higher probability) showing identified sampling candidates.

[18] We compared the output of the classifier to an independent classification of the same scene done by a human expert. Of the 14 targets identified by the classifier as good sampling surfaces (areas in red in Figure 4), 13 (93%) matched with a manually chosen area.

3.3 Finding Layered Structures

[19] Layered rock structures are of particular interest for astrobiology, as they are a characteristic feature of many water-deposited sedimentary rocks. Water-lain sediments are high priority targets that may indicate ancient habitable surface environments, and their identification is an important step in narrowing the search for ancient signs of microbial life. Igneous layered rocks are also important targets that can provide information about planetary interior processes and extrusive lava activity.

[20] We trained a third classifier to identify layered structures using images from the “Gibson” panorama collected by the Spirit rover (sols 748–751) when it was stationed near Home Plate. We labeled a single image with four terrain types: layered rock, smooth rock, vesicular rock, and soil. We trained a random forest using 10 trees, 50,000 samples, and a window size of 15 × 15 pixels. A smaller window size was used due to the smaller scale of the features of interest. For this task, we employed a suite of simple bar filters that are sensitive to linear image features at eight orientations between 0° and 180°. After convolving each input image with each filter, we stored the maximum filter response across all orientations, along with the raw pixel value, yielding a vector of two values for each pixel.
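
The bar-filter features can be sketched as follows; the kernel shape and size, the interpolation used to rotate it, and the use of scipy are assumptions made for illustration.

```python
# Oriented bar-filter features: max response over eight orientations, plus raw value.
import numpy as np
from scipy.ndimage import convolve, rotate

def bar_filter_features(image, length=15, n_orientations=8):
    image = np.asarray(image, dtype=float)
    base = np.zeros((length, length))
    base[length // 2, :] = 1.0 / length              # thin horizontal bar kernel
    responses = []
    for k in range(n_orientations):
        kernel = rotate(base, k * 180.0 / n_orientations, reshape=False, order=1)
        responses.append(convolve(image, kernel))
    max_response = np.max(responses, axis=0)         # strongest orientation per pixel
    return np.stack([image, max_response], axis=-1)  # two attributes per pixel
```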

[21] The top row of Figure 5 shows the resulting layer probability map for (left) the training image and (right) a test image. Each map shows the probability of membership in the class of interest (layered rock). Training the classifier with four classes allowed it to learn finer distinctions and improved performance on all classes. The bottom row of Figure 5 shows the final result, in which we used stereo information to filter detections (layer probability ≥50%) to only those realistically reachable in the next few command cycles (≤ 5 m away). Since the base images were collected with the left camera, stereo data was not available for the far left and no detections were reported for that region. Visually, the classifier did a good job of highlighting layered areas and omitting partially buried layers, which have a weaker texture response. This result could be used to inform further sampling by guiding a high-resolution rover instrument to examine layered regions more closely.

Figure 5.

Random forest classifier results for layered surface detection on (left) the labeled training image and (right) one of the test images. The top row shows the probability maps (red = high probability of layers), and the bottom row shows the final layer detections (red), filtered using stereo data to those detections within 5 m of the camera.

[22] We also quantitatively evaluated the classifier's ability to detect layered regions that were (1) large enough to be useful for subsequent sampling and (2) close enough to be readily accessible. We further filtered the detections within 5 m to include only those that had an area of at least 1000 pixels. We manually reviewed each region identified by the classifier to determine whether it contained layers or not. Layer detection on the training image was accurate for 32 of 32 regions (100%), and detection across all test images was accurate for 58 of 58 regions (100%). Effectively, the classifier was conservative enough that it did not generate any false positives while still identifying a large number of good targets for further sampling.
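
A sketch of this two-stage filtering is given below; the probability, range, and area thresholds follow the text, while the connected-component grouping via scipy is an assumption about how regions are delineated.

```python
# Keep only layer detections that are confident, nearby, and large enough to sample.
import numpy as np
from scipy.ndimage import label

def filter_detections(layer_prob, range_m, p_min=0.5, max_range=5.0, min_area=1000):
    candidate = (layer_prob >= p_min) & (range_m <= max_range)
    regions, n_regions = label(candidate)            # connected components
    keep = np.zeros_like(candidate)
    for region_id in range(1, n_regions + 1):
        mask = regions == region_id
        if mask.sum() >= min_area:                   # discard small regions
            keep |= mask
    return keep
```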

[23] Perfect results are always somewhat suspect, so we investigated further to assess the limits of the classifier. We investigated the classifier's sensitivity to the choice of analysis window size, which emerged as a critical factor. The preceding results were obtained with a window size of 15 × 15 pixels. Figure 6 shows performance on the training and test images as the window size was varied. As expected, in general, training performance was equal to or higher than performance on previously unseen (test) images. We found that decreasing the window size led to more false classifications of small, flat areas as layered areas, while increasing the window size resulted in fewer total detections and lower reliability in those detections. We conclude that a window size of 15 × 15 pixels is most appropriate for the scale of layers present in these images. This parameter allows the user to specify the scale of layers of interest; some may be interested in fine layers (small window) while others prioritize coarse layers (large window). If multiple scales are of interest, multiple classifiers can be trained using different window sizes.

Figure 6.

Layered region classification accuracy on the Gibson panorama images as a function of the analysis window size. Training accuracy is generally higher than test performance on new images. In both cases, we find a strong dependence on the size of the window used to analyze the images; the best performance is achieved when the window size matches the scale of the layers present in the scene.

4 Conclusions and Benefits for Future Missions

[24] The technology to support onboard image analysis continues to mature. Its implementation was pioneered by the AEGIS system on the Mars Exploration Rover Opportunity [Estlin et al., 2012]. We propose a further innovation by incorporating texture-sensitive analysis into the onboard setting and by embedding analysis capabilities into the instrument that collects the data. TextureCam is a smart camera that employs an internal FPGA to quickly classify image contents using texture-based features. Our experimental results show that TextureCam's random forest classifier is effective at addressing a variety of scientifically relevant questions one might pose of imagery collected in a new environment: Are there rocks present? Where are good sampling surfaces? Are layers present?

[25] The results of texture-based onboard analysis can inform in situ decisions about the next imaging, measurement, or sampling targets. Such analysis can also guide content-based image compression by allocating more bandwidth to image areas of greater scientific value [Dolinar et al., 2003; Wagstaff et al., 2004]. Further, the remote spacecraft can report content-based summaries for all images even if they are not all returned at full resolution (e.g., navigational images). These summaries could include information about the number and size distribution of rocks, the variety and extent of layered structures and their orientations, and so on. The three tasks we evaluated in this paper provide only a first glimpse of the kind of analyses possible with a smart camera. Mission planners can train different random forests to accomplish different objectives and upload or update them as needed.

[26] Future missions to Mars, the surface of Europa, or other bodies with surface terrain of interest stand to benefit from the inclusion of a smart camera such as TextureCam. The classifiers employed by the camera can be configured to specific mission objectives. We expect that the system could easily be adapted for orbital cameras to capture other texture-sensitive phenomena of interest, such as the “tiger-stripe” lineations on Enceladus [Porco et al., 2006], faults on Europa [Tufts et al., 1999], or new impact craters on Mars [Byrne et al., 2009]. TextureCam and its descendants have the potential to greatly increase the autonomy and scientific return of future rover and orbiter missions.


[27] The TextureCam project is supported by the NASA Astrobiology Science and Technology Instrument Development program (NNH10ZDA001N-ASTID). The Mars Exploration Rover images used in this study were obtained from the Planetary Data System (PDS). We thank Matt Golombek, Rebecca Castaño, Ben Bornstein, and many students who contributed the manual rock labels used in our rock-finding study. This work was carried out at the Jet Propulsion Laboratory, California Institute of Technology under a contract with the National Aeronautics and Space Administration. Government sponsorship acknowledged.

[28] The editor thanks Philip Christensen and an anonymous reviewer for their assistance in evaluating this manuscript.