A layered learning system
Our machine-classification experiments were completed using a nearest-neighbor, instance-based, supervised layered learning system based on kernel density estimation (named ARLO, Automated Recognition and Layered Optimization). ARLO uses bias optimization to find the most effective combinations of experimental parameters (Tcheng et al., 1989, 1991). At the highest level, bias optimization uses an optimizer to maximize the performance of a learning system by manipulating control parameters of the learning system – the bias space (Fig. 1). Bias space is a formal, parameterized space representing all decisions that determine model performance. Bias space includes, but is not limited to, sample preparation, choice of imaging technology, image resolution, choice of image features, image weighting metrics, choice of training examples, whether a single learning algorithm or a set of algorithms is used, and how multiple algorithms are combined. A given point in bias space defines all the variables that control the learning process, specifying both the example representation and the learning algorithm to use.
Figure 1. Schematic of the layered machine-learning classification system. The layered classification system has two main components: an optimizer and a learning system. The optimizer searches for the bias values (B̄) that yield the highest performance value (P̄). The best value discovered within an experimental run is reported as the optimized result. The learning system takes bias values (B̄) from the optimizer and outputs the resulting performance value (P̄). Within the learning system, bias is optimized in two areas: feature representation (B̄rep) and learning algorithm (B̄alg). Feature representation refers to how the image features are extracted from the pollen images, as described in the Materials and Methods section, and how they are weighted by the learning system. Pollen examples are then divided into training and testing sets, which are used to evaluate both the learning algorithm and the feature representation with a cross-validation performance estimator. The performance estimator provides the resulting performance value, P̄, to the optimizer.
The optimizer searches bias space by trying a series of candidate points, keeping those that maximize the performance metrics. In this study, we had two distinct performance metrics of interest: grain-to-grain classification accuracy (for both our modern and fossil pollen experiments) and whole slide pollen ratio accuracy (for our fossil pollen experiments). We used a stochastic hill-climbing optimizer to find points in bias space resulting in the best performance. Automating the search in bias space makes the system robust and adaptable to different image recognition problems. Bias search automation also removes reliance on a human experimenter to operate the classification system.
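The search described above can be illustrated with a minimal sketch of stochastic hill climbing. This is not the ARLO implementation: the function names, the two integer-valued bias parameters, the one-step move rule, and the toy performance function are all assumptions made for illustration. Only the parameter ranges (2 to 40 quantile bins, line lengths of 1 to 13 pixels) are taken from the feature descriptions in this paper.

```python
import random


def stochastic_hill_climb(evaluate, start, bounds, steps=200, seed=0):
    """Maximize evaluate(point) over integer-valued bias parameters.

    All names here are hypothetical. `bounds` maps each parameter name
    to an inclusive (low, high) range; `start` is the initial point.
    """
    rng = random.Random(seed)
    best = dict(start)
    best_score = evaluate(best)
    for _ in range(steps):
        # Perturb one randomly chosen parameter by one step, clamped to its range.
        candidate = dict(best)
        name = rng.choice(sorted(bounds))
        low, high = bounds[name]
        candidate[name] = min(high, max(low, candidate[name] + rng.choice([-1, 1])))
        score = evaluate(candidate)
        # Keep the candidate only if it performs at least as well as the current best.
        if score >= best_score:
            best, best_score = candidate, score
    return best, best_score


def toy_performance(point):
    """Stand-in for the learning system: peaks at an arbitrary, made-up point."""
    return -((point["quantile_bins"] - 12) ** 2) - ((point["line_length"] - 5) ** 2)


bounds = {"quantile_bins": (2, 40), "line_length": (1, 13)}
start = {"quantile_bins": 2, "line_length": 1}
best, score = stochastic_hill_climb(toy_performance, start, bounds, steps=500)
```

In a real run, `evaluate` would train and test the learning system at the given point in bias space and return the averaged performance value.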
We tested our system against two separate data sources: modern reference material derived from vouchered herbarium specimens, where the taxonomic identity of the material was known (Supporting Information, Table S1), and fossil material from Nelson Lake, Illinois, where taxonomic classifications were based on our expert identifications (Table S2). For the modern pollen analysis, in addition to samples of black and white spruce, P. mariana (Mill.) Britton, Sterns & Poggenb. and P. glauca (Moench) Voss, we included two outgroup genera (Abies and Pinus) that share a similar morphology, as well as a third North American spruce species, P. rubens. For the fossil pollen experiments, we focused our efforts on discriminating the two species that were most abundant in our Nelson Lake samples: black and white spruce. Both sets of pollen material had been prepared following standard protocols (Faegri et al., 1989), with silicone or balsam oil as the mounting medium.
Modern reference pollen samples were from reference collections at the University of Illinois and the Illinois State Museum, isolated from University of Minnesota herbarium specimens (Table S1). Up to 100 grains were imaged from each slide, which represented a single individual tree. An uneven number of representatives were available for each modern class, with an especially limited amount of material available for P. rubens. Entire grains, with minimal damage, were chosen for imaging in this analysis of modern material, following the procedure described in the Pollen imaging section.
Fossil spruce examples were from duplicate slides of sediment residues from a published study on Nelson Lake, Illinois (Curry et al., 2007) (Table S2). Ten samples were analyzed, from depths corresponding to high black spruce concentrations, high white spruce concentrations, and roughly equal proportions of the two species. All samples were from fine clay deposits. The preservational quality of these samples was comparable, although more grain damage was observed in the deeper material.
Grains were chosen for imaging for the fossil analysis using a semirandomized method. Student researchers scanned the slides following parallel transects and electronically marked the XYZ location of all saccate grains. Approximately 100 of the marked grains were then randomly chosen and imaged following the procedure described in the Pollen imaging section. As a result, damaged grains were included if a non-expert could recognize them as saccate grains. These included grains that were mechanically or physically changed through tearing, corrosion, or folding.
Each imaged fossil grain was also manually classified by a pollen expert as one of three classes: black spruce, white spruce, or other saccate grain (Pinus/Abies) (Table S2); 896 of the 1014 imaged grains were identified as spruce. Identifications were made using the original sample slides, with images as reference to verify that the same grain was being observed. We used a number of morphological features to determine classification manually: grain size, width of saccus at point of attachment, saccus height, angle of saccus attachment, degree of constriction of saccus at point of attachment, saccus shape, endoreticulate pattern of sacci, and relative size of sacci to corpus (Fig. 2; Birks & Peglar, 1980; Hansen & Engstrom, 1985; Lindbladh et al., 2002).
Figure 2. Morphological features of the Picea pollen grain. Illustration of the morphological differences between the pollen of Picea glauca (white spruce, left) and Picea mariana (black spruce, right). As saccate grains, the two main components of the spruce pollen grain are the saccus (a), a bladder which hydrates and inflates upon contact with the gymnosperm pollen drop; and the pollen body, or corpus (b); c represents the angle of attachment of the saccus to the body. Common features used in the discrimination of black and white spruce include size (P. glauca generally having larger grains); angle of the sacci attachment to pollen body (P. glauca being more acute); texture of the pollen body (P. glauca being finer); the internal reticulate structure of the saccus (P. glauca having larger, more circular lumina); and the saccus shape (P. glauca having blunter sacci). Note that these traditional measurements of sacci and corpus size and attachment angle can only be taken in the equatorial view (Hansen & Engstrom, 1985; Lindbladh et al., 2002), as illustrated.
We included a qualitative assessment of expert confidence with each classification in order to record the difficulty of each classification and the certainty of the expert identification. For black and white spruce, the confidence levels were: 50% (recognized as spruce, but species uncertain), 60% (few key features representative of the species), 70% (several key features representative of the species), 80% (most features representative of the species), 90% (almost all features representative of the species), and ≥ 95% (all features representative of the species). These numbers capture the self-reported confidence of the human expert for a given classification and so are, by necessity, approximate.
We used structured illumination (a Zeiss Apotome fluorescence microscope; Weigel et al., 2009) to produce high-resolution, three-dimensional images. Because of the relatively thin pollen wall of these saccate grains, structured illumination allowed us to capture grain shape and volume in addition to detailed surface images. Images were acquired following a standard manual protocol to minimize variation that would lead to imaging artifacts and potential misdirection of the machine results. Images were taken by multiple researchers, with no one researcher responsible for a single species. Pollen grains were photographed as image stacks using autofluorescence (563 nm excitation wavelength (green), 581 nm emission wavelength (red)), at 400× magnification (40× EC Plan-Neofluar objective, NA 0.75). The shape and depth of the grain were captured as multiple z-focal planes at intervals of half the Nyquist frequency (0.69 μm for this objective; Fig. 3). A typical grain was represented by c. 50 focal slices. Each individual image pixel covered 0.0256 μm² (0.16 μm × 0.16 μm). Grains were cropped manually, using a bounding box that spanned the maximum width of the grain along the x-axis and the maximum length of the grain along the y-axis. The z-stack was limited to the uppermost and lowermost in-focus planes of the grain.
Figure 3. The pollen image stack. Subsample of images from a single z-stack of a Picea glauca pollen grain (sample B1500, position 5). A total of 63 images were taken; the slices shown represent approximately every fourth image. Note the ability of structured illumination fluorescence to capture the three-dimensional shape of the grain, including the far pollen wall. The final image (lower right-hand corner) is a maximum intensity projection of the top half of the grain. Each image pixel measures 0.16 μm × 0.16 μm. The original 63 images were taken at 0.69 μm increments in the z-plane. Bar, 25 μm.
Example representation and classification
Each image within the z-stack was reduced to a vector of image features. For each new pollen classification problem, the bias optimizer determined the appropriate weights and resolutions for each feature (Fig. 1). These measurements do not directly relate to morphological characters that palynological experts would use, but do describe a large range of morphological variation, totaling > 16 000 dimensions of morphological space. The optimized image features can be categorized into three broad categories:
Intensity distribution. A representation of the probability distribution function of pixel intensity values with a variable number of equally populated bins (quantile values). We used probability distributions with an optimized resolution, from two to 40 quantile bins.
Gross shape. To make gross image comparisons, we compared low-resolution projections of our high-resolution images. These projections captured overall shape, and the degree of image coarseness was optimized. Resolutions ranged from 1 × 1 pixels to 11 × 11 pixels.
Texture. We approximated texture as the change in sign of the first derivative of pixel intensities along a series of horizontal, vertical, or diagonal lines. The line length was optimized, and ranged from 1 to 13 pixels. The texture features were applied to the original high-resolution image as well as down-sampled versions of the image at varying scales.
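The three feature categories above can be sketched in a few lines of plain Python. The sketch assumes a single image slice stored as a list of rows of intensity values. The function names, the block-averaging used for the low-resolution projection, and the averaging of sign-change counts over horizontal windows are illustrative assumptions; the source does not specify how ARLO computes or aggregates these quantities.

```python
def quantile_features(pixels, bins):
    """Cut points separating `bins` equally populated intensity bins."""
    ordered = sorted(pixels)
    n = len(ordered)
    return [ordered[min(n - 1, (k * n) // bins)] for k in range(1, bins)]


def downsample(image, rows, cols):
    """Block-average an image to rows x cols (assumes rows <= height, cols <= width)."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(rows):
        r0, r1 = r * h // rows, (r + 1) * h // rows
        row = []
        for c in range(cols):
            c0, c1 = c * w // cols, (c + 1) * w // cols
            block = [image[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def sign_changes(line):
    """Count sign changes of the first difference along a line of pixels."""
    diffs = [b - a for a, b in zip(line, line[1:])]
    signs = [d for d in diffs if d != 0]  # flat steps carry no sign
    return sum(1 for a, b in zip(signs, signs[1:]) if (a > 0) != (b > 0))


def horizontal_texture(image, line_length):
    """Mean sign-change count over all horizontal windows of `line_length` pixels."""
    counts = [
        sign_changes(row[x:x + line_length])
        for row in image
        for x in range(len(row) - line_length + 1)
    ]
    return sum(counts) / len(counts) if counts else 0.0


# Made-up 4 x 4 slice, used only to exercise the functions.
image = [[1, 1, 3, 3], [1, 1, 3, 3], [5, 5, 7, 7], [5, 5, 7, 7]]
flat = [value for row in image for value in row]
feature_vector = (
    quantile_features(flat, 4)
    + [v for row in downsample(image, 2, 2) for v in row]
    + [horizontal_texture(image, 3)]
)
```

Vertical and diagonal lines, and the down-sampled texture scales mentioned above, would follow the same pattern applied to different pixel sequences.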
In addition to these optimized feature groups, we also used three fixed measurements in our classification: image area (the area of an image slice); image aspect (the height-to-width aspect ratio of an image slice); and image depth (the depth of a grain measured as the number of image slices). Grains were classified using a weighted contribution of each image slice. Weighting was by the confidence of image classification, raised to an optimized power. Confidence was measured as the number of nearest neighboring training examples that share the majority classification.
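The slice-weighting scheme can be sketched as follows, assuming that each slice has already been matched to its k nearest training examples and that the per-class weights are simply summed. The function name, the input layout, and the summation are hypothetical; only the definition of confidence and the use of an optimized power come from the description above.

```python
from collections import Counter, defaultdict


def classify_grain(slice_neighbor_labels, power):
    """Combine per-slice votes into one grain-level class.

    `slice_neighbor_labels` holds, for each image slice, the class labels
    of its k nearest training examples (a hypothetical input layout).
    """
    totals = defaultdict(float)
    for labels in slice_neighbor_labels:
        # Confidence = number of neighbors sharing the slice's majority class.
        label, confidence = Counter(labels).most_common(1)[0]
        totals[label] += confidence ** power
    return max(totals, key=totals.get)


# Two weakly "black" slices (3 of 5 neighbors) and one unanimous "white" slice.
slices = [
    ["black", "black", "black", "white", "white"],
    ["black", "black", "black", "white", "white"],
    ["white", "white", "white", "white", "white"],
]
low_power_result = classify_grain(slices, power=1)   # weights: black 6, white 5
high_power_result = classify_grain(slices, power=2)  # weights: black 18, white 25
```

A larger power lets a few highly confident slices outweigh many uncertain ones, which is why the exponent is a natural parameter to optimize.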
Traditional classification criteria for spruce include a mix of manually assessed quantitative and qualitative characters (Table S3). Researchers consistently use five key features: size (white spruce generally having larger grains); angle of the sacci attachment to pollen body (white spruce being more acute); texture of the pollen body (white spruce being finer); the internal reticulate structure of the saccus (white spruce having larger, more circular lumina); and the saccus shape (white spruce having blunter sacci) (Fig. 2; Hansen & Engstrom, 1985; Lindbladh et al., 2002). The features we employed in our study coarsely capture these shape, size, and texture characters, but unlike current methods, can be used irrespective of grain orientation.
Learning system evaluation
System accuracy was measured by repeatedly dividing our image data into training and testing sets (Fig. 1). We formed our classification models on a set of training examples and evaluated performance on a separate, randomized set of testing examples. The learning algorithm used the training examples to form the prediction model and applied it to the test examples while measuring its performance in terms of accuracy, speed, and model size. We repeated the experiment with different random partitions and averaged the results. Without bias optimization (i.e. if we had only run a single iteration of the experiment), this accuracy measure would have been an unbiased predictor of future performance. Because we use bias optimization, our accuracy is likely optimistic, or overfit. Overfitting refers to the difference between the system's predicted accuracy and its measured accuracy when the system is applied to new data. The greater the optimization, the greater the overfit. We addressed the problem of overfitting in part by using large training samples and through experimental repetition and performance averaging.
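The evaluation loop can be sketched as repeated random partitioning with averaged accuracy. The function names, the 25% test fraction, the number of repeats, and the one-dimensional toy data with a 1-nearest-neighbor classifier are assumptions for illustration; the source does not state the partition sizes or repeat counts used.

```python
import random


def repeated_holdout_accuracy(examples, labels, fit, predict,
                              test_fraction=0.25, repeats=10, seed=0):
    """Average test accuracy over several random train/test partitions."""
    rng = random.Random(seed)
    indices = list(range(len(examples)))
    n_test = max(1, int(len(indices) * test_fraction))
    accuracies = []
    for _ in range(repeats):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        model = fit([examples[i] for i in train_idx], [labels[i] for i in train_idx])
        hits = sum(predict(model, examples[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / n_test)
    return sum(accuracies) / len(accuracies)


def fit_memorize(xs, ys):
    """Instance-based 'training': store the examples."""
    return list(zip(xs, ys))


def predict_1nn(model, x):
    """Return the label of the single nearest stored example."""
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


# Made-up, well-separated one-dimensional data with two classes.
xs = list(range(10)) + list(range(20, 30))
ys = ["a"] * 10 + ["b"] * 10
mean_accuracy = repeated_holdout_accuracy(xs, ys, fit_memorize, predict_1nn)
```

When this estimate is itself the quantity an optimizer maximizes, the best value found will tend to overstate performance on new data, which is the optimism described above.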