As part of a long-term population study at a site in western Massachusetts, continuous drift fences were in place to monitor marbled salamander breeding migrations at each of 14 seasonal pond basins (see Jenkins, McGarigal & Gamble 2003 for detailed field protocols). Pitfall traps (#10 tin cans) were spaced every 10 m on both sides of each fence to capture immigrating (pre-breeding) and emigrating (post-breeding) adults. Previous analyses showed that drift fences were effective at capturing marbled salamanders with capture probabilities between 0·83 and 0·91 for immigrating adults and between 0·52 and 0·85 for emigrating adults (Gamble et al. 2006).
We checked traps once daily from 24 April to 22 November 2002, and continuously during large breeding migrations (n > 20 individuals per night). Upon capture, adult marbled salamanders were weighed to the nearest 0·25 g, sexed and photographed. To standardize image acquisition and optimize image quality, we designed a weatherproof, open-top light-box with a coloured grid background and a circuit of six white LED bulbs. The bulbs were powered by three rechargeable AA batteries held in an externally mounted casing. Individual salamanders were placed in the box and photographed at ‘medium’ resolution (640 × 480 pixels) with one of two hand-held Sony Mavica digital cameras (Model MVC-FD83). No anaesthetization was required. We recorded image numbers in field books and transferred image files to a central database weekly and after significant migration events. All captured animals were released on the opposite side of the drift fence (i.e. in their direction of movement) after data collection. During the off-season, we closed traps and created numerous openings along fences to allow passage of animals.
The first nuisance variable is pose. Salamanders have flexible bodies and their size and shape can vary due to growth, recent feeding and/or their degree of hydration. In addition, without use of an anaesthetic, it is difficult to get a salamander to maintain a straight pose. We address pose variability by digitally marking the dorsal midline of the salamander in database images with a series of points (identified with mouse-clicks; Fig. 1a) and interpolating between these points. The result is a smooth curve that outlines the shape of the animal's pose. We then re-map this curve to a straight line along the x-direction in a length-preserving manner. To do this, at each pixel along the interpolated curve, we extract a narrow image strip perpendicular to the local tangent direction. We then reassemble this strip on the straightened medial axis. The result of repeating this cut-and-paste operation along the entire length of the curved medial axis is a salamander body that is ‘straightened’, albeit artificially (Fig. 1b).
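The cut-and-paste straightening can be sketched as follows; the function name, the spline-based interpolation, and the strip half-width are our own illustrative choices, and for brevity the sketch samples uniformly in the spline parameter rather than strictly by arc length:

```python
import numpy as np
from scipy.interpolate import splprep, splev
from scipy.ndimage import map_coordinates

def straighten(image, midline_pts, half_width=20):
    """Straighten a body along a digitally marked midline.

    image       : 2-D grayscale array
    midline_pts : (K, 2) array of (row, col) mouse-clicked midline points
    half_width  : half the strip width sampled perpendicular to the midline
    """
    # Fit a smooth curve through the clicked points and resample it densely.
    tck, _ = splprep(midline_pts.T, s=0)
    u = np.linspace(0.0, 1.0, 500)
    r, c = splev(u, tck)
    dr, dc = splev(u, tck, der=1)
    # Unit tangents, then perpendicular (normal) directions.
    norm = np.hypot(dr, dc)
    nr, nc = -dc / norm, dr / norm
    # At each curve pixel, sample a narrow strip along the normal direction.
    offsets = np.arange(-half_width, half_width + 1)
    rows = r[None, :] + offsets[:, None] * nr[None, :]
    cols = c[None, :] + offsets[:, None] * nc[None, :]
    # Bilinear sampling; out-of-bounds pixels become 0.
    return map_coordinates(image, [rows, cols], order=1, cval=0.0)
```

Reassembling the sampled strips side by side, as `map_coordinates` does here in one call, yields the artificially straightened body.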
The second nuisance variable is rotation. We re-orient the ‘straightened’ salamander image so that the head is pointing to the right.
The third nuisance variable is location. Once we have straightened and rotated the salamander body, we need a local coordinate system by which we can refer to all pixels on the body. This coordinate system will allow us to match salamanders in a translation-independent manner. To accomplish this, we digitally mark each image with a set of fiducials, or reference points, where the legs emerge from the body (Fig. 1b). The mean of these fiducials becomes the body-centre of the new coordinate system with its x-axis positioned horizontally and the y-axis vertically.
The fourth nuisance variable is scale. Because we have already centred the animal, scale variation can be handled by normalizing the dimensions of the local coordinate system. We achieve this by using the four fiducial points to define a body-window (the approximate mid-section of the body from which we extract pattern information; Fig. 1b) and rescaling the salamander image using bicubic interpolation so that the body-window has a fixed length and width. This rescaling addresses dimensional changes that may occur in an individual over time (e.g. due to growth or weight change) as well as from other factors relating to the imaging geometry or the animal's pose.
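The rescaling step might look like this minimal sketch; the target body-window dimensions (64 × 128 pixels) are hypothetical placeholders, not values from the study:

```python
import numpy as np
from scipy.ndimage import zoom

def normalize_window(window, target_h=64, target_w=128):
    """Rescale an extracted body-window to fixed dimensions using
    bicubic interpolation (order=3), as in the scale-normalization step."""
    h, w = window.shape
    return zoom(window, (target_h / h, target_w / w), order=3)
```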
With these steps in place, we have images of salamanders that are straight, with the head pointing right, normalized in length and width, and centred at a fixed location. However, because the body-window is determined by manual digital marking, its placement is not precise. To account for this uncertainty, we perturb each of the four variables defining the body-window (centre x-coordinate, centre y-coordinate, length and width) by a random value drawn from a normal distribution with a standard deviation of 5 pixels (Fig. 1c). Each statistically perturbed rectangle is called a patch, and the values of its image pixels are resampled using bicubic interpolation. Because perturbing the length and width of the body-window can produce dimensions different from those of the original body-window, we rescale the resampled patch to the original body-window dimensions, again using bicubic interpolation.
As a result of these steps, each image i has a representation

$$R_i = \{\,p_{i,j} : j = 1 \ldots M\,\}$$

of M patches, where $p_{i,j}$ is the patch (in the ith image corresponding to the jth perturbation) written as a vector. This vector is obtained by arranging the pixel brightnesses along a raster-scan of the patch. Because all patches are of the same size, each vector is of the same fixed length. With five perturbations each in length, width, x-position and y-position, there are M = 5⁴ = 625 patch-vectors for each image.
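Generating the M = 625 patch-vectors amounts to taking five random draws per body-window parameter and forming all combinations; the parameter names and starting values below are illustrative:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def perturbations(cx, cy, length, width, n=5, sigma=5.0):
    """Perturb the four body-window parameters (centre x, centre y,
    length, width), n draws each, giving n**4 candidate windows.
    Draws are N(0, sigma^2) offsets with sigma = 5 pixels as in the text."""
    draws = {k: v + rng.normal(0.0, sigma, n)
             for k, v in dict(cx=cx, cy=cy, length=length, width=width).items()}
    return list(itertools.product(draws['cx'], draws['cy'],
                                  draws['length'], draws['width']))

# Illustrative starting body-window: centre (100, 50), 128 x 64 pixels.
patches = perturbations(100.0, 50.0, 128.0, 64.0)
```

Each resulting 4-tuple defines one perturbed rectangle, which is then resampled and rescaled as described above.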
The fifth nuisance variable is illumination, which can vary among images due to changes in ambient lighting and the surface conditions of the animal. To some degree we minimize this variability with the use of the customized light-box. We further compensate by contrast-normalizing the patch-vectors. In the case of marbled salamanders, we think that much of the useful information is in the brightness variability and not colour. Therefore, we convert the patches to grayscale and apply contrast normalization to the grayscale patches using the formula:
$$\hat{p}_{i,j} = \frac{p_{i,j} - \mu_{i,j}}{\sigma_{i,j}} \qquad \text{(eqn 1)}$$
where $\mu_{i,j}$ is the average brightness of the patch-vector $p_{i,j}$, and $\sigma_{i,j}$ is the standard deviation of brightness of the patch-vector. Note that both the average and standard deviation are scalar quantities. Thus, the modified representation of an image is a set of normalized patch-vectors, $R_i = \{\,\hat{p}_{i,j} : j = 1 \ldots M\,\}$. The representation $R_i$ can be interpreted as a matrix whose columns are vectors constructed from the brightness of patches extracted from the image.
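Eqn 1 is a one-line operation on each patch-vector; a minimal sketch:

```python
import numpy as np

def contrast_normalize(p):
    """Eqn 1: subtract the mean brightness and divide by the standard
    deviation, both scalars computed over the whole patch-vector."""
    return (p - p.mean()) / p.std()
```

After this step every patch-vector has zero mean and unit standard deviation, removing additive and multiplicative brightness differences between images.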
The last nuisance variable is specularity. Images of a smooth, wet animal tend to contain numerous spots where light reflection may obstruct the underlying pattern of the animal. The distribution of these spots depends on the animal's position relative to the camera and the strength of the light source, thus motivating the design of the light-box for image acquisition. Our algorithm does not explicitly compensate for remaining specular effects, but deals with them implicitly as the subsequent discussion will show.
outline of the recognition algorithm
Having marginalized the effects of these nuisance variables, we compute the numerical distance between patches using Principal Component Analysis (PCA). This approach has been successful in a variety of visual recognition problems, most notably face recognition (Turk & Pentland 1991). PCA can be viewed from the perspective of a Mahalanobis distance (Mahalanobis 1936) between two normalized patch-vectors $\hat{p}_{m,r}$ and $\hat{p}_{n,s}$ corresponding to images m and n, written as:
$$L_{m_r n_s} = \left(\hat{p}_{m,r} - \hat{p}_{n,s}\right)^{\mathrm{T}} C^{-1} \left(\hat{p}_{m,r} - \hat{p}_{n,s}\right) \qquad \text{(eqn 2)}$$
where $L_{m_r n_s}$ is simply the distance between the rth patch of image m and the sth patch of image n. The matrix $C^{-1}$ is called the cost or information matrix. In the PCA approach, it is computed by (pseudo-)inverting C, the covariance of the population of patch-vectors. The analytical population covariance is unknown and is estimated empirically from population samples. A reduced-rank approximation of the sample covariance is obtained by selecting the few principal components that capture the most dominant modes of variability in the population (see Appendix S1 in Supplementary material for a detailed derivation).
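A reduced-rank Mahalanobis distance of this kind can be sketched as follows; the rank k and the SVD-based construction are our own illustrative choices, not the exact derivation of Appendix S1:

```python
import numpy as np

def pca_mahalanobis(P, k=10):
    """Build a rank-k (pseudo-)inverse covariance from a sample of
    patch-vectors P (one vector per column) and return a distance
    function implementing the form of eqn 2."""
    mean = P.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(P - mean, full_matrices=False)
    Uk, sk = U[:, :k], s[:k]
    lam = sk**2 / (P.shape[1] - 1)      # top-k eigenvalues of the covariance

    def dist(p, q):
        d = Uk.T @ (p - q)              # project difference onto k components
        return float(d @ (d / lam))     # Mahalanobis form: d' diag(1/lam) d

    return dist
```

Inverting only the retained eigenvalues is what makes the cost matrix a pseudo-inverse of the (rank-deficient) sample covariance.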
The basic recognition algorithm then proceeds as follows:
1. Compute the representation R0 of the reference (or ‘query’) image, numbered 0.
2. For each database image i = 1 ... N:
   (a) Load the representation Ri, computed in advance.
   (b) Compute the distance $L_{0_r i_s}$ between each of the r = 1 ... M normalized patch-vectors in R0 and each of the s = 1 ... M normalized patch-vectors in Ri using equation 2 (see Appendix S1 for details on how this computation is implemented efficiently).
   (c) Assign the minimum distance between elements of Ri and R0 to the score O(i).
3. Sort the vector of scores O in ascending order (smallest distance, i.e. best match, first) and return the top Q images to the user.
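The loop above can be sketched as follows, assuming a patch-distance function `dist` (e.g. eqn 2) and representations stored as lists of patch-vectors; the brute-force double loop stands in for the efficient implementation of Appendix S1:

```python
import numpy as np

def rank_database(R0, database, dist, Q=10):
    """Score each database image by the minimum pairwise patch distance
    to the query representation R0, then return the indices of the Q
    best-scoring (lowest-distance) images."""
    scores = []
    for Ri in database:
        scores.append(min(dist(r, s) for r in R0 for s in Ri))
    order = np.argsort(scores)   # ascending: smallest distance first
    return order[:Q]
```

For illustration, with a squared-Euclidean `dist` a database image containing a patch identical to the query patch receives score 0 and is ranked first.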
Simply stated, this algorithm finds the best match between the normalized patch-vectors computed for the reference and database images. Unfortunately, in its current form, its performance is limited for the individual salamander identification problem (e.g. less than 40% of known matches are retrieved in the top 10 ranks). The primary reason for the poor performance is that the image patch, represented as discussed above, does not sufficiently discriminate between (a) variability arising from differences in patternation between individuals, and (b) variability in pose or illumination of the same individual across different captures. Numerical differences between patches can come from either source, and thus far there is no explicit mechanism to factor one source from the other.
To improve performance, we use a multi-scale representation. This approach stems from the observation that visual information is contained in a patch at several scales. For example, human visual inspection of a salamander image quickly reveals the presence of both coarse and fine structures (i.e. visually discernible markings such as the edges, lines, shapes and their spatial distribution). We are able to easily match coarse structures in one image with those in another and separate their contributions from similarities between finer structures. The original algorithm has no separate representation for things coarse or fine. Thus, two images of different salamanders differing in their coarse structures may appear numerically more similar than two images of the same salamander with remnant nuisance artifacts at the finer scale. By decomposing the image along this scale dimension, we essentially inspect the image from different perspectives. Measuring numerical similarity or dissimilarity between all of these views allows the algorithm to discriminate more effectively.
To accomplish this, we adopt a formalism called the Gaussian scale-space (Witkin 1983; Lindeberg 1994; Ravela & Manmatha 1999). The idea behind this approach is that if we take a sharp image and blur it, fine-scale structures disappear, leaving behind coarse-scale structures. Generating a family of progressively blurred images from a single image produces a multi-scale representation. More formally, starting with an image I0, we generate a sequence of images It (t = 1 ... T), where
$$I_t = G(\sigma_t) * I_0 \qquad \text{(eqn 3)}$$

where $G(\sigma_t)$ is a two-dimensional Gaussian kernel of standard deviation $\sigma_t$, increasing with t, and $*$ denotes convolution.
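Under the assumption of geometrically increasing blur widths (our choice; the text does not specify the σ schedule), eqn 3 can be sketched as:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_space(I0, T=4, sigma0=1.0, step=np.sqrt(2)):
    """Generate the family I_t = G(sigma_t) * I_0 of eqn 3 by blurring
    the original image with Gaussians of geometrically increasing
    standard deviation sigma_t = sigma0 * step**t."""
    return [gaussian_filter(I0, sigma0 * step**t) for t in range(1, T + 1)]
```

Successive images retain progressively coarser structure; patch distances can then be measured at each scale and combined.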
evaluating algorithm performance
To test the performance of the pattern algorithm, we manually identified 101 pairs of known matches from the full set of 1008 salamander images. The manual identification of matches for testing was facilitated by comparing small subsets of images (e.g. immigrating captures to emigrating captures at the same basin) where matches were most likely to be found. We included all manually identified matches regardless of apparent image quality. We then embedded this test set into incrementally larger random sets from 200 to 1008 images, ran the pattern algorithm, and plotted the percentage of known matches identified in the top 5 and top 10 ranks.
Upon completion of testing, we designed a graphical user interface (GUI; Appendix S2) to read the algorithm output files and display each reference image with its top-ranked ‘candidate’ matches. A visual review of the top 10 candidate images in the GUI allowed us to definitively confirm or reject potential matches and assign an individual identification number to each set of matched images. There were no cases in which visual confirmation was ambiguous. The individual identification number was then linked back to the original biological field data using the shared field value for the image file name. We estimated the time spent on pre- and post-processing steps as the total amount of active keyboard time devoted to each step (e.g. identifying fiducial points with mouse-clicks) divided by the total number of images processed.
We used capture histories compiled from the pattern-matching process to quantify the total number of captures per individual and their locations (i.e. recaptured at the same pond basin as the original capture or at a different basin), as well as (1) the timing of arrival at pond basins, (2) the duration of stay at pond basins, and (3) weight change during the breeding period, calculated as a percentage of ‘pre-breeding’ weight (wet mass measured upon first capture). We calculated these three variables only for individuals captured twice – once during immigration and once during emigration. We then used scatter plots to identify possible correlations between each pair of the three variables, both at the pond level (for ponds with n > 10 individuals) and at the full study-area level (pooling individuals across ponds). Because pond-level observations were generally consistent with study-area-wide observations, we reported only the pooled results for purposes of this analysis. Note that our data represented a nearly exhaustive sample of breeding salamanders at all pond basins in a limited geographical area. For this reason, our inference space in this application was technically limited to our study area, and inferential analyses were not appropriate.