Our spot-pattern matching technique was applied in database ‘scans’: as new whale shark photographs were submitted to the ECOCEAN Library, spot data were extracted and compared with patterns from all previously submitted images, separately for left and right flanks. A list of candidate image matches was produced by the algorithm and ranked according to the computed score.
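A database scan of the kind just described can be sketched as follows. This is a minimal illustration, not the ECOCEAN implementation. The `Encounter` record, the `scan` function and the pluggable `score` callable are hypothetical names, and the pairwise triangle-matching score is assumed to be supplied from elsewhere. Dropping zero-score comparisons from the candidate list is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]


@dataclass
class Encounter:
    """Hypothetical library record: one photographed flank of one shark."""
    encounter_id: str
    side: str              # "left" or "right"
    spots: List[Point]     # extracted spot coordinates


def scan(new: Encounter,
         library: List[Encounter],
         score: Callable[[List[Point], List[Point]], float],
         top_n: int = 10) -> List[Tuple[str, float]]:
    """Compare a new encounter with all earlier same-flank entries; rank by score."""
    candidates: List[Tuple[str, float]] = []
    for entry in library:
        # Left and right flanks are scanned separately; skip self-comparison.
        if entry.side != new.side or entry.encounter_id == new.encounter_id:
            continue
        s = score(new.spots, entry.spots)
        if s > 0:  # assumed: zero-score comparisons are not listed
            candidates.append((entry.encounter_id, s))
    # Highest score first.
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:top_n]
```

Any function with the same signature can stand in for `score`. In a real scan it would be the triangle-matching product score S.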
A subset of the Library entries, or ‘encounters’, represents multiple images of the same shark. As described below, these instances proved useful in estimating the method's self-consistency: if encounter A matches encounter B, and B matches encounter C, the technique should also provide a match when A is compared directly with C.
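The self-consistency argument can be made concrete. Given the encounter pairs that the algorithm reported as matches, any encounter B matched to both A and C should imply that A and C were also matched directly. The sketch below lists the triples that break this expectation. The function name is hypothetical, and it is a minimal check, not the procedure used for the results reported here.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def transitivity_violations(matches: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (A, B, C) where A-B and B-C matched but A-C was not matched directly."""
    pairs = [tuple(p) for p in matches]
    matched = {frozenset(p) for p in pairs}
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    violations = []
    for b, linked in neighbours.items():
        # Every two encounters that both match B should also match each other.
        for a, c in combinations(sorted(linked), 2):
            if frozenset((a, c)) not in matched:
                violations.append((a, b, c))
    return violations
```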
correct matches: the method’s efficacy
To explore the method's success rate and any potential difficulties, the spot pattern of each of 21 previously identified (i.e. matched by eye) left-side images was scanned across all other available left-side spot data sets. As of 1 December 2004, there were 271 such data sets. Similarly, six known right-side images were compared with the catalogue of 181 right-side data sets. In the vast majority of cases, comparisons involving different sharks produced a zero score; in some cases, however, a small non-zero score resulted. We refer to the latter as false-positive matches. When the same shark was imaged in both encounters, a high score typically resulted, an outcome we refer to as a correct match. Occasionally, comparison of two same-shark images produced a low score, or a failed match.
Figure 6 summarizes the results of these tests. The distributions of vote totals V, matched triangle fractions fT and product scores S = V × fT resulting from comparison of the 27 (21 left-side and six right-side) previously identified pairs of encounters are shown in green. For the same set of comparisons, all false-positive match scores reported by the algorithm were accumulated; the resulting distribution is shown in red. A reliable method for identifying unique patterns should minimize the overlap between the red and green histograms. We find that for vote totals V (top panel of Fig. 6) the distribution of false positives is broad and, at its high end, encroaches on the vote totals garnered by the correct matches. The triangle fraction (middle panel of Fig. 6), however, provides an essential discriminator: when fT is restricted to values greater than 5%, the number of false-positive matches drops from 236 to 11, while only three correct matches are excluded by this cut, two of which had, in any case, the lowest vote totals V. In the bottom panel of Fig. 6, the product score S incorporates the additional information contained in fT.
We find that the distribution of S for false-positive matches is well described by a log-normal distribution that falls off rapidly for S > 10.
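The quantities compared in Fig. 6 can be computed as below. This sketch assumes that log S denotes the base-10 logarithm. That reading is consistent with the S = 100 boundary lying 3·6 standard deviations above the false-positive mean. The function names, and the use of a sample mean and standard deviation as the log-normal fit, are assumptions for illustration.

```python
import math
import statistics
from typing import List, Tuple


def product_score(votes: float, triangle_fraction: float) -> float:
    """Product score S = V * fT used to rank candidate matches."""
    return votes * triangle_fraction


def passes_fraction_cut(triangle_fraction: float, threshold: float = 0.05) -> bool:
    """True when more than 5% of triangles contributed votes."""
    return triangle_fraction > threshold


def fit_log_normal(scores: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of log10 S over non-zero scores."""
    logs = [math.log10(s) for s in scores if s > 0]
    return statistics.mean(logs), statistics.stdev(logs)
```

Applied to the accumulated false-positive scores, a fit of this kind gives the two parameters of the blue curve in Fig. 6.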
Figure 6. Quantitative measures of match quality provided by the pattern-matching algorithm: vote total V (top panel), fraction fT of triangles contributing votes (middle panel) and their product S, our preferred ranking criterion (bottom panel). Right- and left-side trials have been combined. Distributions for correct (green) and false-positive (red) matches among previously identified images are shown, as well as those for new ‘blind’ matches (black) made by our algorithm. Trials resulting in zero votes are shown in the leftmost bin of the top and bottom panels. The Gaussian curve (blue) in the bottom panel has the mean (⟨log S⟩ = −0·22) and standard deviation (σ(log S) = 0·61) of the logarithms of the false-positive scores. Hatched regions reflect trials in which fewer than 5% of triangles contributed to the vote total. A qualitative assessment of matches is suggested by the empirical scoring thresholds shown in the bottom panel, with weak, moderate and strong candidates corresponding to high, medium and low probability, respectively, of a false positive.
The available sample of previously matched pairs of encounters is small but, we believe, representative of the underlying statistical properties of correct and false-positive match scores. The results shown in Fig. 6 therefore suggest an empirical scheme for classifying the quality of a pattern match as scored by our algorithm.
A non-zero score S less than 10 is unlikely to represent a true match, but rather is characteristic of a false positive.
A score between 10 and 100, especially with a fraction fT greater than 5%, represents a moderately strong likelihood that the two patterns under comparison are truly matched.
Any score above 100 represents a strong candidate for a correctly matched pair of spot-pattern images. The log-normal distribution of false-positive scores places the S = 100 boundary (log S = 2) 3·6 standard deviations above the mean of log S; this implies a formal probability of less than 1 in 6000 that a false positive falls in this high-confidence category.
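The three-tier scheme and the quoted chance probability can be checked with a short calculation. The sketch assumes base-10 logarithms and a one-sided Gaussian tail in log S, using the false-positive parameters from the Fig. 6 caption. The function names are hypothetical, and the fT > 5% condition is left as a comment rather than a hard rule, because the text treats it as supporting evidence.

```python
import math
from typing import Tuple

# False-positive log10 S distribution parameters quoted in the Fig. 6 caption.
LOG_MEAN = -0.22
LOG_SIGMA = 0.61


def classify(score: float) -> str:
    """Empirical quality category for a product score S."""
    if score <= 0:
        return "no match"
    if score < 10:
        return "weak"        # characteristic of false positives
    if score <= 100:
        return "moderate"    # stronger still when fT > 5%
    return "strong"


def false_positive_tail(score: float,
                        mu: float = LOG_MEAN,
                        sigma: float = LOG_SIGMA) -> Tuple[float, float]:
    """Return (z, P): how many standard deviations log10 S lies above the
    false-positive mean, and the one-sided Gaussian tail probability beyond it."""
    z = (math.log10(score) - mu) / sigma
    return z, 0.5 * math.erfc(z / math.sqrt(2.0))
```

For S = 100 this gives z ≈ 3·64 and a tail probability of about 1·4 × 10⁻⁴, consistent with the bound of 1 in 6000. The S = 10 threshold sits 2σ above the false-positive mean.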
Based on these criteria, we can estimate a success rate for the method. Of the 27 previously identified pairs of encounters tested, 21 produced scores in the strong-match category and another four in the moderately strong category; none were reported as weak candidates, and two failed to match altogether. Combining the two higher-confidence categories gives a success rate of 25 out of 27, or 92·6%. Although based on a small sample, this rate is encouraging and may well improve with time, as photographers who are aware of the requirements of our technique seek better vantage points when photographing whale sharks, as discussed below.
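The success-rate arithmetic can be reproduced from the category counts reported above. The sketch treats the strong and moderate categories as successes, as in the text. The function name and the "failed" label are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def success_rate(categories: Iterable[str]) -> Tuple[int, int, float]:
    """Count strong + moderate outcomes as successes; return (hits, total, rate)."""
    counts = Counter(categories)
    total = sum(counts.values())
    hits = counts["strong"] + counts["moderate"]
    return hits, total, hits / total


# Outcomes for the 27 previously identified pairs, as reported in the text.
outcomes = ["strong"] * 21 + ["moderate"] * 4 + ["failed"] * 2
hits, total, rate = success_rate(outcomes)
print(f"{hits}/{total} = {rate:.1%}")  # prints 25/27 = 92.6%
```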
failed matches and false positives: difficulties encountered in applying the method
The performance of pattern-matching techniques is subject to factors beyond the control of any numerical algorithm; our triangle-matching method is no exception. The difficulties that present themselves can be grouped into three categories: image quality, viewing geometry and spot pattern systematics.
Spot extraction from raw whale shark images can be complicated by lighting conditions, shadows, obscuration of spots by other fish, granularity of low-resolution images and other phenomena. Nevertheless, we find the triangle-matching algorithm to be effective even when two images have fewer than half of their spots in common, so that most of these difficulties are overcome simply by careful editing of photographs.
The direction from which shark flank images are obtained is important. In photographs obtained from directions anterior or posterior to the centre of the measurement region, foreshortening alters the aspect ratio of the spot pattern, changing the geometries of the derived triangles. Similarly, a camera vantage point displaced too far dorsally or ventrally produces altered geometries. The algorithm's ɛ uncertainty parameter can compensate, in part, for these distortions, and our implementation further mitigates perspective effects by imposing an upper limit on the size of triangles relative to the image dimensions. Nevertheless, an oblique image was responsible for one of the two instances in Fig. 6 in which a previously known match failed to produce a high score. We have experimented with numerical correction of spot patterns foreshortened along the anterior–posterior line by trigonometrically adjusting the spacings of spot x-coordinates immediately after extraction; although it depends on the operator's estimate of the angle between the image plane and the shark's flank, this technique holds some promise. We note that as the database of encounters grows, the collection of images for a given shark will span a range of perspectives, improving the odds that a successful identification will be made. As demonstrated in Fig. 7, photographs obtained from extreme forward or tailward angles will not be correctly matched with each other (simulations suggest that successful matches can be made for viewing perspectives differing by up to 30°), but each will match other images made at intermediate angles. In the long term, therefore, oblique images of frequently encountered sharks will have minimal impact on the method's ability to provide a reliable identification.
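The foreshortening correction can be sketched under a simple assumption. Suppose the flank is viewed at an angle θ from the perpendicular along the anterior–posterior direction, and depth-dependent perspective is ignored (an orthographic approximation). Then x-spacings shrink by cos θ, and dividing x-offsets by cos θ restores them. The function name, the orthographic model and the choice of the pattern's mean x as the reference point are all assumptions, and the authors' correction may differ. Under this model, a view 30° off the perpendicular compresses spacings by cos 30° ≈ 0·87, i.e. by about 13%.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def correct_foreshortening(spots: List[Point],
                           angle_deg: float,
                           x_ref: Optional[float] = None) -> List[Point]:
    """Undo anterior-posterior foreshortening by stretching x-offsets by 1/cos(angle).

    angle_deg is the operator's estimate of the angle between the image plane and
    the shark's flank (orthographic approximation); y-coordinates are unchanged.
    """
    if not 0.0 <= angle_deg < 90.0:
        raise ValueError("angle must lie in [0, 90) degrees")
    stretch = 1.0 / math.cos(math.radians(angle_deg))
    if x_ref is None:
        # Stretch about the mean x so the corrected pattern stays centred.
        x_ref = sum(x for x, _ in spots) / len(spots)
    return [(x_ref + (x - x_ref) * stretch, y) for x, y in spots]
```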
As described earlier, whale shark spots sometimes fall along neatly arrayed arcs. Occasionally, the spots within each arc lie at quasi-regular intervals, so that together they form a loose grid. When one image in a comparison pair exhibits such a gridded pattern, our algorithm can produce a relatively high score even when the images correspond to different sharks. Spots arrayed in grids account for the highest scores (S ≈ 20) we have found among the false positives, and generally also produce fT > 0·05. Gridded patterns can also cause failed matches, because falsely matched similar triangles from the two images overwhelm those that are correctly matched; this was the case for the remaining failed match in our previously identified test data set.
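A regular lattice contains many mutually similar triangles, which helps explain why gridded patterns cause trouble. The sketch below uses a simple side-ratio shape descriptor, which is an assumption for illustration and not necessarily the triangle parameterization used by our algorithm. It uses a hypothetical similarity tolerance `eps` and counts the fraction of triangle pairs within one pattern whose shapes agree. The comparison is between a 3 × 3 grid and nine randomly scattered points.

```python
import itertools
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def triangle_shapes(points: List[Point], min_area: float = 1e-9) -> List[Tuple[float, float]]:
    """Scale- and rotation-free shape (a/c, b/c) for every non-degenerate triangle."""
    shapes = []
    for p, q, r in itertools.combinations(points, 3):
        # Twice the triangle area via the cross product; skip collinear triples.
        area2 = abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))
        if area2 < min_area:
            continue
        a, b, c = sorted((math.dist(p, q), math.dist(q, r), math.dist(r, p)))
        shapes.append((a / c, b / c))
    return shapes


def similar_pair_fraction(points: List[Point], eps: float = 0.02) -> float:
    """Fraction of triangle pairs whose shapes agree to within eps in both ratios."""
    shapes = triangle_shapes(points)
    pairs = list(itertools.combinations(shapes, 2))
    similar = sum(1 for s, t in pairs
                  if abs(s[0] - t[0]) < eps and abs(s[1] - t[1]) < eps)
    return similar / len(pairs)


grid = [(float(i), float(j)) for i in range(3) for j in range(3)]
rng = random.Random(1)
scatter = [(rng.uniform(0, 2), rng.uniform(0, 2)) for _ in range(9)]
print(similar_pair_fraction(grid), similar_pair_fraction(scatter))
```

The grid yields a much larger fraction of mutually similar triangles than the random scatter. A comparison involving a gridded pattern can therefore accumulate spurious votes from shape coincidences rather than from true correspondence.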
Where our algorithm fails to establish a strong match, visual inspection or some other method is needed to identify the imaged shark. False-positive outcomes are undesirable, but even when the algorithm cannot provide an unambiguous identification, it dramatically reduces (by a factor of 10 to 100) the number of images that a user needs to examine visually to find a match.
‘blind’ matches: the method’s successes
To date, 111 image pairs not previously known to be associated have been matched by our algorithm and, of these, 96 had scores S > 10. Typically, a database scan produced a list of candidate matches, the most highly ranked of which were examined visually for spot-pattern compatibility and for unrelated identification markers such as scars. Confirmed matches were noted and tabulated, resulting in the black histograms of score distributions shown in Fig. 6. As expected, most of the successful matches have scores in the high-confidence range, with decreasing numbers in the moderate- and low-confidence ranges. We note that not all of the blind matches constitute new identifications: where three or more encounters were available for a single shark, all possible image pairs (e.g. three pairs for the shark shown in Fig. 7) were included in the category of blind matches, forming a rough self-consistency test of the method. The high-scoring fraction of 96/111 = 86% among blind matches provides supporting evidence for the method's efficacy. We emphasize that these results were obtained with a data set that was not screened to exclude moderately oblique images; in other words, it reflects a collection of encounter photographs acquired under real-world conditions.
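The bookkeeping behind the blind-match tally is simple. A shark with n encounters contributes n(n − 1)/2 image pairs, and the high-scoring fraction is the share of confirmed matches with S > 10. The sketch below is illustrative only. The encounter identifiers and function names are hypothetical, and only the 96 and 111 counts come from the text.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def all_encounter_pairs(encounter_ids: Sequence[str]) -> List[Tuple[str, str]]:
    """Every unordered pair among one shark's encounters: n(n-1)/2 pairs."""
    return list(combinations(encounter_ids, 2))


def high_scoring_fraction(scores: Sequence[float], threshold: float = 10.0) -> float:
    """Fraction of confirmed matches with S above the threshold."""
    return sum(1 for s in scores if s > threshold) / len(scores)


# Three encounters of one shark contribute three pairs to the blind-match tally.
print(all_encounter_pairs(["E1", "E2", "E3"]))
# The reported tally: 96 of 111 blind matches had S > 10.
print(f"{96 / 111:.0%}")  # prints 86%
```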