An open‐source general purpose machine learning framework for individual animal re‐identification using few‐shot learning

Animal re‐identification remains a challenging problem due to the cost of tagging systems and the difficulty of permanently attaching a physical marker to some animals, such as sea stars. Given these challenges, photo identification is a good fit for this problem, whether evaluated by humans or through machine learning. Accurate machine learning methods are an improvement over manual identification, as they are capable of evaluating a large number of images automatically, and recent advances have reduced the need for large training datasets. This study aimed to create an accurate, robust, general‐purpose machine learning framework for individual animal re‐identification using images both from publicly available data and from two groups of sea stars of different species under human care. Open‐source code was provided to accelerate work in this space. Images of two species of sea star (Asterias rubens and Anthenea australiae) were taken using a consumer‐grade smartphone camera and used as original datasets to train a machine learning model to re‐identify an individual animal using few examples. The model's performance was evaluated on these original sea star datasets, which contained 39–54 individuals and 983–1204 images, as well as on six publicly available re‐identification datasets for tigers, beef cattle noses, chimpanzee faces, zebras, giraffes and ringed seals, ranging from 45 to 2056 individuals and 829 to 6770 images. Using time‐aware splits, a data‐splitting technique that ensures the model sees an individual's images only from previous collection events during training to avoid information leakage, the model achieved high (>99%) individual re‐identification mean average precision for the top prediction (mAP@1) for the two species of sea stars. The re‐identification mAP@1 for the mammalian datasets was more variable, ranging from 83% to >99%; however, the model outperformed published state‐of‐the‐art re‐identification results for the publicly available datasets. The reported approach to animal re‐identification is generalizable, with the same machine learning framework achieving good performance in two distinct species of sea stars with different physical attributes, as well as in seven different mammalian species. This demonstrates that the methodology can be applied to nearly any species where individual re‐identification is required. This study presents a precise, practical, non‐invasive approach to animal re‐identification using only basic image collection methods.

KEYWORDS: animal re-identification, artificial intelligence, few-shot learning, machine learning, sea star, starfish


| INTRODUCTION
Re-identifying individuals is important for animals in managed care as well as in wildlife research; however, it represents a surprisingly challenging feat in many species and scenarios. In zoos, individual animal identification is needed to track an animal's provenance, reproductive history, and medical records, and to evaluate lifespan (Reuther, 1968); in agricultural settings, the ability to identify individuals is also important for traceability during disease outbreaks (Bowling et al., 2008; Murphy et al., 2008). Individual re-identification of free-ranging animals is necessary to facilitate research in many areas of ecology, evolutionary biology and conservation, including information on life history, population dynamics and social structure (Clutton-Brock & Sheldon, 2010; Pradel, 1996).
Tagging sea stars for individual re-identification has proven to be an exceptionally challenging problem. Passive integrated transponder devices (PIT tags), one of the most common methods of tagging animals (Gibbons & Andrews, 2004), are poorly retained in sea stars (Olsen et al., 2015), and currently available alternatives are either invasive or temporary. Genotyping methods can be used for individual re-identification of animals, but these methods are expensive, labour-intensive, and require handling of animals to obtain samples (Taberlet & Luikart, 1999; Weller et al., 2006). Both free-ranging sea stars and those in managed care may need to be individually identified. Predatory sea stars are important members of their ecosystems and have been recognized as keystone species for their role in maintaining biodiversity and structuring aquatic ecosystems (Paine, 1969; Saier, 2001). The loss of large numbers of sea stars in mass mortality events has resulted in trophic cascades (Schultz et al., 2016), underscoring the need for a better understanding of basic life history traits in these important species. Sea stars are used in research and are popular display animals in public aquaria.
Sea stars in managed care are often group-housed, and the inability to individually identify them precludes keeping medical and husbandry records at the individual level. In addition, captive breeding and reintroduction programs for the critically endangered sunflower star (Pycnopodia helianthoides) (Hodin et al., 2021) will benefit from the ability to monitor the success of released individuals.
The majority of published sea star tagging methods are invasive and may have detrimental effects on individual fitness, as well as potential impacts on research results due to the response to trauma and behavioural changes. Published methods include electronic tags affixed by piercing the arm with a wire (Lamare et al., 2009), branding with a soldering iron, injection of visible implant elastomer (Martinez et al., 2013), scratching a number into the body wall with a sharp pencil (Scheibling, 1980), sewing a plastic label with thread through the body wall (Savy, 1987) and attaching acoustic tags by threading monofilament line through the depth of the arm (Chim & Tan, 2013). Non-invasive methods have included vital staining and surface tags, but these methods are temporary. Vital staining techniques with Nile Blue Sulfate have involved immersion of the entire animal (Feder, 1955; Loosanoff, 1937) or specific arms (Barahona & Navarrete, 2010), or use of a marking pen (Kvalvågnaes, 1972).
Before the recent developments in machine learning methods, individual animal re-identification was performed using a number of computer vision methods that required the development of bespoke algorithms to target the distinguishing features of the individual animals, such as zebra stripes (Lahiri et al., 2011), bear faces (Clapham et al., 2022), whale shark skin patterns and whale flukes (Berger-Wolf et al., 2017). In computer vision techniques, two or more images of the same individual are compared using a feature matching algorithm such as the Scale Invariant Feature Transform (Lowe, 1999). Newer, more robust methods use convolutional neural networks (i.e. machine learning), which require significantly less preprocessing of images and are easier to adapt across different species (Schneider et al., 2018). In addition to incremental improvements in neural network-based architectures due to affordable access to more efficient and better-performing hardware, these new methods have been made more robust against small datasets through transfer learning (Shaha & Pawar, 2018). The final and most recent leap in performance has come from the development of methodologies specifically designed to improve performance in the individual re-identification domain, such as the triplet loss function (Hermans et al., 2017).

Machine learning methods are an ideal tool for animal re-identification since they have the capability to evaluate a large number of images without the input of human operators once the model has been trained. Machine learning refers to computer-based algorithms that use many examples to derive a relationship between the given examples and their corresponding labels without being explicitly programmed for it. During the training phase, the algorithm is shown both the examples and their labels. During the testing phase, it is shown only the examples, and the derived labels are evaluated against the ground truth (i.e. the real labels). One common issue in machine learning applications is the need for a large training corpus. Few-shot learning (FSL) methods require very few examples of a given individual animal to capture the information required to produce accurate identification predictions for unlabelled images. FSL methods were specifically developed to overcome the often small amount of training data available in real-world scenarios, and can be coupled with transfer learning (Shaha & Pawar, 2018) to take advantage of pretrained neural networks capable of extracting key features from images (Orenstein & Beijbom, 2017) for general-purpose image classification tasks. To use FSL methods, diverse images of each individual, such as those from multiple angles and lighting conditions, need to be used in model training.

| Machine learning methods
One of the key developments in machine learning that made FSL methods attractive for the individual re-identification problem domain was the development of loss functions specifically designed to work with small amounts of data (Parnami & Lee, 2022), such as the triplet loss. A loss function is a metric that measures the error between the model's prediction and the ground-truth data. In animal re-identification, the objective for the model is to learn a dense (i.e. compressed) representation of images such that, given an image of an individual (the anchor), its dense representation will be closer to the representation of another image of the same individual (a positive example) than to that of a different individual (a negative example), as measured by Euclidean distance in a high-dimensional space. Images from the same individual are thus mapped to embeddings with a small Euclidean distance between them, while images of different individuals are mapped to embeddings with a larger Euclidean distance between them; the computed distance feeds the loss function, which facilitates re-identification of the same individual. This process is known as triplet loss learning when three examples are used (Le Cacheux et al., 2019; Nepovinnykh et al., 2020; Schneider et al., 2020).
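For reference, the triplet objective just described can be written compactly. The following display is a reconstruction from the prose in this section, where f is the embedding function, A, P and N are the anchor, positive and negative images, and α is the margin; it is not reproduced from the original paper:

$$\mathcal{L}(A,P,N)=\max\left(\lVert f(A)-f(P)\rVert_2^{2}-\lVert f(A)-f(N)\rVert_2^{2}+\alpha,\;0\right)$$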
The extensive resources invested in the machine learning-powered human face re-identification domain (Schroff et al., 2015) can be leveraged for animal re-identification due to recent developments that improve the usability of those technologies. While machine learning methods have many advantages, there have been no fully automated machine learning models published for sea star re-identification. This may be due to difficulties in implementing crucial components of triplet loss learning, which are sometimes deemed "complicated" (Yao et al., 2020), such as the triplet selection criteria, also known as triplet mining. There are also logistical challenges related to the large number of images traditionally required for training models, in addition to the limited availability of publicly available datasets. Furthermore, published models for use in other species have not made their code freely available (Li et al., 2022; Nepovinnykh et al., 2020; Schneider et al., 2020), which poses a challenge in reproducing results and making direct comparisons to other machine learning frameworks.
The individual animal re-identification problem domain can be categorized into several varieties depending on how the problem is framed, similarly to the person re-identification problem domain (Bedagkar-Gala & Shah, 2014). On one hand, the closed-set problem refers to the re-identification of individuals when the pool of possible choices is fixed and no new individuals can enter. This is typically the case with captive animals under human care, but it can also happen with free-ranging animal populations that have an extremely limited number of individuals. On the other hand, the open-set problem refers to the re-identification of individuals when the pool of possible choices is not known (i.e. any individual may have never been seen before) (Scheirer et al., 2013). This is a more common occurrence in wildlife conservation and ecology scenarios, such as with images collected using camera traps. Although the two problems require distinct evaluation metrics, much of the underlying technology used to solve them can be the same.
The objectives of this study were to (1) create an accurate, robust, general-purpose machine learning framework for individual re-identification of sea stars and other animals using images and (2) make the code open source to accelerate work in this space and to facilitate the creation of applications.

| Ethics statement
Sea stars are not currently covered by research oversight guidelines in the United States or Australia, and thus no animal ethics committee review was performed. Sea stars were handled and housed in accordance with guidelines for aquatic species published in the Guide for the Care and Use of Laboratory Animals, 8th edition (National Research Council, 2010). Sea stars were maintained in natural seawater systems, and every effort was made to minimize stress on individuals. During data collection, sea stars were always maintained under the water.

| Data collection
The machine learning model was initially developed using images of adult North Atlantic common sea stars (Asterias rubens) [ASRU dataset] and adult Australian cushion stars (Anthenea australiae) [ANAU dataset]; collection details for each are given below. In both datasets, a small number of images were later discarded due to image quality issues, such as occlusions or blurry subjects.
The total number of images left in the ASRU dataset was 1204 across 39 individuals and five collection events.The processed ANAU dataset contained 983 images across 56 individuals and three collection events.

| Data splitting
For the sea star datasets, all images (1204 for the ASRU dataset and 983 for the ANAU dataset) were separated into a training set and a testing set such that images from every individual were present in both sets. The training and testing sets were produced using time-aware data splitting, which has been demonstrated to be significantly less likely to lead to performance overestimation than time-unaware (random) data splitting (Papafitsoros et al., 2022), as illustrated in Figure 5. Using time-aware splitting, images were divided based on collection events, so groups of images collected at the same event were all part of the same set. This approach was used in the closed-set variation of the evaluation metrics.
To further prevent overestimation, the model was evaluated using a separate time-aware split for each available collection event, and the average across the individual splits is reported. The ASRU dataset consists of five collection events, separated into five different splits by leaving one collection event out as the testing set and using all others for the training set. The ANAU dataset consists of three collection events, which were separated into two splits. The first split used only the last two collection events, in which the individuals were placed in plastic containers, as the training and testing sets, respectively. The second split combined the last two collection events as the training set and used the first collection event, in which the individuals were in the naturalistic environment, as the testing set.
This was intended to simulate a catch-and-release scenario, where an individual might be taken from its natural environment for sample collection and later released and photographed back in its natural environment.
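As a concrete illustration, a minimal sketch of the leave-one-event-out, time-aware splitting described above is shown below. It assumes each record is an (image, individual_id, event_id) tuple; the function name and record layout are illustrative, not taken from the released code.

```python
def time_aware_splits(records):
    """Yield (train, test) splits, holding out one collection event at a time.

    `records` is assumed to be a list of (image, individual_id, event_id)
    tuples; images from the held-out event are never seen during training.
    """
    events = sorted({event for _, _, event in records})
    for held_out in events:
        train = [r for r in records if r[2] != held_out]
        test = [r for r in records if r[2] == held_out]
        yield train, test
```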
For the publicly available datasets, randomized time-unaware splitting (see Figure 5) was performed in order to make the results described herein comparable with the existing literature (Dlamini & van Zyl, 2021; Schneider et al., 2020). Time-unaware splitting randomly splits images into the training and test sets using an 80/20 split: 80% of images in the training set and 20% in the testing set. The training set was used during model training, while the testing set was used to evaluate the model's performance on unseen data. This was used in the open-set variation of the evaluation metrics.
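For comparison with the time-aware sketch above, a minimal sketch of the 80/20 time-unaware split is given below; the record layout and seed are illustrative assumptions.

```python
import random

def random_split(records, train_fraction=0.8, seed=42):
    """Randomly split (image, individual_id) records into train/test sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```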

| Machine learning framework
The machine learning model described here uses the DenseNet (Huang et al., 2017) neural network architecture, specifically the DenseNet121 implementation available in the Keras software (Chollet, 2015). This architecture was chosen given its reported performance in the individual re-identification problem domain (Schneider et al., 2020).
The final classification layer was replaced with a densely connected embedding layer consisting of 128 units; this is called the embedding vector and is conceptually equivalent to a dense representation of the image (the choice of embedding size and further details of the framework are described below). The loss function used during training was triplet loss (Hermans et al., 2017). The loss function (L) computes the difference of the squares of the distances from the anchor (A) to the positive (P) and negative (N) examples, with a minimum margin (α); if the value is negative, the output of the loss function is set to 0. One of the most challenging aspects of this approach is an efficient sampling implementation to draw an anchor, a positive, and a negative example in each iteration; this is known as triplet mining. Triplets were drawn in the following manner: First, a batch of samples consisting of image and label pairs was drawn. Then, for each element (i.e. image) in the batch (the anchor), a pair of samples within the batch with the same (positive) and different (negative) labels that met the following conditions was identified:
1. The loss function (Figure 6) produced a positive value.
2. The negative sample was farther from the anchor than the positive sample, within margin α.
This is known as semi-hard triplet mining (Hermans et al., 2017), which is one of several published triplet mining strategies (Chechik et al., 2010). However, this methodology poses another problem: it requires every element in a batch to have a positive and a negative example within the same batch. To overcome this, the batches were created as follows (a sketch of this procedure follows the list):
1. An element was drawn from the dataset in random order.
2. A second element with the same label was selected at random from the dataset.
3. Steps 1 and 2 were repeated until the desired batch size was achieved.
4. If the batch only contained elements of a single label, the process was restarted.
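A minimal sketch of this batch construction procedure is shown below. It assumes `dataset` is a list of (image, label) pairs; the helper name is illustrative and does not come from the authors' released code.

```python
import random
from collections import defaultdict

def make_batch(dataset, batch_size):
    """Build a batch in which every element has a same-label companion."""
    by_label = defaultdict(list)
    for image, label in dataset:
        by_label[label].append(image)

    while True:
        batch = []
        while len(batch) < batch_size:
            # Steps 1-2: draw an element at random, then a second element
            # with the same label, so positives exist within the batch.
            image, label = random.choice(dataset)
            positive = random.choice(by_label[label])
            batch.extend([(image, label), (positive, label)])
        # Step 4: restart if the batch collapsed to a single label,
        # since no negative examples would be available.
        if len({label for _, label in batch}) > 1:
            return batch
```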

| Model evaluation
The metric used to evaluate the model was the mean average precision (mAP) for the top N predictions (mAP@N). Results are reported for the top 1 (mAP@1) and top 5 (mAP@5) predictions, as has been previously reported for machine learning photo identification (Nepovinnykh et al., 2020; Schneider et al., 2020). The closed-set evaluation metric was computed as follows (a sketch of this procedure follows the list):
1. An individual animal's identification number (label L) was selected to be evaluated.
2. An image from the same individual (L) that had not been previously seen by the model during training was selected and its embedding vector (V) was computed.
3. Using Euclidean distance, the closest 10 embedding vectors to V from the training set each cast a vote for their label.
4. If one of the N most voted labels (N = 1 for mAP@1 and N = 5 for mAP@5) was the same as L, it was considered an accurate identification.
5. The process was repeated for all other images from the same individual (L).
6. The average precision was computed as the count of accurate identifications divided by the total number of possible identifications.
7. The mAP was computed as the mean of the average precision across all individuals.
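The numbered procedure above can be sketched as follows, assuming embeddings have already been computed for the training and testing images. The array names and the helper function are illustrative assumptions, not the authors' code.

```python
import numpy as np
from collections import Counter

def closed_set_map_at_n(train_emb, train_labels, test_emb, test_labels,
                        n=1, k=10):
    """mAP@N via voting among the k nearest training embeddings."""
    average_precisions = []
    for label in set(test_labels):                       # step 1
        hits, total = 0, 0
        for emb, true_label in zip(test_emb, test_labels):
            if true_label != label:
                continue                                 # steps 2 and 5
            dists = np.linalg.norm(train_emb - emb, axis=1)
            nearest = np.argsort(dists)[:k]              # step 3: k votes
            votes = Counter(train_labels[i] for i in nearest)
            top_n = [lbl for lbl, _ in votes.most_common(n)]
            hits += int(label in top_n)                  # step 4
            total += 1
        average_precisions.append(hits / total)          # step 6
    return float(np.mean(average_precisions))            # step 7
```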
Model evaluation using the closed-set evaluation metric was performed only for the two sea star datasets (ASRU and ANAU).
To compare our methodology with existing results, performance on existing, publicly available datasets was evaluated using randomized, time-unaware splits and the open-set evaluation metric, as described in previous research (Dlamini & van Zyl, 2021). Using the open-set evaluation metric, the model is evaluated using only previously unseen individuals, and the mAP@N is calculated based on whether one of the nearest N neighbours belongs to the same individual. Data processing was replicated as described in each respective publication; when provided, pre-processed images were used.
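In sketch form, the open-set variant reduces to a direct nearest-neighbour membership check (names again illustrative):

```python
import numpy as np

def open_set_hit(gallery_emb, gallery_labels, query_emb, query_label, n=5):
    """True when any of the n nearest gallery embeddings shares the
    query's identity; averaging these hits per individual yields mAP@n."""
    dists = np.linalg.norm(gallery_emb - query_emb, axis=1)
    nearest = [gallery_labels[i] for i in np.argsort(dists)[:n]]
    return query_label in nearest
```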
These publicly available datasets range in size from 820 to 6770 images and from 45 to 2056 individuals (Table 2), covering seven different mammalian species. The Amur Tiger Re-identification in the Wild database (Li et al., 2019) contains full-body images of Amur tigers (Panthera tigris altaica) with visible stripes, extracted from 1080p-resolution video from camera traps. Images within the Beef Cattle Muzzle/Noseprint Database (Xiong et al., 2021) are close-up images of the noseprints of feedlot beef cattle at a maximum resolution of 26 megapixels. The Chimpanzee Faces database (Freytag et al., 2016) is a combination of two datasets, C-Zoo and C-Tai, and contains images centred on the cropped faces of chimpanzees (Pan troglodytes) at an unspecified resolution. Full-body photographs of plains zebra (Equus quagga) and Masai giraffe (Giraffa camelopardalis tippelskirchi), taken by 55 citizen scientist photographers, make up the Great Zebra and Giraffe Count and ID database (Parham et al., 2017); these images have a maximum dimension of 3000 pixels. Images of the pelage pattern of Saimaa ringed seals (Pusa hispida saimensis) at 160 × 160 pixels are included in the Ringed Seal Image dataset (SealID) (Nepovinnykh et al., 2022). The StripeSpotter database (Lahiri et al., 2011) contains full-body images, including both flanks, of plains zebra (E. quagga) and Grévy's zebra (Equus grevyi).

| RESULTS
For the ASRU and ANAU datasets and all respective splits, a mean average precision of over 99% using the closed-set evaluation metric was achieved. The mAP@1, which shows whether the top prediction in testing was accurate, ranged from 0.9945 to 0.9992. This indicates that in 1000 image evaluations, the model would correctly identify the individual 995 to 999 times. In terms of individual re-identification, at most one image from a single individual was misclassified in some of the experiments, leading to the high mAP@1 metric. The mAP@5, which indicates whether one of the top five identifications from the model was accurate, ranged from 0.9973 to 0.9992. Both of these metrics indicate reliable re-identification of the same individual across collection events. These results are summarized in Table 1. Even after using principal component analysis (PCA) to reduce the dimensionality of the embedding vectors from 128 to only 2 dimensions, the clustering of individuals is readily apparent (Figure 7).
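A plot in the spirit of Figure 7 can be produced with a few lines; the random embeddings below are placeholders standing in for the model's 128-dimensional outputs.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Placeholder data: 5 individuals x 10 images of 128-d embeddings,
# offset per individual so clusters are visible.
rng = np.random.default_rng(0)
centers = np.repeat(rng.normal(size=(5, 128)) * 3, 10, axis=0)
embeddings = rng.normal(size=(50, 128)) + centers
labels = np.repeat(np.arange(5), 10)

# Compress the dense representation to two principal components.
coords = PCA(n_components=2).fit_transform(embeddings)
for individual in np.unique(labels):
    mask = labels == individual
    plt.scatter(coords[mask, 0], coords[mask, 1], label=f'ID {individual}')
plt.xlabel('Component 1')
plt.ylabel('Component 2')
plt.legend()
plt.show()
```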
The performance of the model using the open-set evaluation metric and randomized time-unaware splitting on these publicly available datasets was more variable (Table 2). The mAP@1 ranged from 0.8385 to 0.9974. For each dataset, the mAP@1 was higher than the corresponding reported state-of-the-art results, which ranged from 0.746 to 0.987. The mAP@5 was 1.0 for the Amur tiger, beef cattle noseprint, and StripeSpotter datasets, which indicates perfect recall. The mAP@5 was 0.9527 for the chimpanzee dataset, 0.9682 for the Great Zebra and Giraffe Count and ID dataset, and 0.9139 for the ringed seal dataset.
Examples of the quality of the images in the publicly available datasets are presented in Figures 8 and 9. See Figure 8 for the impact of image quality on model performance: when the model was tested on an image with an occluded subject, identification performance decreased. The lack of diversity in pose, lighting, and background in some datasets is illustrated in Figure 9.
TABLE 1 Mean average precision (mAP) metrics and dataset split summary for the sea star datasets. Note: Mean average precision was calculated for the top choice (mAP@1) and the top five choices of the model (mAP@5). The ANAU dataset contains images of Anthenea australiae sea stars; the ASRU dataset contains images of Asterias rubens sea stars. For the ASRU dataset, the mAP is presented with its standard deviation due to the presence of five separate splits in the training process.
FIGURE 7 Dimensionality-reduced embedding vectors using principal component analysis. The dense vector representation was compressed further into two components (x, y) using principal component analysis. Each point represents a single image, and each colour on the plot represents a different individual sea star. Seven individuals from the ASRU dataset previously unseen by the model are included in this figure for illustration purposes; see Table 1 for the number of images and individuals used during training and testing.
The optimal choice of hyper-parameters was partly dependent on the specific dataset, highlighting the need for hyper-parameter tuning in order to achieve maximum performance as seen in Table 3.
The effects of individual hyper-parameters across all datasets are summarized in Table 4. Larger batch sizes did not have a positive effect on performance. Regularization and image augmentation methods were also counter-productive in most cases, but caution must be applied when extrapolating these results due to the limited number of epochs used for training during the hyper-parameter selection process. The choice of triplet loss margin value did not have a large impact, with less than a 0.02 difference in mAP@1. Finally, a larger embedding size, as well as a larger number of layers in the base model whose weights were updated, was correlated with greater performance.
TABLE 2 Mean average precision (mAP) metrics and dataset split summary for existing publicly available animal re-identification datasets.

Dataset (citation) | Dataset size | mAP@1 | Reported mAP@1 (citation)
Amur Tiger Re-identification in the Wild (S. Li et al., 2019) | 1887 images (92 individuals) | 0.9974 | 0.889 (Dlamini & van Zyl, 2021)
Beef Cattle Muzzle/Noseprint Database (Xiong et al., 2021) | 4923 images (268 individuals) | 0.9937 | 0.987 (Xiong et al., 2021)
Chimpanzee Faces (Freytag et al., 2016) | 6770 images (78 individuals) | 0.8385 | 0.797 (Dlamini & van Zyl, 2021)
Great Zebra and Giraffe Count and ID (Parham et al., 2017) | 4948 images | |
Note: Mean average precision was calculated for the top choice (mAP@1).

These results represent an improvement in the accuracy of previously published animal re-identification work, particularly for sea stars (Glynn, 1982). A photo recognition program and code have been developed to identify individual knobby sea stars (Protoreaster nodosus) by coloration and tubercles (Chim & Tan, 2012); however, that method was only applicable to a single species and required manual processing of each image into a coding system that was still not very reliable, with a 23% error rate in its first test. Methodologies for animal re-identification across multiple species using similarity learning networks have been described previously (Dlamini & van Zyl, 2021; Miele et al., 2020; Schneider et al., 2022); however, none of these publications made their code publicly available. With an mAP@1 greater than 99% in multiple datasets with different species of sea stars and over 83% in all other species, the methodology described herein advances the field of animal re-identification by improving on state-of-the-art results and providing open-source code that can help reproduce these results and accelerate the creation of applications using this technology. In most cases, expert human evaluation can be used to distinguish between individuals; however, this can be time-intensive and challenging when the individual markers between animals are small or hard to visualize, as is the case when using sea star spines for individual animal identification. Even in cases where individual re-identification is extremely challenging for humans, the reported methodology performs with high accuracy provided the images are taken in a controlled environment. This method is also less expensive and more practical than other forms of individual re-identification, such as electronic tags. The ANAU (Split 2) dataset attempted to simulate an independent, natural-environment collection event and did so without a significant decrease in performance, but time-aware splitting was not available for the datasets with larger numbers of individuals. Since images of free-ranging individuals (open-set problem) are often only available from a single collection event, typically in the form of frames extracted from a single video, the performance metrics reported in this work might not translate to real-world results. The lack of diversity of poses and backgrounds is readily apparent in some cases, as seen for the two individuals shown in Figure 9.
Furthermore, a model trained using this methodology can be employed in a number of ways with different applications imposing different constraints and requirements.For example, a fully automatic re-identification system might be desirable for low-risk scenarios such as the feeding of a population of individuals under human care.
On the other hand, tracking of free-ranging populations could use a hybrid approach and present researchers with the most likely individuals but let the final determination be hand-picked by the user.
The metrics used to determine classification of images can also be fine-tuned to meet specific application requirements; for example, a voting mechanism could be implemented across multiple images if they were taken consecutively and there is certainty that the images belong to the same individual.
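As a sketch of that voting mechanism, assuming a hypothetical `predict_top1` helper that returns the best-matching individual for a single image:

```python
from collections import Counter

def identify_sequence(images, predict_top1):
    """Majority vote over consecutive frames known to show one individual."""
    votes = Counter(predict_top1(image) for image in images)
    return votes.most_common(1)[0][0]
```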
Our goal is that by making our source code available and demonstrating state-of-the-art results, practical solutions and products can be developed that will result in improvements in animal management and care, population research in situ, and animal welfare.
We encourage other authors to make their code freely available to facilitate work in this space and allow for reproduction of results.
Individual re-identification improves the ability to provide and record individualized care for animals, including monitoring feeding, behaviour, and veterinary care. A non-invasive method for animal re-identification is especially advantageous in aquatic invertebrate species, which have been historically challenging to permanently tag.
Non-invasive tagging methods also represent an improvement in animal welfare. Previously used invasive methods of tagging animals will likely lose their social licence as standards of care and legal protections increase, particularly with the rising interest in invertebrate animal welfare (Horvath et al., 2013; Mather, 2019; Perkins, 2021).

Common sea stars (n = 39) were individually housed in 53 L tanks and had no visible external lesions. Each of the 39 individual common sea stars was removed from its tank and placed in a food-grade clear plastic container with 2 L of water from its home tank. Images were taken with the common sea stars in the plastic containers against a solid white background, under the same lighting conditions, in a controlled laboratory environment. A minimum of seven images from five different angles was taken of each individual at each of five separate collection events (different days) using an iPhone camera (8 Plus or SE 2020; Apple, Cupertino, California, United States). See Figure 1 for example images of common sea stars.

Figure 2 demonstrates the intra- and inter-individual variability of the common sea stars. To determine whether the methodology was transferable to a different species of sea star, images were taken of adult Australian cushion stars (Anthenea australiae) [ANAU dataset]. Cushion stars (n = 54) were group-housed in a naturalistic touch tank exhibit.
FIGURE 3 Sample images for the ANAU dataset from a single Anthenea australiae sea star. The first image (a) was from the simulated natural environment. The second image (b) was a close-up of the central disk. All other images were taken consecutively from different angles.
The size of the embedding vector was chosen after an initial exploratory analysis showing little improvement when larger embedding vectors were used, all other variables remaining equal. Each of the artificial neurons within the neural network has an associated weight, which is assigned as part of weight initialization, updated during training, and used to determine the final classification at inference time (i.e. testing). A model with the same architecture, pretrained on the ImageNet dataset (Deng et al., 2009), was used for the purpose of weight initialization, in a technique called transfer learning. Transfer learning reduces the need to train models on a large dataset by pretraining the model on a similar task with a different dataset (Torrey & Shavlik, 2010). By using transfer learning, the pretrained weights of a neural network are used as the starting values for weight initialization. During training, a number of layers of the base model were updated, between 64 and all 427 (the exact count was determined by a choice of hyper-parameter). Classification of individuals in the test set was done after the training phase using the nearest neighbours of the embedding vector. The nearest neighbours were defined as the most similar embedding vectors by Euclidean distance. In the closed-set variation, when the model was shown an image of an unknown individual, votes were cast for the ten nearest neighbours and the model generated a ranked list of potential matches. In the open-set variation, the number of nearest neighbours depended on the evaluation metric used, such that the single nearest neighbour was used for mAP@1 and the five nearest neighbours were used for mAP@5. Once the model is trained, new individuals can be re-identified without the need for additional training, since the nearest neighbours may correspond to any individual, seen or unseen.
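A minimal Keras sketch of the architecture described above is given below. The input size, dropout rate, and number of frozen layers are placeholder hyper-parameter values, and the exact layer wiring is an assumption based on this section rather than the authors' released code.

```python
import tensorflow as tf

# DenseNet121 base pretrained on ImageNet (transfer learning), with the
# classification head removed and global average pooling as the output.
base = tf.keras.applications.DenseNet121(
    include_top=False, weights='imagenet', pooling='avg')

# Freeze all but the last 64 layers; the study tuned this count (64-427).
for layer in base.layers[:-64]:
    layer.trainable = False

inputs = tf.keras.Input(shape=(224, 224, 3))
features = base(inputs)
# Dropout on the connection between the base model and the embedding layer;
# Keras applies it during training only, matching the description above.
features = tf.keras.layers.Dropout(0.2)(features)
# 128-unit densely connected embedding layer (the embedding vector).
embedding = tf.keras.layers.Dense(128)(features)
model = tf.keras.Model(inputs, embedding)
```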

FIGURE 4 Example of intra- and inter-individual variability for Anthenea australiae sea stars. Images of sea stars with the same letter represent the same individual.

FIGURE 5 Illustration of data splitting for model training and testing. (a) represents time-aware data splitting. (b) represents time-unaware (random) data splitting.

The batch construction procedure described in the Machine learning framework section guaranteed that, for every element in the batch, there was at least one corresponding positive and negative example within the same batch. Batches were selected until all elements had been drawn in random order, which completed a training epoch. A maximum of 50 epochs was used during training, but training was stopped early when the evaluation metrics showed more than five consecutive epochs without a decrease in the average loss computed on the training set. None of the experiments reached the maximum number of epochs.

Given the small number of total images available for training, multiple techniques were employed to reduce the risk of overfitting the model to the training data. First, a number of image augmentation procedures were applied to samples from the training set as they were drawn into batches. Image augmentation simulates a higher number of images than were included in the original dataset by applying pseudo-random transformations to the images. The number of augmentations was controlled by an augmentation count parameter, which determined how many image augmentations were randomly chosen to be applied to an individual image. The image augmentations applied included pixel dropout, Gaussian noise, horizontal flip, translation, rotation, cropping or zoom-out, and changes in hue and saturation. The augmentation count parameter was set between 2 and 8, meaning each image had at least two but fewer than eight transformations applied each time it was seen during training. Another technique used to reduce overfitting was dropout regularization (Wager et al., 2013), in which a dropout factor parameter controls the probability that a connection between the base model and the final embedding layer is dropped. Both the image augmentations and the dropout regularization were applied only during training, and not when evaluating the model against the test set.

To find an optimal model, a series of different hyper-parameters was varied across trials. These hyper-parameters included batch size, dropout regularization rate, count of image augmentation transformations, image augmentation factor, triplet loss margin value, number of layers in the base model whose weights were updated, and size of the embedding. Due to computational constraints, not all combinations of hyper-parameters were exhaustively tested; instead, for each dataset, a random sample of 100 different hyper-parameter sets was used to train a model for 10 epochs. Then, the best-performing set of hyper-parameters for each dataset was used to train a separate model for 50 epochs across multiple trials, each using a different selection of training and testing sets: 5 for the ASRU dataset and 3 for all others.
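The hyper-parameter search described above can be sketched as a simple random search; the value grids and the `train_and_evaluate` helper are illustrative assumptions, not the exact ranges used in the study.

```python
import random

# Illustrative value grids; the study's exact ranges are not reproduced here.
SEARCH_SPACE = {
    'batch_size': [16, 32, 64],
    'dropout': [0.0, 0.2, 0.5],
    'augmentation_count': [2, 4, 8],
    'loss_margin': [0.2, 0.5, 1.0],
    'embedding_size': [64, 128, 256],
    'retrain_layer_count': [64, 128, 427],
}

best_score, best_params = None, None
for _ in range(100):  # 100 random samples, each trained for 10 epochs
    params = {name: random.choice(values)
              for name, values in SEARCH_SPACE.items()}
    score = train_and_evaluate(params, epochs=10)  # hypothetical helper
    if best_score is None or score > best_score:
        best_score, best_params = score, params
```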
The embedding vectors correspond to the output of the model after going through the training process. The mAP evaluation metric was computed differently for the open-set and the closed-set variations. The closed-set evaluation metric was designed to determine the model's ability to re-identify a previously seen individual given a previously unseen image of that individual at a new collection event; it was computed following the numbered steps given in the Model evaluation section above.

FIGURE 6 Illustration of the training objective derived from the triplet loss function. The anchor and positive samples are two different images from the same individual selected from a single training batch; the negative sample is an image from a different individual selected from the same training batch. The anchor, positive, and negative samples form a triplet, which is used in the loss function by minimizing the distance between the anchor and positive samples while maximizing the distance between the anchor and negative samples.
FIGURE 8 Representative images from the Chimpanzee Faces dataset (Freytag et al., 2016) demonstrating model performance with differing image quality. The leftmost column contains images from the test subset never seen during training. The remaining images in the same row contain the five nearest neighbours in order of Euclidean distance. ☒ indicates that the image does not correspond to the same individual; ☑ indicates that the image does correspond to the same individual. (a) None of the nearest neighbours correspond to the correct individual. (b) Although the closest neighbour does not correspond to the correct individual, two of the five nearest neighbours correspond to the correct individual. (c) All five nearest neighbours correspond to the correct individual.

| DISCUSSION

The proposed methodology demonstrates state-of-the-art results in the animal re-identification domain, with accurate results across multiple populations and two different species of sea stars (Asterias rubens and Anthenea australiae). While this research used sea stars to create and test the model, the methodology was also accurate when applied to the re-identification of seven mammalian species and required no changes in approach. The additional datasets evaluated had diverse properties, including varying distance to the subject, image resolution, and identifying features of the individuals. This indicates that the described framework is versatile and can be used in closed populations of animals in managed care settings (closed-set problem) as well as open populations of free-ranging wildlife (open-set problem).

FIGURE 9 Representative images from the Amur Tiger Re-identification in the Wild dataset (Li et al., 2019) demonstrating a lack of diversity in pose, lighting, and background. (a) Entire set of images available for one individual. (b) Entire set of images available for a different individual.
Determining which markers are used by the machine learning model for individual re-identification is a challenging task. Future work could reach an approximation by studying which components of the underlying neural network are activated when the relevant pixels of the image change, or by computing saliency maps (Simonyan et al., 2014). However, definitive determination of the markers used by the model is unlikely to be necessary to use the model effectively for animal re-identification. Due to the lack of other naturally occurring distinguishable features in A. rubens, such as colour or shape, it is reasonable to assume the model used the spine pattern on the central disc as the distinguishing feature. Other known methodologies, including computer vision and human labelling, prove intractable when using the spine pattern as the distinguishing feature.

TABLE 3 Optimal choice of hyper-parameters for each individual dataset, selected after training a model for 10 epochs and identifying the model with the highest mAP@1 using unseen individuals (open-set evaluation) or unseen collection events (closed-set evaluation).
With a sufficiently diverse set of pictures to use as training data, including a variety of poses, lighting conditions, and backgrounds, a model trained using the methodology described herein is able to re-identify even unseen individuals without the need for re-training the model, as seen in the open-set evaluation metrics. This is achievable due to the inherent FSL properties (Parnami & Lee, 2022) of the methodology and will be especially useful in the re-identification of free-ranging animals. The results of machine learning models for animal re-identification will be highly dependent on access to high-quality imagery. Model performance can vary based on the properties of the datasets used in training and testing, including the number of images per individual, resolution of images, lighting conditions, diversity of poses and backgrounds, distinctness of individuals (i.e. identifying features), and overall quality of the images. Image quality likely played a role in the variable performance of the model on the publicly available datasets; however, this cannot be definitively determined without additional performance testing. The described methodology has proven to work remarkably well with images taken in a controlled environment, with relatively good lighting conditions, and at a close distance to the subject. While images taken with a smartphone camera were used for the sea star datasets, the model is expected to perform similarly with high-quality images taken with an underwater camera or other consumer-grade cameras.

Dataset (citation) | Batch size | Dropout | Augmentation count | Augmentation factor | Loss margin | Embedding size | Retrain layer count
TABLE 4 Effect of individual hyper-parameter values, as measured by the difference in mAP@1 between the average across all trials and the trials for which the specific hyper-parameter value was chosen.