Applications of deep convolutional neural networks to predict length, circumference, and weight from mostly dewatered images of fish

Abstract Simple biometric data of fish aid fishery management tasks such as monitoring the structure of fish populations and regulating recreational harvest. While these data are foundational to fishery research and management, collecting length and weight data through physical handling of the fish is challenging because it is time-consuming for personnel and can be stressful for the fish. Recent advances in imaging technology and machine learning now offer alternatives for capturing biometric data. To investigate the potential of deep convolutional neural networks to predict biometric data, several regressors were trained and evaluated on data stemming from the FishL™ Recognition System and manual measurements of length, girth, and weight. The dataset consisted of 694 fish from 22 different species common to the Laurentian Great Lakes. Even with such a diverse dataset and variety of presentations by the fish, the regressors proved to be robust and achieved competitive mean percent errors in the range of 5.5 to 7.6% for length and girth on an evaluation dataset. Potential applications of this work could increase the efficiency and accuracy of routine survey work by fishery professionals and provide a means for longer-term automated collection of fish biometric data.

simple biometric data, managers can determine a fish population's structure (i.e., the number in each age or size group) and thus the potential for commercial and recreational opportunities. Harvest regulations are often based on fish length, so knowing a population's size structure allows managers to track changes in response to harvest regulation (Anderson & Neuman, 1996). Weight data enable calculation of production potential (e.g., kg per hectare) for natural systems (Schaefer, 1965) and, if used together with length data, can provide estimates of fish health (Bolger & Connolly, 1989).

| Existing approaches to obtain biometric data of fish
Conventional length and weight data collection requires physical handling of fish which is time consuming for personnel and stressful for the fish. Measurements are commonly taken in the field where conditions can be suboptimal for ensuring precision and accuracy.
Something as simple as wind, fish bouncing, or differences in measuring techniques among personnel can impact the accuracy of measurements and introduce variability (Gutreuter & Krzoska, 1994). In addition, variability of the fish may introduce error when regression formulas are used to calculate data post hoc. For example, when weight data are unavailable from the field, species-specific length-weight regression formulas are used to generate weight data (Gerow, Anderson-Sprecher, & Hubert, 2005; Murphy, Brown, & Springer, 1990). Unfortunately, morphological variability both among fish and seasonally for individual fish can further reduce the precision of calculated weight estimates (Adams, Leaf, Wu, & Hernandez, 2018; Neumann & Murphy, 1992; Ranney, 2018). Finally, the time and effort required to obtain length and weight measurements in the field limit the number of fish that can be sampled, which reduces the confidence with which the data capture individual variability (Gutreuter & Krzoska, 1994). For these reasons, a tool or method that increases sampling capacity, allows measurements on more individual fish, is standardized, reduces variability in measurements, and captures additional information for calculating weight beyond a simple length-weight relationship would benefit fish managers and researchers alike.
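As an illustration of the species-specific length-weight regressions mentioned above, the following minimal sketch assumes the commonly used allometric form W = a * L^b, fitted by least squares on log-transformed values. The function names and the coefficient values in the usage note are hypothetical and are not taken from the cited studies.

```python
import math


def weight_from_length(length, a, b):
    """Allometric length-weight relationship W = a * L**b (assumed form)."""
    return a * length ** b


def fit_length_weight(lengths, weights):
    """Fit log10(W) = log10(a) + b * log10(L) by ordinary least squares.

    Returns the pair (a, b). All lengths and weights must be positive.
    """
    xs = [math.log10(value) for value in lengths]
    ys = [math.log10(value) for value in weights]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope on the log-log scale
    a = 10 ** (mean_y - b * mean_x)     # back-transform the intercept
    return a, b
```

For example, fitting measurements generated with hypothetical coefficients a = 1e-5 and b = 3.0 recovers those values, and `weight_from_length` can then estimate a weight for a fish whose weight was not recorded. Any morphological or seasonal variability in the fish appears as scatter around this curve, which is the source of imprecision described in the text.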
Various approaches have been tested to automate estimation of fish length and weight, and their use in marine science is likely to increase (Malde, Handegard, Eikvil, & Salberg, 2019).
Ibrahim and Sultana describe image processing such as skeletonization, boundary detection, and machine learning techniques (e.g., Support Vector Machine (SVM), fuzzy classification; Ibrahim & Sultana, 2006). White et al. employed image processing algorithms to determine the orientation of a fish and classify as a flatfish or roundfish (White, Svellingen, & Strachan, 2006). The average percent error for these methods generally ranged from 2% to 5%, although many of the methods were species-specific or involved some manual intervention. Approaches combining optical systems with machine learning report similar error ranges for species-specific length and weight estimates (Saberioon, Gholizadeh, Cisar, Pautsina, & Urban, 2017). More recently, regional convolutional neural networks (R-CNN) were used to bound European sea bass in images from a variety of settings, allowing length to be calculated from known-size fiducial markers (Monkman, Hyder, Kaiser, & Vidal, 2019). Masked R-CNNs successfully located heads of European hake in images from which head size and thus subsequent overall fish length was estimated (Álvarez-Ellacuría, Palmer, Catalán, & Lisani, 2020). Finally, large-segmentation CNNs were able to successfully predict the weight of harvested Asian sea bass and barramundi (Konovalov, Saleh, Efremova, & Domingos, 2019).
The use of CNNs for fish biometric data collection is relatively novel, yet with technological advances it can be accomplished at little expense (i.e., using commonly available hardware) and under variable conditions (e.g., camera, setting).
Presented here is a deep machine learning approach to predict the length, girth, and weight for multiple species of fish from low-resolution, dewatered images. Specifically, the regressors were trained and evaluated on a dataset of images for 22 different species common in the Laurentian Great Lakes collected from live fish moving past an image capture device. The goal for this proof-of-concept project was to determine whether a deep convolutional neural network (DCNN) could calculate biometric data (length, weight, girth) from fish with limited handling and without a species-specific classifier.
Fish were held in livecars instream until processed and then immediately released after images were collected. In the case of silver and bighead carp, the fish were collected as part of an invasive species capture and removal effort and were dead during image collection and measurement. Sea lamprey were collected as part of the Sea Lamprey Control Program assessment operations in tributaries to northern Lake Huron and housed at Hammond Bay Biological Station for other research projects. Fish were identified to species when possible, measured for total length to the nearest mm using a 1 m measuring board, and weighed to the nearest gram using an electronic scale (MyWeigh model KD-8000; max weight 8 kg; precision 1 g) or a spring scale for fish greater than 8 kg. Girth was measured by wrapping a segment of static net twine around the deepest point of the body, marking where the twine overlapped, and then measuring the marked twine to the nearest mm using the measuring board. After measurements were taken, fish were passed once through the FishL™ Recognition System (https://www.whooshh.com/scanning-sorting#OurComponents-Scanning). Fish were introduced by hand, headfirst, into the imaging system and allowed to slide through on a stream of water as images were automatically captured in less than 0.5 s. The imaging system consisted of an illuminated ramp with six overhead cameras positioned at a fixed distance from the slide. Three cameras captured near-infrared (IR) images, and three captured color images. Two cameras (one IR and one color) were positioned at each of three fixed-angle locations (directly overhead, 45° to left, 45° to right). All images were stored in the portable network graphics (PNG) file format. All recorded biometric data were digitized and then validated against the images by two independent recorders. Table 1 describes key terms and variables used in this manuscript.

| Data preparation
Three pictures were taken in sequence by each of the six cameras as the fish slid through the system generating a composite image file linking the 18 high-resolution images (see Figure 2).
To increase the number of usable fish images for training and testing, each composite image was broken into single-individual images. This was done with a custom shell script and the ImageMagick suite (The ImageMagick Development Team, 2020). As each image was extracted, it was rescaled to 75 pixels by 200 pixels, thereby reducing the size of the input into the regressors. The extraction resulted in 6,246 individual images, of which 639 (10%) were randomly selected as the test images. The remaining 5,607 images (90%) were used as training and validation images (Table 2). These training and validation images were segmented into 10 folds to support a 10-fold cross-validation training procedure (defined in Table 1). When splitting the data between datasets (and folds), care was taken to ensure that all images stemming from the same composite image were placed in the same dataset (or fold). The test dataset was not used for evaluation until the models had been finalized. Figures 4-6 show the frequency distribution of lengths, girths, and weights, respectively, of all the fish in the training dataset along with the distributions of the five most commonly represented species.

TABLE 1 Description of terms used in this article
Name: Comment
10-fold cross-validation: With 10-fold cross-validation, the dataset is randomly split into 10 equal subsets. One subset is held out as the validation data, and a model is trained on the other 9 subsets. This procedure is repeated 10 times to evaluate each fold, and the results of each iteration are combined as the final estimate of performance.
composite image: The image produced by the FishL™ Recognition System. The composite image includes 18 images taken from 6 different cameras as a fish passes through the system. The composite image contains 9 images from color capture cameras and 9 images from near-infrared (IR) cameras.
ensemble prediction: A prediction for length, girth, or weight of a fish that was obtained by individually passing the 9 color images of a composite image into a regressor and then averaging the output of the regressor for each image.
multi-target regressor: A regressor that simultaneously predicts the length, girth, and weight of a fish from a single image.
single image: One of the 9 images in a composite image that was taken with one of the color capture cameras.
single-model prediction: A prediction that is made using only one single image.
single-target regressor: A regressor that only predicts one of length, girth, or weight.
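The group-aware split described in the data preparation text, in which all images from one composite stay in the same dataset or fold, can be sketched as follows. This is a minimal illustration rather than the authors' script: the function name, the `composite_of` mapping, and the round-robin fold assignment are assumptions.

```python
import random
from collections import defaultdict


def grouped_split(image_ids, composite_of, test_fraction=0.10, n_folds=10, seed=0):
    """Split images into a test set and n_folds folds without splitting a composite.

    image_ids:    iterable of image identifiers
    composite_of: dict mapping each image identifier to its composite identifier
    """
    # Collect the images belonging to each composite (i.e., each fish).
    groups = defaultdict(list)
    for image_id in image_ids:
        groups[composite_of[image_id]].append(image_id)

    # Shuffle composites, not images, so a fish never straddles two sets.
    composites = sorted(groups)
    random.Random(seed).shuffle(composites)

    n_test = round(test_fraction * len(composites))
    test = [img for comp in composites[:n_test] for img in groups[comp]]

    # Deal the remaining composites into folds in round-robin order.
    folds = [[] for _ in range(n_folds)]
    for index, comp in enumerate(composites[n_test:]):
        folds[index % n_folds].extend(groups[comp])
    return test, folds
```

Splitting at the composite level matters because the nine color images of one fish are highly correlated; if they were spread across training and evaluation sets, the evaluation error would likely be optimistic.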

| Regressor construction
To predict the length, girth, and weight of a fish, several regres-

| Training procedures
Each single-target regressor (i.e., length, girth, weight) was trained using the same procedure. The input was a single image, and the target was the length, girth, or weight of the fish contained in the image. Training took place over 125 epochs with a batch size of 32.
The Adam optimizer (Kingma & Ba, 2014) for Keras (Chollet, 2015) was used to minimize the mean squared error of the target measurement. To increase the effective size of the training data, an image augmentation process was applied (Chollet, 2017) and each training image was randomly rotated (0-15 degrees) or shifted vertically or horizontally (0%-20%). A horizontal flip of the image was also applied on a random basis. No augmentation was performed on the validation or test images. Additionally, a multi-target regressor was trained and evaluated. The multi-target regressor simultaneously predicted the length, girth, and weight of a fish. The Adam optimizer (Kingma & Ba, 2014) for Keras (Chollet, 2015) was used to minimize the average of the mean squared errors across the three measurements. Training of the multi-target regressor made use of the aforementioned image augmentation process and was done over ... it has been shown that multi-target regressors tend to produce more robust and generalizable models (Collobert & Weston, 2008; Deng & Yu, 2012; Girshick, 2015; Ruder, 2017).
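The multi-target objective described above, the average of the mean squared errors across the three measurements, can be written out in plain Python as a minimal sketch. The function names and the dictionary layout are assumptions for illustration; the study used Keras, and whether the targets were rescaled before computing the loss is not stated in the text.

```python
def mean_squared_error(predicted, actual):
    """Mean of squared differences between predictions and true values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def multi_target_loss(predictions, targets):
    """Average the per-target mean squared errors for one batch.

    predictions, targets: dicts mapping a target name (e.g., "length",
    "girth", "weight") to a list of values, one value per image.
    """
    per_target = [
        mean_squared_error(predictions[name], targets[name]) for name in targets
    ]
    return sum(per_target) / len(per_target)
```

Because the three targets share one loss, a target with a larger numeric range contributes more to the gradient unless the targets are put on comparable scales, which is one design consideration for any implementation of this objective.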

| Ensemble predictions
To leverage the full extent of the data available for each fish, an ensemble prediction was made from the nine color images of the composite taken of each fish as it passed through the scanning device. Each of the nine images was passed through a regressor, and the outputs were averaged and taken as the final prediction. For the multi-target regressor, the averages were taken over the respective targets (i.e., length, girth, and weight).
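The averaging step of the ensemble prediction can be sketched as below. The function name and the dictionary-per-image layout are assumptions for illustration; a single-target regressor corresponds to dictionaries with one key.

```python
def ensemble_prediction(per_image_predictions):
    """Average per-image predictions target by target.

    per_image_predictions: list of dicts, one per image of the same fish,
    each mapping a target name to the regressor output for that image.
    """
    n_images = len(per_image_predictions)
    target_names = per_image_predictions[0].keys()
    return {
        name: sum(pred[name] for pred in per_image_predictions) / n_images
        for name in target_names
    }
```

With nine images per fish, a single poor presentation shifts the averaged estimate by only one ninth of its individual error.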

| Evaluation metrics
Mean absolute error (MAE), mean bias error (MBE), and mean percent absolute error (MPAE) were the three metrics used to evaluate the performance of the regressors. The mean absolute error is defined as the mean of the absolute error between predictions and their respective ground-truth values. More specifically, $\mathrm{MAE} = \sum_{i=1}^{n} |f(x_i) - y_i| / n$ for a dataset with $n$ images, where $f(x_i)$ is the predicted value for the $i$th image and $y_i$ is the true value. Given the variation of the ground-truth values, the MPAE provides a more robust measure of performance across metrics (e.g., an absolute error of 0.5 cm would represent a percent error of 10% for a fish measuring 5 cm in length but that same absolute error would only represent a percent error of 2% for a fish measuring 25 cm in length).
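The three metrics can be sketched in a few lines. The MAE follows the definition given in the text; the MBE and MPAE forms below are the commonly used definitions and are assumptions here, including the sign convention for MBE (prediction minus true value, so a positive value indicates overprediction).

```python
def mean_absolute_error(predicted, actual):
    """MAE: mean of |f(x_i) - y_i|."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mean_bias_error(predicted, actual):
    """MBE (assumed form): mean of f(x_i) - y_i; positive means overprediction."""
    return sum(p - a for p, a in zip(predicted, actual)) / len(actual)


def mean_percent_absolute_error(predicted, actual):
    """MPAE (assumed form): mean of 100 * |f(x_i) - y_i| / y_i."""
    return 100.0 * sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)
```

Applied to the example in the text, an error of 0.5 cm gives an MPAE of about 10% for a 5 cm fish and about 2% for a 25 cm fish, while the MAE is 0.5 cm in both cases.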

| RESULTS
The first set of models evaluated were single-target regressors that predicted the length, girth, or weight of a fish. On the test data, the ensemble predictions from the single-target regressors performed comparably to the single-model predictions across all three types of measurements in terms of MPAE (i.e., 8.3%-7.6% for length, 17.3%-16.8% for girth, and 28.6%-26.9% for weight; Table 3).
The second set of models evaluated were multi-target regressors. These models were trained to predict the length, girth, and weight of a fish (i.e., for one input three separate values were predicted). When comparing the results of the multi-target regressors to the single-target regressors on the test dataset, the MBE is lower for the multi-target regressors than for the single-target regressors across all three measurement types (i.e., 30.9 mm to 21.6 mm for length, 36.6 mm to 11.53 mm for girth, and 0.2 kg to 0.1 kg for weight; Table 4). The MPAE and MAE are also lower for the multi-target regressors than for the single-target regressors across the length and girth outcomes of the test dataset. The trend lines drawn through the plotted multi-target regressor data points relative to actual measured values highlight differences in the regressor's predictions: over one range of values the predictions were tight, whereas over another range regressor biases were evident (Figures 8-10). The regressor tends to overpredict length for longer fish and underpredict weight for heavier fish. Comparing the girth and weight MAE and MPAE versus actual measured values shows a much tighter clustering of the data points in the girth plots, indicating overall reduced error in the multi-target regressor predictions for girth relative to weight (Figures 11 and 12). Body shape and size of the different fish species did not influence the performance of the multi-target regressor at predicting girth. However, for weight the MAE of a portion of Quillback Suckers, Silver Carp, and Bighead Carp was slightly higher. All three of these species share a similar body shape. Sea Lamprey, a lightweight, tubular-shaped species, exhibited the most significant divergence of MPAE of predicted weight distribution, which was replicated to a lesser degree by the Common White Sucker, also a somewhat tubular-shaped species.

| DISCUSSION
It is difficult to draw direct comparisons with the prior work reviewed by Ibrahim & Sultana (2006) and Saberioon et al. (2017) as those approaches were evaluated on at most a few species. Still, the results of our single-target ensemble regressors for the predicted length and girth on the cross-validation dataset are comparable to the species-specific models. The ground-truth values for length, girth, and weight were taken in the field and likely exhibit some inherent variability.
The diversity of the dataset in terms of species likely added to the difficulty in predicting the weight of the fish from the images.

TABLE 3 Performance of single-target regressors on validation and test datasets
The length of the fish directly correlates to pixels occupied by the fish in the image. For girth and weight, the relationship between pixels occupied and the measurement may vary depending on the species of fish (e.g., the side profile of two fish may be similar in size in the image but the weight may differ depending on the common cross-sectional shape of a species of fish). An additional species-specific challenge for weight is the fins. The relationship between the surface area of a fish in an image and its weight will depend on the percentage of the surface area that relates to fins (e.g., images of two fish may occupy the same surface area but if the surface area covered by fins is less in one image, then the weight of that fish may be greater than the other). Additional data or providing species information would likely improve a regressor's performance for girth and weight.

| Potential use cases for predicted biometrics
An automated approach to collecting length and weight data would allow fisheries professionals more time to devote to catching fish and processing data. An automated tool could be incorporated into routine survey work in which personnel collect and pass fish through an image capture device to either store images for later analysis or even process images in real time for in-field prediction of metrics and indices. Image data for this study were collected as part of routine fishery management assessment efforts. By far, the most time-consuming part of the data collection was measuring and weighing the fish, whereas image capture required considerably less time than the manual data collection. Beyond routine assessment work, image capture and analysis tools could be incorporated into scenarios where ... (Garavelli et al., 2019). Automated image capture and processing in this context would provide managers an accurate assessment of the species being passed and corresponding sizes and conditions without the need for personnel on-site, handling every fish for weeks or even months. Real-time assessment of length and girth at passage scenarios could also be used to sort fish based on size (Garavelli et al., 2019).

| Robustness of predictions across presentations
An advantage of the ensemble prediction approach is that it leverages several images of a fish when making a prediction, reducing the likelihood of a poor prediction due to a poor presentation of the fish.
Using an example of three fish from the test dataset ( Figure 13

| Limitations
While these results are promising, they should be interpreted with caution. As the histograms indicate, most of the data used for the construction and evaluation of biometric predictions stem from adult fish and therefore do not constitute the full range of lengths, weights, and girths possible for the included species. A more uniformly distributed dataset across species and age would provide additional support for the generalizability of the regressors. Still, bias to adult fish is not unique to this project and is a function of standard sampling protocol that typically emphasizes adults (i.e., mesh sizes; Pope & Willis, 1996).

FIGURE 10 Multi-target regressor ensemble predictions for weight versus measured weight on the test dataset

FIGURE 11 Absolute error of multi-target ensemble predictions for girth (a) and predicted weight (b) on the test dataset. The predictions for species with 5 or more samples in the test dataset are plotted against the measured girth and weight, respectively

FIGURE 12 Absolute percent error of multi-target ensemble predictions for girth (a) and predicted weight (b) on the test dataset. The predictions for species with 5 or more samples in the test dataset are plotted against the measured girth and weight, respectively
The current approach to use all the images for a fish as input to a regressor is limited in that it does not make use of the order in which the images of the fish were captured. At present, each image is processed individually, irrespective of the other eight images or the position of the camera. If all nine images were used as input, there may be information that can be leveraged from the camera angle or the movement between frames (i.e., recall that the nine images are taken as a series of three images and as a result capture some movement patterns). In this particular work, the limited amount of data did not support the development of such multi-image regressors. Doing so would have reduced the amount of training and evaluation data by a factor of nine.
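The factor-of-nine reduction can be checked against the dataset sizes reported earlier: 694 fish with nine color images each give 6,246 single-image samples, but only 694 samples if all nine images of a fish form one input. The grouping below is a minimal sketch; the function name and identifier format are hypothetical.

```python
from collections import defaultdict


def group_images_by_fish(image_to_fish):
    """Collect image identifiers per fish, forming one multi-image sample per fish."""
    grouped = defaultdict(list)
    for image_id, fish_id in image_to_fish.items():
        grouped[fish_id].append(image_id)
    return dict(grouped)


# Hypothetical identifiers: 694 fish with 9 color images each.
image_to_fish = {
    "fish%03d_img%d" % (fish, img): fish for fish in range(694) for img in range(9)
}
multi_image_samples = group_images_by_fish(image_to_fish)
print(len(image_to_fish), len(multi_image_samples))
```

Each multi-image sample would retain camera position and capture order, at the cost of leaving far fewer samples to train on.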
The generalizability of the regressors is likely limited in settings beyond the capture technology used in this work. The mostly dewatered and lateral presentation of the fish provides a high-quality input to the DCNN. The uniform background reduces the complexity of the regression task, and multiple images captured by the scanning device reduce the error introduced by poor presentations.
In addition to aiding with the quality of the input, the scanning device also ensures a constant distance between the camera and the fish. The models presented were not architected to accommodate variable distances between the camera and the fish and are unlikely to generalize to other distances. Nevertheless, these results demonstrate what is possible with currently available data capture technology.

| Additional directions of study
An additional line of investigation could be the effect of species information as an additional input to the regressors. While such information is not always available, there are situations when it is (e.g., during targeted species collection, visual inspection by personnel handling the fish) and using this extra information may provide for more precise predictions. Using the data available in this study, additional regressors were trained that used species information as an additional input (data not reported). The models exhibited a large amount of overfitting (i.e., the models did not generalize well to the data in the test dataset). With additional training data, the value of species information could be further investigated.

| CONCLUSION
Presented here is an overview and evaluation of a set of novel regressors to predict length, girth, and weight of fish from images of mostly dewatered fish.

CONFLICT OF INTERESTS
The manuscript describes application of deep learning on images which were captured by a new technology, the FishL™ Recognition

DATA AVAILABILITY STATEMENT
The extracted images and measurement data are available on OSF