Applicability of Object Detection to Microfossil Research: Implications From Deep Learning Models to Detect Microfossil Fish Teeth and Denticles Using YOLO‐v7

Microfossils of fish teeth and denticles, referred to as ichthyoliths, provide critical information for depositional ages, paleo‐environments, and marine ecosystems, especially in pelagic realms. However, owing to their small size and rarity, it is time‐consuming and difficult to analyze large numbers of ichthyoliths from sediment samples, limiting their use in scientific studies. Here, we propose a method to automatically detect ichthyoliths from microscopic images using a deep learning technique. We applied YOLO‐v7, one of the latest object detection architectures, and trained several models under different conditions. The model trained under appropriate conditions with an original data set achieved an F1 score of 0.87. We then enhanced the data set efficiently using the pre‐trained model. We validated the practical applicability of the model by comparing the number of ichthyoliths detected by the model with those counted manually. This revealed that the best model can predict the number of triangular teeth, denticles and irregularly shaped teeth with minimal human intervention. This object detection method can extend the applicability of deep learning to a wider array of microfossils and has the potential to dramatically increase the spatiotemporal resolution of ichthyolith records for applications across disciplines.


Introduction
Microfossils such as foraminifers, coccolithophores, radiolaria, and diatoms have been used to constrain depositional ages and environments of various kinds of seafloor sediments, as well as to provide high-resolution and detailed records of evolutionary processes (Armstrong & Brasier, 2005). Among them, microfossil fish teeth and denticles, referred to as ichthyoliths, are composed of calcium phosphate, which is resistant to dissolution on the deep seafloor (Doyle & Riedel, 1985; Sibert et al., 2017). Therefore, ichthyoliths are observed in almost all types of seafloor sediments, including pelagic clay, where siliceous and calcareous microfossils are rarely observed. Taking advantage of this, ichthyoliths have provided key constraints for depositional ages (Doyle & Riedel, 1979, 1985; Ohta et al., 2020) and marine environments and/or ecosystems (Britten & Sibert, 2020; Sibert et al., 2014, 2016; Sibert & Rubin, 2021), especially in pelagic realms. In addition, ichthyoliths preserve a variety of geochemical systems, including strontium and neodymium isotopes, which can provide additional age constraints on sediments (e.g., Gleason et al., 2002; Ingram, 1992) and insights into deep water circulation patterns and the origin of sedimentary components (e.g., Huck et al., 2016; Martin & Haley, 2000; Scher & Martin, 2004; Tanaka et al., 2022; Thomas et al., 2014). Oxygen isotopes in ichthyoliths have also been used to reconstruct changes in ocean temperature (e.g., MacLeod et al., 2018). However, traditional observation methods rely on "handpicking," in which an observer picks fossils individually under a stereomicroscope (Ohta et al., 2020; Sibert et al., 2017; Tanaka et al., 2022). This process is time-consuming and can only be conducted by a skilled observer, making it difficult to analyze large numbers of ichthyoliths from various sediment samples.
Computer vision technologies are developing rapidly. In particular, image processing using deep learning has been applied to various fields, including earth science (Hoeser & Kuenzer, 2020; Mimura, Nakamura, Takao, et al., 2023). Automating previously manual observation processes saves time and provides opportunities for discoveries by increasing the number of fossils that can be observed and processed. The application of deep learning techniques to the classification of foraminifers (Hsiang et al., 2019), radiolarians (Carlsson et al., 2022, 2023; Itaki, Taira, Kuwamori, Saito, et al., 2020; Tetard et al., 2020), and coccolithophores (Beaufort et al., 2022) is enhancing the resolution of paleoenvironmental studies. These studies detect particles by thresholding and recognize their classes using classification models. However, this method is difficult to apply directly to ichthyoliths because it is sometimes challenging to identify the outline of an ichthyolith by thresholding (Figure 1). To solve this problem, we previously proposed an automated detection of ichthyoliths in microscopic images by combining the object detection model "Mask R-CNN" (He et al., 2020) and the image classification model "EfficientNet-V2," both of which are based on deep learning techniques (Mimura et al., 2022). Although the system showed good performance, two problems remained. First, due to the scarcity of training data, the system could only detect triangular teeth, leaving denticles and saw-toothed ichthyoliths undetected (Figure 1). Second, there was a time loss in the combined system, as a well-trained object detection model can distinguish classes without using a separate classification model.

Generation of Data Sets
Out of more than 1 million (M) images of the microscopic field of view, 12,219 were selected for "original" data sets. The locations and classes of the ichthyoliths within the images were annotated manually. Ichthyoliths were classified into three classes (Figure 1): triangular tooth (class name: "tooth"), denticle ("denticle"), and forms similar to Rectangular saw-toothed ("sawtoothed").
Two data sets were generated from these images and annotations. The data set "original_selected" comprised 6,945 images with ichthyoliths, and the data set "original_all" comprised 6,945 images with ichthyoliths and 5,274 images without ichthyoliths (Mimura, Nakamura, Yasukawa, et al., 2023). The data sets contained 7,705 triangular teeth, 533 denticles, and 103 saw-toothed shapes. The images and corresponding annotation files were randomly split into three subsets: 80% for training, 10% for validation, and 10% for testing. We note here that the images in each subset are the same between the two data sets, except for the images that do not contain ichthyoliths. This enabled us to conduct performance tests on the same data set (i.e., models trained on the training subset of data set original_selected can be tested on the testing subset of data set original_all).
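The split described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes that images with and without ichthyoliths are shuffled and split separately, so that the ichthyolith-bearing images fall into the same subset in both data sets. The function names, the seed, and the file-name lists are hypothetical.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle reproducibly, then split into 80% train, 10% val, 10% test."""
    items = sorted(items)          # fixed starting order before shuffling
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    return {
        "train": items[:n_train],
        "val": items[n_train:n_train + n_val],
        "test": items[n_train + n_val:],
    }


def build_datasets(with_fossils, without_fossils, seed=0):
    """Return (selected, all) splits that share the fossil-bearing images.

    'selected' holds only images with ichthyoliths; 'all' adds the images
    without ichthyoliths to the same train/val/test subsets.
    """
    selected = split_80_10_10(with_fossils, seed)
    empty = split_80_10_10(without_fossils, seed)
    combined = {name: selected[name] + empty[name] for name in selected}
    return selected, combined
```

Because the fossil-bearing images are assigned once and reused, a model trained on the "selected" training subset never sees an image from the "all" testing subset.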

Tuning of Hyperparameters
We conducted hyperparameter tuning by training the "YOLOv7" model under different initial learning rates ("lr0" in YOLOv7's parameter file) and final one-cycle learning rates ("lrf"). A stochastic gradient descent algorithm with momentum fixed at 0.937 was applied for training. The image size was fixed at 640 × 640 pixels and the batch size at 8. The models were trained on a local Windows PC with a single graphics board with 16 GB of memory (GeForce RTX™ 3080 Ti, NVIDIA Inc.).

Training Conditions
YOLOv7 provides several models with various numbers of trainable parameters. In this study, we compared five models, "YOLOv7-tiny," "YOLOv7," "YOLOv7-X," "YOLOv7-W6," and "YOLOv7-E6," having 6.2M, 36.9M, 71.3M, 70.4M, and 97.2M parameters, respectively. Training of the YOLOv7-tiny and YOLOv7 models was conducted on the local Windows PC, while training of the larger models was conducted on the cloud computing platform "Google Colaboratory" (Carneiro et al., 2018). The image size was set to 640 × 640 pixels by default. However, we also trained YOLOv7-W6 models with a larger image size of 1,280 × 1,280 pixels, as Wang et al. (2022) proposed for larger models. In all training cases, the batch size was fixed at 8. The models were trained on either the local Windows PC, a local Linux PC with two graphics boards having 24 GB of memory (GeForce RTX™ 3090 Ti, NVIDIA Inc.), or Google Colaboratory (see Table 2). Following YOLOv7's online augmentation method, the images were randomly flipped vertically and/or horizontally, and the colors, scales, and shear of the images were randomly changed every time the training images were loaded.

Practical Test
In the data sets described in Section 2.3, more than half of the images contained at least one ichthyolith, whereas only tens to one hundred ichthyoliths are observed from ∼1,000 images in actual observation. We, therefore, conducted a practical test to evaluate the performance of the trained models under more practical conditions. Three samples at DSDP Site 576, not used in the original data sets described in Section 2.3 or the extended data set described in Section 3.3, were selected for the practical test. The models detected ichthyoliths from the whole field-of-view images (30,826 in total) taken from 28 slides. Since microscopic images were taken with overlap, duplicated detections were excluded by calculating absolute coordinates in the entire slide (Figure 2). The slides were also observed manually under a polarization microscope. We tested the practical applicability of the trained models by comparing the number of ichthyoliths counted by the models with that observed manually.
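One way to exclude duplicated detections from overlapping images can be sketched as follows. This is only an illustration under stated assumptions and does not reproduce the algorithm in Figure 2: it assumes that each image's pixel offset within the slide is known, that two same-class boxes are duplicates when their intersection over union (IoU) in slide coordinates reaches a threshold, and that the higher-confidence box is kept. The `Detection` class, the function names, and the threshold of 0.5 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str    # class name, e.g. "tooth"
    conf: float   # detection confidence
    x1: float     # box corners in pixels
    y1: float
    x2: float
    y2: float


def to_absolute(det, offset_x, offset_y):
    """Shift a box from image coordinates to slide coordinates."""
    return Detection(det.label, det.conf,
                     det.x1 + offset_x, det.y1 + offset_y,
                     det.x2 + offset_x, det.y2 + offset_y)


def iou(a, b):
    """Intersection over union of two boxes in the same coordinate frame."""
    iw = min(a.x2, b.x2) - max(a.x1, b.x1)
    ih = min(a.y2, b.y2) - max(a.y1, b.y1)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    return inter / (area_a + area_b - inter)


def drop_duplicates(dets, iou_threshold=0.5):
    """Keep the most confident box among overlapping same-class boxes."""
    kept = []
    for det in sorted(dets, key=lambda d: d.conf, reverse=True):
        if all(det.label != k.label or iou(det, k) < iou_threshold
               for k in kept):
            kept.append(det)
    return kept
```

Detections from every image of a slide would first be passed through `to_absolute` with that image's offset, then through `drop_duplicates` once per slide.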

Hyperparameter Tuning and Iteration Test
F1 scores of YOLOv7 models trained with different hyperparameters on data set "original_all" are presented in Table 1. The initial learning rate of 0.0007 and final one-cycle learning rate of 0.05 were the most suitable conditions in this study. Under the same condition, we then conducted and evaluated five training iterations and observed that one standard error (1 SE) of the F1 score was 0.008 (Table S2 in Supporting Information S1). When comparing the performance of the models in the following discussion, a difference in F1 scores greater than 2 SE (0.016) was considered significant.
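The significance criterion can be written as a short sketch. It assumes that the standard error is the sample standard deviation (n − 1 denominator) of the repeated-run F1 scores divided by the square root of the number of runs; the text does not state which estimator was used, and the function names are hypothetical.

```python
import math
import statistics


def standard_error(scores):
    """Standard error of the mean of repeated-run scores (assumed n - 1 form)."""
    return statistics.stdev(scores) / math.sqrt(len(scores))


def is_significant(f1_a, f1_b, se=0.008):
    """True when two F1 scores differ by more than 2 SE (0.016 by default)."""
    return abs(f1_a - f1_b) > 2 * se
```

With the reported 1 SE of 0.008, two models are treated as different only when their F1 scores are more than 0.016 apart.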

Comparison of Performances Under Different Training Conditions
The performance of the models trained on different model sizes and data sets is detailed in #1 to #12 of Table 2. We evaluated the performance of models based on averaged F1 scores of the three classes (macro-F1 score). Comparing the number of parameters (Figure 3a), models with ∼70M trainable parameters (YOLOv7-X, YOLOv7-W6) exhibited the highest F1 score, suggesting that these models are suitable for this study. Comparing the image sizes (Figure 3b), we observed that the models trained with the input image size set at 640 exhibited higher F1 scores than those trained with an image size of 1,280. Although the difference in the data set "selected" is less than 2 SE, we suggest that the suitable input image size is 640, as a larger input size increases the risk of overfitting (e.g., Sabottke & Spieler, 2020). Finally, comparing the data set type (Figure 3c), the results exhibited a variety of trends. However, following the discussion above, if we focus on the cases with a number of parameters around 70M and input image size at 640, models trained on the data set "all" showed higher F1 scores than those trained on the data set "selected." Thus, we concluded that the suitable training condition in this study is (a) to use models with ∼70M parameters (YOLOv7-X or YOLOv7-W6), (b) to set the input image size at 640, and (c) to train on a data set "all," which is composed of both images containing ichthyoliths and images that do not contain ichthyoliths.
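For reference, the macro-F1 score is the unweighted mean of the per-class F1 scores, so the rare classes count as much as the abundant "tooth" class. The sketch below computes it from per-class counts of true positives, false positives, and false negatives; the function names and the convention of returning 0 for a class with no counts are assumptions.

```python
def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when there are no counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts):
    """Unweighted mean of per-class F1 scores.

    counts maps a class name to a (tp, fp, fn) tuple.
    """
    return sum(f1_score(*c) for c in counts.values()) / len(counts)
```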

Efficient Production of Training Data Set Using Detection Results
YOLOv7 can output results as text files in the same format as the training labels. Taking advantage of this, we enlarged the data sets by first running predictions with a trained model and then checking the results manually. Using the YOLOv7-X model trained on the data set "all" with an image size of 640 (#9 of Table 2), the existence of ichthyoliths was predicted from ∼1,100,000 images generated from the six sites considered in this study. Images from three samples at Site 576 used for the practical test were excluded. We collected 4,463 images in which the model predicted the existence of the class "denticle" or "saw-toothed," the classes that were relatively scarce compared to the class "tooth." The manual check of the detection results for the 4,463 images showed that 2,528 images contained ichthyoliths and 1,935 did not; in those containing ichthyoliths, 1,657 teeth, 1,282 denticles, and 108 saw-toothed ichthyoliths were identified. Notably, the number of denticles was more than twice that in the original data set (533), and the number of saw-toothed ichthyoliths was almost the same as that in the original data set (103). As with the original data sets, the images and annotation information were randomly split into training (80%), validation (10%), and testing (10%) subsets.
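The selection of images for manual review can be sketched as follows. It assumes the usual YOLO label layout, in which each line of a text file starts with an integer class index followed by the normalized box center, width, and height (and optionally a confidence value). The mapping of indices to classes (0 = tooth, 1 = denticle, 2 = sawtoothed), the directory layout, and the function names are hypothetical.

```python
from pathlib import Path

# Assumed class indices: 0 = tooth, 1 = denticle, 2 = sawtoothed.
TARGET_CLASS_IDS = {1, 2}


def classes_in_label_text(text):
    """Return the set of class indices found in one YOLO-format label file."""
    ids = set()
    for line in text.splitlines():
        fields = line.split()
        if fields:                      # skip blank lines
            ids.add(int(fields[0]))
    return ids


def select_for_review(label_dir):
    """List image stems whose predictions include a rare target class."""
    selected = []
    for path in sorted(Path(label_dir).glob("*.txt")):
        if classes_in_label_text(path.read_text()) & TARGET_CLASS_IDS:
            selected.append(path.stem)
    return selected
```

After an observer corrects the predicted boxes for the selected images, the same text files can serve directly as training labels, which is what makes this route to a larger data set efficient.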
The data set "extended_all" was generated by combining the data set collected by the above process and the data set "original_all" (Mimura, Nakamura, Yasukawa, et al., 2023). Considering the discussion in Section 3.2, we trained the two models, YOLOv7-X and YOLOv7-W6, on the data set "extended_all" with an input image size set at 640. The performances of the trained models are shown in #13 and #14 of Table 2.

Practical Test
We conducted a practical test for the four models: YOLOv7-X trained on the data sets "original_all" (#9 of Table 2) and "extended_all" (#13), and YOLOv7-W6 trained on the data sets "original_all" (#10) and "extended_all" (#14). The numbers of ichthyoliths detected by these models and counted manually are shown in Table S3. We also calculated the root mean square percentage error (RMSPE), using the following equation:

$$\mathrm{RMSPE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\hat{y}_i - y_i}{y_i}\right)^{2}} \times 100\%$$

where $n$, $\hat{y}_i$, and $y_i$ indicate the number of samples, the predicted ichthyoliths, and the manually observed ichthyoliths, respectively.
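As a sketch, the RMSPE can be computed as below, assuming the standard definition: the square root of the mean squared relative error between predicted and manually observed counts, expressed as a percentage. The function name and the choice to raise an error for a zero manual count are assumptions.

```python
import math


def rmspe(predicted, observed):
    """Root mean square percentage error between two equal-length count lists."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(o == 0 for o in observed):
        raise ValueError("RMSPE is undefined when a manual count is zero")
    total = sum(((p - o) / o) ** 2 for p, o in zip(predicted, observed))
    return 100 * math.sqrt(total / len(observed))
```

Because each error is divided by the manual count, a miscount of a few fossils weighs heavily in samples that contain few ichthyoliths of a class, which is worth keeping in mind when reading the values for the rare classes.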
Comparing the models trained on the data set "original_all" (#9, #10) and "extended_all" (#13, #14), models trained on "extended_all" showed trends closer to y = x for classes tooth and denticle (Figures 4a and 4b). The high performance of the models trained with the "extended_all" data set may be attributed to the wide variety of false patterns encountered under practical conditions, which this data set covers better. We realized that models trained on the original data set confused various triangular particles or patterns with teeth (Figure S1 in Supporting Information S1). Since the "extended_all" data set contains many images in which the preliminary model made false detections, the model trained with this data set is considered to have learned to reject such false positives efficiently. RMSPEs suggest that using the v7-w6_extended_all model (#14), the number of teeth and denticles from a sample can be estimated with ∼7% and ∼24% error rates, respectively. On the other hand, RMSPEs for the "saw-toothed" class are >70%. Furthermore, no clear trend was observed (Figure 4c), indicating that the number of "saw-toothed" ichthyoliths cannot be accurately estimated based solely on the model's detection result.
We also manually checked the images detected by models #13 and #14 and removed false positives and duplications that could not be excluded by the algorithm described in Figure 2. After checking model #13's detection, we observed a trend closer to y = x (Figures 4d-4f), indicating that combining manual review with model #13 is preferable. Model #13, with manual check, achieved an RMSPE of ∼3%, ∼9%, and almost no error for counting the number of teeth, denticles, and saw-toothed ichthyoliths, respectively (Table S3).

Advantages of Object Detection Method Using YOLO-v7
The application of deep learning to microfossil observations has attracted increasing attention recently (Carlsson et al., 2022, 2023; Hsiang et al., 2019; Itaki, Taira, Kuwamori, Maebayashi, et al., 2020; Marchant et al., 2020; Mitra et al., 2019; Romero et al., 2020; Salonen et al., 2019; Tetard et al., 2020). A commonly used method in particle detection is to apply rule-based thresholding to detect each particle and subsequently classify them using an image classification model. Although these methods require less work to prepare a data set, deep learning-based detection has advantages over traditional methods in finding "challenging" particles. While traditional rule-based thresholding methods struggle to detect particles that overlap, have drastic changes in brightness, or have almost similar brightness to the background (Figure 1) in ichthyolith slides, deep learning-based methods can accurately detect them. Therefore, we propose that object detection would broaden the range of deep learning applications in microfossil studies.
Compared to our previous method (Mimura et al., 2022), which required two steps, object detection by Mask R-CNN and image classification by EfficientNet-V2, the new method can detect ichthyoliths in a single step, which enhances the efficiency of observation. We measured the detection times for processing 10,884 slide images using the two methods on Google Colaboratory. While the previous method required 11,250 s in total (7,230 s for detection using Mask R-CNN and 4,020 s for classification using EfficientNet-V2), the new method required only 1,040 s in total, indicating that the new method is approximately 10 times faster than the previous method.

Implications for Biostratigraphic and Paleoecological Studies Using Ichthyoliths
We expect the new observation method to make the biostratigraphy of ichthyoliths more precise, advancing progress in paleoceanography and resource geology related to pelagic (red) clay. Pelagic clay covers over one-third of the global ocean (Dutkiewicz et al., 2015) and has huge variation in bulk geochemistry (Dunlea et al., 2015; Mimura et al., 2019). Therefore, pelagic clay is a good recorder of long-term and global/regional environmental changes (Kyte et al., 1993; Tanaka et al., 2022; Yasukawa et al., 2023; Zhou & Kyte, 1992). Moreover, pelagic clay is also attracting attention as a promising resource for rare-earth elements (Kato et al., 2011; Ren et al., 2021; Takaya et al., 2018; Yasukawa et al., 2014). However, the scarcity of microfossils other than ichthyoliths has hampered the construction of precise age models for pelagic clay. By letting machines perform much of the time-consuming observation, substantial numbers of ichthyoliths can be observed, and more accurate age models can be established. This should provide numerous insights into the evolution of pelagic environments from paleoceanographic viewpoints, as well as the ore genesis and potential distributions of this prospective deep-sea mineral resource.

Note. Trained on the data set "extended_all" with an image size of 640. Results after the manual check are also provided. a Pliocene-Quaternary. b Too small #tooth compared to upper/lower horizons.

Earth and Space Science, 10.1029/2023EA003122

We also expect that this tool will improve our understanding in biological and ecological studies. As a demonstration, we show a downhole variation of denticle/tooth (D/T) ratios at DSDP Site 576 in the western North Pacific Ocean (Table 3, Figure 5), which were generated from the detection results of model #13 combined with manual check. The D/T ratio is an index of the relative abundance of sharks and ray-finned fish, an indicator of marine vertebrate community stability (Sibert et al., 2016). By manual counting in a previous study (Sibert et al., 2016), three stages in the D/T ratios from the late Cretaceous to the present were proposed. The Cretaceous ocean (i.e., older than 66 Ma) was characterized by high D/T ratios, reflecting a relatively small number of ray-finned fishes compared to the present ocean. Subsequently, the Paleogene ocean (from 66 to ∼20 Ma) showed moderate D/T ratios, reflecting the evolution of ray-finned fish after the K/Pg boundary (Sibert & Norris, 2015). Finally, the modern ocean (from ∼20 Ma to the present) is characterized by low D/T ratios, which may reflect an extinction event of sharks in the early Miocene (Sibert & Rubin, 2021) and the consequent predominance of ray-finned fish. In the previous study, the trend was clearly exhibited in the South Pacific (DSDP Site 596), but the evidence from the North Pacific (ODP Site 886) was somewhat limited due to the huge hiatus in the Paleogene (Figure 5). Using our deep learning-based image processing method at DSDP Site 576, a North Pacific site that has continuous Paleogene sedimentation, we found D/T ratios that were consistent with the previous study, supporting the pelagic vertebrate community structure proposed in Sibert et al. (2016). While this method is still developing, high-throughput data collection provides the opportunity for elucidating the interaction between environmental change and the marine vertebrate community.

Conclusions
In this study, we proposed a new and efficient method for the observation of ichthyoliths, which is approximately 10 times faster than our previous method. Using this method, we expect that studies using ichthyoliths, including biostratigraphy, geochemistry, paleoecology, and the evolution of fishes, will become more precise due to improved sample throughput and identification. Conventional studies on ichthyolith stratigraphy have focused mainly on the presence or absence of each ichthyolith species. In contrast, ratios of the species were hardly considered, possibly due to the enormous amount of manual work required to count the total number of fossils in a discrete sediment sample under a microscope. Since the object detection method is capable of counting the total number of ichthyoliths in a sample, as well as classifying them to a particular type (here, teeth, denticles, or saw-toothed teeth), it can rapidly calculate the ratio of each ichthyolith type within an entire sample slide. This tool enables research focusing on quantitative changes in the occurrence of each ichthyolith morphotype, which in turn will provide more accurate depositional ages on pelagic clays, improve geochemical reconstructions, and open the possibilities for high-resolution ecological and evolutionary studies of fish and sharks at significantly increased spatiotemporal resolution. Finally, while we focused here on ichthyoliths, which are understudied compared to other microfossil groups, the automated deep learning methods presented here can be applied broadly to a wide array of microfossil groups, increasing the throughput of data across many fields of study.

Figure 1. Examples of ichthyolith images categorized into the three classes used in this study. Images of teeth that are challenging to detect with the thresholding-based method but can be detected using object detection models are also shown.

Figure 2. An illustration explaining the algorithm for excluding duplicate detections in this study.

Figure 3. F1 scores compared by training conditions. The x-axis of each graph represents (a) the number of trainable parameters, (b) image sizes, and (c) the type of data set. Error bars represent ±1 SE.

Figure 4. Comparison of the number of ichthyoliths counted manually and those detected by models trained in this study. The black solid lines indicate y = x, which means that the model's detections are identical to manual observations. Plots below and above the y = x line indicate that the model made false negative and false positive errors, respectively. (a-c) Scatter diagrams of the number of models' detections and manual counts. Regression lines are only indicated in class "tooth" (a), as no clear trend was observed in class "denticle" (b) and class "saw-toothed" (c). (d-f) Comparison between the number of ichthyoliths that a human observer recounted after the best model's detection and the manually counted number. The numbers were compared per slide for teeth, but per sampling horizon for classes denticle and saw-toothed, because particles of these classes were contained in only a few slides. Regression lines were obtained using Excel (Microsoft® Excel® for Microsoft 365 MSO, version 2310).

Figure 5. A downhole variation of denticle/tooth ratios at Deep Sea Drilling Project (DSDP) Site 576, hole 576B, obtained by the detection model proposed in this study. The age model at Site 576 is based on ichthyolith biostratigraphy (Shipboard Scientific Party, 1985) corrected by Ir anomaly (Kyte et al., 1995). The results of three samples with too small numbers of teeth compared to upper and lower horizons were excluded from the plot. D/T ratios obtained by manual counting at DSDP Site 596 (South Pacific) and Ocean Drilling Program (ODP) Site 886 (North Pacific) are also shown.

Table 1
F1 Scores of the Models Trained on Different Hyperparameters of Initial Learning Rate ("lr0") and Final One-Cycle Learning Rate ("lrf")
MIMURA ET AL.

Table 2
Performances of the Training With Different Models and Data Sets

Table 3
The Total Count of Ichthyoliths in the Three Classes Detected by