A challenge of deep-learning-based object detection for a hair follicle dataset

Deep‐learning object detection has been applied in various industries, including healthcare, to address hair loss.


| INTRODUCTION
In recent years, progress in the field of object detection has promoted the development of various industries. Its widespread application can improve efficiency in areas such as mobile robot navigation, 1 real-time abandoned baggage detection, 2 and the detection of pedestrians wearing masks in public places. 3 Most object detection algorithms are designed to let computers perceive the environment as humans do, which makes them well suited to the kinds of images the human eye sees every day. In academic research, these algorithms are typically evaluated on standard image datasets such as COCO 4 and PASCAL VOC, 5 which consist of everyday scenes. However, specialized image datasets have seen far fewer applications. In such datasets, the targets are not as regular as in standard datasets, and some even take the human eye a long time to identify correctly. This presents a challenge for the application of object detection algorithms.
For example, an image of human hair follicles must be magnified and taken with a specialized camera to assess the health of the scalp in a given area. In cosmetic dermatology, hair follicle detection is currently used to evaluate follicle health in patients with hair loss. It is an essential part of the hair loss treatment process because it allows dermatologists to determine the best treatment method and to assess its effect once treatment is completed. A dermatologist or other medical professional may use various techniques to detect hair follicles, including visual inspection, dermoscopy, and biopsy. To improve the efficiency of visual inspection, deep-learning-based object detection allows dermatologists to quickly diagnose the health status of a patient's hair follicles and provide an appropriate treatment plan in a timely manner.
In scalp photographs, several hairs can grow from the same follicle, while hairs of different colors may each have their own follicle. The challenge is therefore not only to locate each follicle but also to identify its type. In 2018, a competition was held to locate hair follicle positions, and one team proposed a method to enhance detection accuracy. 6 However, research on classifying hair follicles remains scarce. This paper proposes a new hair follicle dataset to which deep-learning-based object detection can be applied. The dataset is annotated based on selected key points on the scalp, and different model hyperparameters are set to better train object detection models on this unique dataset. Our main contributions are as follows: 1. Create a specialized dataset of hair follicles and introduce standardized labeling methods.
2. Apply a deep-learning-based object detection algorithm to the hair follicle dataset.
3. Compare the performances of different object detection algorithms on this dataset.
In Section 2, the paper describes the development of object detection algorithms and explains why the You Only Look Once (YOLO) 7 algorithm was chosen. Section 3 introduces the hair follicle dataset, analyzes its features and the challenges it poses for object detection, and describes the normalized labeling method used. Section 4 introduces the evaluation metrics frequently used in object detection. In Section 5, we compare the performance of the YOLOv5 8 object detection algorithm under different hyperparameters and also show the results of experiments with other algorithms.

| RELATED WORK
In past work, a great number of object detection algorithms have been proposed, including Region-CNN (RCNN), 9 Fast-RCNN, 10 Faster-RCNN, 11 and YOLO. 7 Girshick et al. proposed RCNN in 2014; its performance on the VOC2007 12 dataset was a significant advance, greatly increasing the mean Average Precision (mAP) over DPM-v5 13 (33.7%). Lin et al. 25 proposed RetinaNet, which uses FPN as the backbone network together with a new loss function, focal loss; it reaches 39.1 AP at 5 FPS on the COCO dataset.
The YOLOv1 algorithm was proposed by Redmon et al. 7 On the VOC2007 dataset, its mAP is lower than that of Faster-RCNN, but it achieves a much greater speed. Lu et al. 26 used YOLOv1 for vehicle detection in aerial images. Redmon et al. 27 soon proposed YOLOv2, whose mAP is almost equivalent to Fast R-CNN while reducing GPU resource usage; it reaches 78.6% mAP on the VOC2007 dataset, combining higher accuracy with real-time speed. Redmon et al. 28 then proposed YOLOv3, whose mAP on the COCO dataset 4 is equivalent to SSD 21 at much higher speed. Subsequent work 29,30 built on YOLOv3, keeping the same speed while improving detector accuracy. Liu et al. 31 improved YOLOv3 and applied it to detect tomato diseases and pests. Song et al. 32 adopted a loss function following the same detection scheme as YOLOv3 and proposed a multifeature information-assisted one-shot detection method to improve the accuracy of one-shot object detection.
Although its accuracy is close to that of these earlier detectors, YOLOv5 has higher efficiency, reducing the time complexity of the calculation process. YOLOv5 also provides models at a variety of scales, so an appropriate model size can be selected for the dataset in order to avoid underfitting and overfitting.

| THE HAIR FOLLICLE DATASET
The Hair Follicle Dataset has been designed and collected to combine traditional scalp evaluation with cutting-edge image recognition technology. Additionally, object detection algorithms used in related industries can make scalp evaluation more efficient and cost-effective.
It is also an unusual image dataset for evaluating object detection algorithms in diverse domains because it consists of close-up views of the human scalp, which differs from datasets like COCO and Pascal VOC, whose images can be seen in daily life.
The hair follicle dataset was collected using a special-purpose camera that photographs the scalp surface at a fixed size and high magnification. Samples were taken at different scalp locations from volunteers aged 15 to 62 from different regions, and no more than 10 pictures were taken of any single volunteer's scalp in order to maintain diversity in the dataset. The dataset contains a total of 591 pictures. Figure 1 shows some samples from the hair follicle dataset.
During data preprocessing, certain images were found to be blurred or out of focus, making them difficult to recognize; these images were removed from the dataset. The remaining images were labeled using LabelImg, 36 with the resulting bounding boxes shown in Figure 2. The annotated images were then randomly split into a training set (75%) and a validation set (25%). The final annotated dataset contained a total of 5422 bounding boxes. Figure 3 shows labeling examples for the various label types. Both single-class and multiclass object detection were conducted on the same hair follicle dataset. In the single-class task, all classes were merged into one class for training, which simplifies the detection of hair follicles and helps to study the prediction of the bounding box.
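The random 75/25 split described above can be sketched as follows (the function name, seed, and use of integer image ids are illustrative assumptions, not the authors' actual tooling):

```python
import random

def split_dataset(image_ids, train_frac=0.75, seed=0):
    """Randomly split annotated image ids into training and validation sets."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    ids = list(image_ids)
    rng.shuffle(ids)
    cut = round(len(ids) * train_frac)
    return ids[:cut], ids[cut:]

# With the paper's 591 images: 443 for training, 148 for validation
train, val = split_dataset(range(591))
```

Shuffling before cutting ensures that images from the same volunteer are not systematically grouped into one split.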
We identified the locations of these hair follicles and classified them into five classes, as shown in Figure 3. Figure 4 shows the number of samples in each class as a frequency distribution histogram, with the x-axis representing the different labels we marked. The hair follicle type corresponding to each label is shown in Figure 3.
We applied Mosaic 33 data augmentation, which randomly stitches and crops four pictures together (as shown in Figure 5) to produce varied images for training the model, improving its ability to capture image features. However, it should be noted that traditional oversampling methods cannot solve the problem of sample imbalance: simply duplicating an image that contains both common and rare samples also doubles the number of the other samples, failing to alleviate the imbalance. After studying the performance of YOLO on the hair follicle dataset, we found that the imbalance problem did not significantly affect performance.
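A heavily simplified sketch of the Mosaic idea, using toy 2D pixel grids in place of real images (the actual YOLOv5 implementation also randomizes the stitch center, rescales the crops, and remaps the bounding-box labels):

```python
def mosaic4(imgs):
    # Simplified Mosaic: stitch four equal-size images (2D pixel grids)
    # into one 2x2 canvas. The real augmentation additionally picks a
    # random center and shifts every bounding box into canvas coordinates.
    top = [a + b for a, b in zip(imgs[0], imgs[1])]
    bottom = [c + d for c, d in zip(imgs[2], imgs[3])]
    return top + bottom

# Four toy 4x4 "images", each filled with a constant value
a = [[1] * 4 for _ in range(4)]
b = [[2] * 4 for _ in range(4)]
c = [[3] * 4 for _ in range(4)]
d = [[4] * 4 for _ in range(4)]
m = mosaic4([a, b, c, d])   # 8x8 grid: a top-left, b top-right, c bottom-left, d bottom-right
```

Each training image thus exposes the model to follicles from four different scalp photographs at once.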

| EVALUATION CRITERIA
In this project, we use mAP, Recall, and Precision as the main evaluation metrics to assess model performance. mAP is a widely used evaluation metric in object detection research and is adopted here as the primary measure: Precision is the fraction of predicted boxes that are correct, Recall is the fraction of ground-truth boxes that are found, and mAP averages the area under the precision-recall curve over all classes.
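For reference, a minimal sketch of the standard precision and recall definitions that underlie mAP (the detection counts in the example are invented for illustration):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP): fraction of predicted boxes that are correct.
    Recall    = TP/(TP+FN): fraction of ground-truth boxes that are found.
    mAP is obtained by averaging the area under the precision-recall
    curve (computed over all confidence thresholds) across classes."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# e.g. 80 correct detections, 20 false alarms, 20 missed follicles
p, r = precision_recall(80, 20, 20)
```

A prediction counts as a true positive only when its IOU with a ground-truth box exceeds the chosen threshold, which is why the IOU threshold studied later directly affects these metrics.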

| IMPLEMENTATION METHOD AND EXPERIMENTAL RESULTS
We deployed the YOLOv5 environment and used a Tesla V100-SXM2-32GB for training. It took about 30 h to complete all experiments. The hair follicle dataset is divided into two parts: a 75% training set and a 25% validation set. In YOLOv5, we used Mosaic data augmentation. During the training process, different batch sizes and models were compared, with a large number of epochs used to observe the model convergence process and achieve the best model performance.

FIGURE 6 mAP line chart for different batch sizes in the single-class object detection experiment using the YOLOv5s model. The red line (batch size 2) has the fastest convergence and achieves the highest mAP at 39 epochs. The yellow (batch size 32) and green (batch size 16) lines are the most unstable during convergence, with larger batch sizes producing more severe oscillations in the mAP and slower convergence. After 80 epochs, all training runs have stabilized; the mAP is highest for the red line, which was also the easiest to converge, while the yellow line (batch size 32) has the lowest mAP and was the most difficult to converge.
YOLOv5 provides a single-category detection mode. Therefore, the different types of hair follicles can be used for both single-class and multiclass object detection.
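Merging all follicle types into one class for single-class training can be done by remapping the annotation files; a minimal sketch assuming YOLO's standard `<class> <x> <y> <w> <h>` text format (the example coordinates are invented):

```python
def to_single_class(label_lines):
    """Collapse all follicle classes to class 0 for single-class training.
    Each YOLO-format line is '<class> <x> <y> <w> <h>' with normalized
    center coordinates and box dimensions."""
    merged = []
    for line in label_lines:
        parts = line.split()
        parts[0] = "0"          # every follicle type becomes the same class
        merged.append(" ".join(parts))
    return merged

labels = ["3 0.51 0.42 0.08 0.10", "1 0.20 0.73 0.06 0.07"]
merged = to_single_class(labels)
```

The bounding boxes are left untouched, so the single-class experiment isolates localization quality from classification quality.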
The YOLOv5s model with a batch size of 4 was used as the training benchmark. The optimal mAP value was achieved around the 50th epoch of the benchmark experiment, so the number of epochs was set to 100 for fair comparison in the subsequent experiments.
It is known that larger batch sizes can lead to better results when training on standard image datasets, while smaller batch sizes help to reduce memory usage. Mosaic data augmentation can be particularly useful in strengthening the dataset so that it trains well with small batch sizes.
YOLOv5 has four different sizes of models, each with an increasing number of parameters: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. A model with more parameters is able to learn more complex features, but it also requires more time for training and testing.
The choice of model is a trade-off between accuracy and efficiency, so we compare the accuracy and efficiency of these different models.
First, the single-class detection problem is tackled using YOLOv5. A control experiment is designed in which the batch size is adjusted while keeping the other hyperparameters unchanged. Through this experiment, it is found that a smaller batch size is more conducive to training better models, as observed through the mAP. The YOLOv5s model is used for training, as it has the fewest parameters and therefore requires the least amount of memory and provides faster training speed.

FIGURE 7 Change in mAP during training with different YOLOv5 models when the batch size is set to 4. Red, green, yellow, and purple represent the four YOLOv5 models; the correspondence is given in the legend. The purple model (YOLOv5s) has the smallest mAP in the first 40 epochs but reaches the highest mAP thereafter. The red model (YOLOv5x) and the green model (YOLOv5l) converge fastest, but their stability and final mAP during convergence are not as good as those of the two YOLOv5 models with fewer parameters.
FIGURE 8 mAP line chart of different batch sizes in the YOLOv5l model during multiclass object detection training. Red, blue, purple, green, and yellow represent five batch sizes from 2 to 32; the correspondence is given in the legend. The red line (batch size 2) converges fastest, but its mAP is not the largest. The yellow (batch size 32) and green (batch size 16) lines are the most unstable during convergence; the larger the batch size, the slower the convergence. The blue line (batch size 4) has the highest mAP and is the most stable during convergence. After convergence, yellow (batch size 32) has the lowest mAP and was the most difficult to converge.
The small size of the hair follicle dataset makes it unlikely that a small model will cause underfitting. As shown in Figure 6, the maximum mAP is achieved when the batch size is set to 2. This may be because a small batch size setting allows for the generation of more batches during training with a small dataset.
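The effect of batch size on the number of optimizer updates per epoch is simple arithmetic; a sketch assuming roughly 443 training images (75% of the 591-image dataset):

```python
import math

def batches_per_epoch(n_images, batch_size):
    """Number of optimizer updates per epoch for a given batch size
    (the last, possibly partial, batch is counted)."""
    return math.ceil(n_images / batch_size)

n_train = 443                            # assumed 75% split of 591 images
small = batches_per_epoch(n_train, 2)    # many updates per epoch
large = batches_per_epoch(n_train, 32)   # few updates per epoch
```

With a batch size of 2 the model receives over fifteen times more gradient updates per epoch than with a batch size of 32, which is consistent with the faster convergence of the small-batch runs in Figure 6.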
We conducted an experiment in which we kept all parameters constant except for the YOLOv5 model. The results showed that models with more parameters did not outperform those with fewer parameters.
We conducted an experiment to compare the performance of different YOLOv5 models for object detection on the small hair follicle dataset. The results, shown in Figure 7, indicate that the YOLOv5s model performs better than the other three models. This is likely due to the small size of the training dataset, which may benefit from the use of the smallest YOLOv5s model.
The performance of the YOLOv5l model was the best among all models in the multiclass object detection task.

FIGURE 9 Change in mean average precision (mAP) during training with different YOLOv5 models when the batch size is set to 4. Among the models, YOLOv5l achieved the highest mAP at 42 epochs, while YOLOv5x achieved a similarly high mAP at 74 epochs. Note that models with more parameters may exhibit greater variation in mAP during training. In the multiclass object detection experiment, the smaller YOLOv5 models did not match their performance in the single-class experiment; most of their mAP values were at the lowest level during convergence. The models with more parameters, YOLOv5l and YOLOv5x, reached the highest mAP after iteration. This suggests that predicting the hair follicle category may be more difficult than predicting its position in this dataset.
The characteristics of the hair follicle categories may require a model with more parameters to learn the features well. Additionally, the dataset has a serious class imbalance problem, as some classes contain only a small number of samples.
The results of our experiment indicate that the largest YOLOv5x model may produce underfitting when the sample size is small.
Therefore, we recommend using the YOLOv5l model for this dataset. The training process is depicted in Figure 9.
In the single-class object detection experiment, the training tasks quickly converged within the first 30 epochs. However, there were varying degrees of overfitting in the subsequent epochs. To study the overfitting problem, we compared the trends of different losses on the validation and training sets. The box loss mainly measures the error caused by the position of the bounding box, while the obj loss measures the error caused by the confidence. In this experiment, the batch size was set to 2 and the YOLOv5s model was used for single-class object detection. The training process is depicted in Figure 10.
The box loss on the validation set did not show significant overfitting after it stopped decreasing and remained stable. However, the obj loss on the validation set dropped to its lowest point and showed a significant increasing trend thereafter. The box loss converged faster than the obj loss, with both converging to their lowest points within the first 50 epochs. During this experiment, the mAP was highest at epoch 49.
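Picking the epoch with the lowest validation loss is one simple way to read such curves; a sketch with an invented toy loss curve (not the paper's actual values):

```python
def best_epoch(val_losses):
    """Return the epoch (0-indexed) with the lowest validation loss;
    continued training beyond this point suggests overfitting
    on that loss component."""
    return min(range(len(val_losses)), key=lambda e: val_losses[e])

# Toy validation obj-loss curve that drops and then rises again
obj_loss = [0.9, 0.6, 0.45, 0.40, 0.43, 0.48, 0.55]
stop = best_epoch(obj_loss)
```

Applied to the obj loss described above, this kind of check would select a checkpoint near the point where the validation loss bottoms out, before the upward overfitting trend begins.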
An additional item in the loss of multiclass object detection is used to measure the accuracy of the model's category prediction.
The overfitting problem was also studied for the training of multiclass object detection. In this experiment, the batch size was set to 4 and the YOLOv5l model was used for multiclass object detection. The training process is shown in Figure 11. The box loss on the validation set did not show significant overfitting after it stopped decreasing. The obj loss on the validation set dropped to its lowest point and then showed a more obvious upward trend than the corresponding loss in the single-class training.
The box loss was more stable than the obj loss in the convergence process. The cls loss on the validation set also showed a more obvious upward trend after reaching its lowest point. During this experiment, the highest mAP was achieved at epoch 42.  To observe the practical effect of the model's predictions, we randomly selected pictures from the validation set and used the best model trained with the specified parameters to make predictions.
To ensure the accuracy of the experiment, the model trained for the full 100 epochs was consistently used for prediction when the algorithm was applied in practice. The detection results of the model trained with YOLOv5s at a batch size of 2 are shown in Figure 12, with the corresponding metrics in Table 1.
Most hair follicles are marked with a bounding box without a category label. The bounding boxes are generally reliable.
The detection results of the model trained with YOLOv5l at a batch size of 4 are shown in Figure 13, with the corresponding metrics in Table 2. The default threshold settings are reasonable for the detection of similarly dense datasets; however, the hair follicle dataset is quite different from the COCO dataset, so we also studied the IOU threshold. Figure 14 shows that the optimal intersection over union (IOU) threshold of the trained model is around 0.6. The optimal mAP is not far from the mAP at the 0.6 IOU threshold, and the average optimal IOU threshold is less than 0.6. The model used to produce the results shown in Figure 13 is labeled IV in Figure 14. In Figure 13, some hair follicles with a large number of hairs are incorrectly labeled as multiple hair follicles, which is the result of setting the IOU threshold too high. Therefore, decreasing the IOU threshold could be a solution to this issue.
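How the IOU threshold in non-maximum suppression controls duplicate boxes can be illustrated with a minimal greedy NMS sketch (a generic illustration, not YOLOv5's exact implementation; the boxes and scores are invented):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh):
    """Greedy non-maximum suppression: a lower iou_thresh suppresses
    more overlapping boxes, merging duplicate detections of one follicle."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

# Two heavily overlapping boxes on one follicle (IOU ~0.68) plus a distant box
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
```

With a threshold of 0.7 both overlapping boxes survive (one follicle reported twice), while lowering it to 0.6 suppresses the duplicate, mirroring the fix suggested above for dense, many-haired follicles.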
We also applied other object detection algorithms to this dataset and compared their performance to YOLOv5. Table 3 shows the mAP obtained by using these other algorithms on this dataset. The mAP is the highest value achieved in the training process, and the parameters used are the optimal values determined through experiments. All experiments were multiclass object detection tasks, and the mAP was calculated in the validation set.
We found that YOLOv5 had the highest mAP among all algorithms.
CornerNet had the lowest mAP. We believe that the poor performance of this algorithm on this dataset is due to its reliance on boundary features of the bounding box in the detection process.
None of the bounding boxes in our dataset have obvious boundary features.

| CONCLUSION
In this paper, we introduced the hair follicle dataset that we designed and collected, discussed the importance of the YOLOv5 algorithm for analyzing it, and described the challenges the dataset poses. To address the issue of poor performance in multiclass object detection and to improve the stability of the training process, we propose the use of GAN-based data augmentation.
This approach has the potential to not only reduce the cost of obtaining data, but also increase the sample size to the recommended standard.

CONFLICT OF INTEREST STATEMENT
None of the authors have a conflict of interest to disclose.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.

ETHICAL APPROVAL
The authors confirm that the ethical policies of the journal, as noted on the journal's author guidelines page, have been adhered to. No ethical approval was required as this is a review article with no original research data.