An algorithm selection methodology for automated focusing in optical microscopy

Autofocus systems are essential in optical microscopy. These systems typically sweep the sample through the focal range and apply an algorithm to compute a contrast value for each image, where the highest value indicates the optimal focus position. As the optimal algorithm may vary with image content, we evaluate the 15 most used algorithms in the field on 150 stacks of images from four different kinds of tissue, using four measuring criteria and two types of analysis, and we propose a general methodology for selecting the best fitting algorithm for any given application. In this paper, we present the results of this evaluation and a detailed discussion of the different features involved: the threshold used by the algorithms, the criteria parameters, the type of analysis, the bit depth of the images, their magnification, and the type of tissue. We conclude that some of these parameters are more relevant to the study than others, and that the implementation of the proposed methodology can lead to a fast and reliable autofocus system capable of analyzing and selecting algorithms with no supervision required.


| INTRODUCTION
Automation in the medical field leads to faster, cheaper, and more accurate results, particularly through the use of digital imaging in digital pathology. Automation and digitalization of diagnostic procedures reduce the acquisition and processing times and improve the accuracy, throughput, and reproducibility of the measurements (Saerens et al., 2019; Xu et al., 2017). In addition, they make it possible to send the acquired digital images to experts all over the world for consultation and to store patient records automatically (Liao, 2018). As medical diagnosis is a cognitive process, its automation also reduces the physician's workload, setting a better environment for reducing clinical errors and leading to better health care (Panicker et al., 2016).
Fast and reliable autofocus systems are crucial for microscopy automation, allowing real-time high-resolution image processing of all the possible fields of view in a sample (Bueno-Ibarra et al., 2005; Hosseini et al., 2020). All the subsequent analyses applied to the specimens depend on the quality of this focusing mechanism, as slight deviations from the optimal focus position would generate unreliable results (Hilsenstein, 2005).
There are two main categories of autofocus methods: active and passive (Castillo-Secilla et al., 2017; Israni et al., 2016; Kehtarnavaz & Oh, 2003). Active autofocus implies the emission of an ultrasound or electromagnetic wave, such as an infrared light beam, meant to fall upon the surface of the object to be focused. A sensor captures the reflection of this wave, and the distance from the object to the lens is calculated based on triangulation or on the time needed for the signal to return. Passive autofocus, in contrast, relies only on the analysis of the images acquired by the system, as in the contrast-based methods studied in this work.

The proposed methodology objectively quantifies the quality of an algorithm and evaluates its performance, comparing the results between different algorithms and selecting the optimal one for any application.
To develop an automated focusing methodology, we have taken a total of 150 stacks of images from four different types of tissue, applying two magnifications and using two bit-depth quality configurations.
Then, we selected the 15 most used algorithms from the literature and developed an automated script in Python to evaluate their performance over all the stacks. In the following sections, we describe the steps followed and propose a methodology suitable for any kind of images.

| Data sets and image acquisition
The Faculty of Health Sciences of the Rey Juan Carlos University provided us with four different mouse tissue samples: kidney, stomach, intestine, and adipose tissue, cut at a 5 μm thickness and stained with hematoxylin-eosin, together with the means for manual acquisition of the images with a Zeiss Axioskop 2 microscope equipped with the AxioVision 4.6 image analysis software package in bright-field modality.
We took images with a resolution of 1388 × 1040 pixels from four different tissue samples. For each tissue sample, we used two magnifications, 5× and 10×, and each magnification was applied with two bit depths, 8 bits and 16 bits, resulting in a total of 16 categories.
Each category contains 10 stacks, except for adipose tissue at 10×, whose 8-bit and 16-bit categories contain 5 stacks each.
This results in a total of 150 stacks, which were then analyzed by an expert technician in order to determine the optimal focus position for each stack. A sample of these images is shown in Figure 2.
These stacks are the best approximation to a sweep of all the possible fields of view in a sample, given the need to use a finite number of images per stack (between 10 and 20 in our case) and the limitations of a manual setting, which made it difficult to establish a fixed number of images per stack: we were unable to set a fixed position to begin or end the sweep, or a fixed interval between images.

| Contrast calculation algorithms
As it has been mentioned, many algorithms for contrast calculation are available. To select the most commonly used in microscopy, we chose five different reviews of autofocus functions from the literature. Those algorithms present in more than one of the references are shown in Table 1. As a result, we selected for this study the 15 most used algorithms.
The 15 algorithms are applied to all the images in a stack to generate its contrast functions; some of them have a threshold value, which is indicated along with their formulation. The 15 selected algorithms are presented below with their formulation, where M is the number of lines of the image, N is the number of pixels per line, g(i, j) is the gray level of the pixel at position (i, j), ḡ is the mean value of all the pixels in the image, and θ is the threshold value.

| Based on correlation measurement
where G_x(i, j) and G_y(i, j) are the convolutions of the image with the Sobel operator.
Brenner gradient. First-order Gaussian derivative, where G_x(x, y, σ) and G_y(x, y, σ) are the first-order Gaussian derivatives with standard deviation σ. Threshold absolute gradient. Absolute Tenengrad, where G_x(i, j) and G_y(i, j) are the convolutions of the image with the Sobel operator.
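As an illustration, these gradient-based measures translate into a few lines of NumPy. The sketch below assumes the standard Brenner gradient and squared gradient formulations from the autofocus literature, with θ as the threshold; the function names are ours, not the paper's:

```python
import numpy as np

def brenner_gradient(img, theta=0.0):
    # Standard Brenner formulation: squared difference between each pixel
    # and the pixel two columns away, summed when above the threshold.
    d = img[:, 2:].astype(float) - img[:, :-2].astype(float)
    d2 = d ** 2
    return float(d2[d2 > theta].sum())

def squared_gradient(img, theta=0.0):
    # Squared difference between horizontally adjacent pixels,
    # summed when above the threshold.
    d = img[:, 1:].astype(float) - img[:, :-1].astype(float)
    d2 = d ** 2
    return float(d2[d2 > theta].sum())
```

A perfectly flat (defocused) image scores zero, while any sharp edge raises the value, which is what makes these measures usable as contrast functions.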

| Based on depth of peaks and valleys
Image power Thresholded pixel count
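A sketch of these two measures under common definitions from the literature (the exact formulations used in the study may differ; the names are ours):

```python
import numpy as np

def image_power(img, theta=0.0):
    # Sum of squared gray levels for pixels above the threshold.
    g = img.astype(float)
    return float(np.sum(g[g > theta] ** 2))

def thresholded_pixel_count(img, theta):
    # One common definition counts the pixels whose gray level
    # falls below the threshold.
    return int(np.sum(img < theta))
```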

| Evaluation criteria
To quantify the performance of each algorithm in a given application, we need some criteria to objectively evaluate the quality of the resulting contrast functions. Consequently, we chose the four criteria usually used in similar studies.
These criteria have an optimal value equal to zero because they are defined by mimicking the features of an ideal function, which has an unequivocal maximum (Osibote et al., 2010; Qiu et al., 2013). By applying the four criteria to the contrast functions of each stack, we obtain a criteria table with as many rows as resulting functions and as many columns as criteria.
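As an illustration of how the criteria tables are assembled, the sketch below uses only the accuracy criterion, the distance between a curve's maximum and the expert focus position; the helper names are hypothetical, and the remaining three criteria are not shown:

```python
import numpy as np

def accuracy(curve, true_focus):
    # Distance between the curve's maximum and the expert focus
    # position; an ideal contrast function scores zero.
    return abs(int(np.argmax(curve)) - true_focus)

def criteria_table(curves, true_focus, criteria):
    # One row per contrast function (algorithm), one column per criterion.
    return np.array([[crit(c, true_focus) for crit in criteria]
                     for c in curves])
```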

| Proposed analysis
Once we have calculated the four criteria and obtained the criteria tables, we need an objective method to grade each of the algorithms.
For this purpose, we propose two different analyses to compare the algorithms: a semiquantitative and a quantitative analysis.

| Semiquantitative
This first analysis is the simpler one: for a given stack and criterion, the result of an algorithm is compared with the rest and ranked accordingly. The ranking method is as follows: the algorithm with the best result is given a 1, the second a 2, and so on. If two or more algorithms share a result, they are all given the same rank.
The total score for each criterion is the sum of the scores obtained over all the stacks analyzed, as shown in Figure 4, and the global result for this analysis is the sum of the total scores of all the criteria.
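A minimal sketch of this ranking scheme, with tied results sharing a rank (function names are illustrative):

```python
import numpy as np

def rank_with_ties(values):
    # Lower criterion value = better; tied values share the same rank.
    distinct = sorted(set(values))
    return [distinct.index(v) + 1 for v in values]

def semiquantitative_total(per_stack_values):
    # Sum each algorithm's rank over all stacks for one criterion.
    totals = np.zeros(len(per_stack_values[0]), dtype=int)
    for stack_values in per_stack_values:
        totals += np.array(rank_with_ties(stack_values))
    return totals
```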

| Quantitative
As a more extensive procedure, the quantitative analysis compares the results of the algorithms in each criterion not between them but to a theoretical ideal function defined as having a value of zero in all the criteria. Therefore, we normalize the results of all algorithms within a criterion, as well as the value of the ideal function, and compute the distance of each algorithm to the ideal value.
Then, for each criterion along all the stacks, the Euclidean distance is calculated for the 15 algorithms, obtaining the total score for each criterion, as shown in Figure 5. The global results of this analysis are obtained by adding the total scores of each criterion.
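A minimal sketch of this computation; normalizing each stack's results by their maximum is an assumption, as the text does not fix the normalization:

```python
import numpy as np

def quantitative_total(per_stack_values):
    # per_stack_values: shape (stacks, algorithms) for one criterion.
    v = np.asarray(per_stack_values, dtype=float)
    # Normalize each stack's results; the ideal function stays at zero.
    norm = v / v.max(axis=1, keepdims=True)
    # Euclidean distance of each algorithm to the ideal (all-zero)
    # value across all the stacks; lower is better.
    return np.sqrt((norm ** 2).sum(axis=0))
```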

| Methodology
The general methodology we propose summarizes the previous sections, as each of them is one of the steps to follow. This is shown in Figure 6, where the labels indicate the section of the paper in which each step is explained.
From the tissue samples, we perform the image acquisition, obtaining the set of stacks to analyze. To these stacks, we apply the selected algorithms, resulting in a series of contrast functions, one per each algorithm and stack. These functions are quantified by the evaluation criteria, giving as an output the criteria tables, which are the data used to feed the analyses, finally producing a ranking of the algorithms.
To implement the proposed methodology as an automatic tool, we followed three steps: development, preparation, and execution.

| Development phase
To develop an automatic tool to perform all the steps of the methodology, we have created a Python script that performs the calculation of the contrast functions, followed by the scoring of each of these functions by the evaluation criteria, and then grades this data by performing both analyses.
We selected Python not only because of its popularity and ease of use, as it is a high-level programming language with an enormous ecosystem of available libraries, but also because its numerical libraries are optimized for matrix calculations.
The developed code needs only a small amount of information, fed as arrays with as many elements as the number of stacks or the number of algorithms to evaluate: 1. The optimal focus position of each stack, which we determine manually.
2. The number of images of each stack.
3. The value of the desired threshold for each algorithm, if it has one.
FIGURE 4 Total score for semiquantitative analysis

FIGURE 5 Total score for quantitative analysis

We also have to specify the path to the folder in which to save the results, as well as the path to the folder containing the images to analyze.

| Preparation phase
When evaluating new images or algorithms, the following steps need to be followed to prepare the data.
1. Convert the acquired images from RGB into gray scale, significantly reducing the amount of data to process, as the number of pixel matrices decreases from three to one.
2. Manually examine the images stack by stack to determine the number of images per stack and the correct focus position, which is required to assess the accuracy.
3. Translate the mathematical formulation of the selected algorithms to Python and include them in the script.
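Step 1 above can be sketched as follows; the ITU-R BT.601 luminance weights are an assumption, as any standard RGB-to-gray conversion collapses the three pixel matrices into one:

```python
import numpy as np

def to_grayscale(rgb):
    # Collapse the three channel matrices into a single one using the
    # BT.601 luminance weights (an assumed choice of weighting); the
    # data to process shrinks by a factor of three.
    return (0.299 * rgb[..., 0] + 0.587 * rgb[..., 1]
            + 0.114 * rgb[..., 2])
```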

| Execution phase
Once we provide the script with the data, it will run the algorithms and perform the analyses, giving as a global output a spreadsheet document for each tissue and magnification studied.
This document contains a sheet for each stack, with the normalized contrast functions of each algorithm, the criteria table, the scoring of the semiquantitative analysis, and the calculations of the Euclidean distances needed for the quantitative analysis.
There is also one sheet for each bit depth, featuring the data of each stack needed for both the semiquantitative and quantitative analyses, together with the total and global scores and the ranking.

| RESULTS
As mentioned in Section 2, some algorithms have a threshold. We visually assessed the results of different threshold values, finding that a threshold of 20% of the maximum gray level in each bit-depth configuration performed best.
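This threshold depends only on the bit depth and can be computed as a one-liner (a sketch; the function name is ours):

```python
def threshold_value(bit_depth, fraction=0.20):
    # 20% of the maximum gray level representable at the given bit depth,
    # e.g. 0.20 * 255 for 8-bit images and 0.20 * 65535 for 16-bit ones.
    return fraction * (2 ** bit_depth - 1)
```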
With this threshold, we applied the algorithms and, for each of the stacks, obtained 15 contrast functions, some with better outputs than the rest. As an example, Figure 7 shows the best resulting contrast functions.

FIGURE 6 General methodology

For each of these curves, we calculate the value of the four criteria; therefore, 150 criteria tables like the one shown in Table 2 are obtained from the evaluated data. These tables are then processed by the two proposed analyses.
For the four different categories in this study (tissue, magnification, bit depth, and analysis), the results of this processing, that is, the global scores and rankings of applying the 15 algorithms to the 150 stacks, are summarized in Tables 3, 4, 5, and 6.
In addition to those tables, Figure 8 shows a summary of the ranking results. In the figure, the results of the 15 algorithms for the four different tissues are highlighted in different colors, where the higher the bar, the better the ranking position; therefore, the best algorithm would be squared gradient. This figure also shows that the algorithms perform similarly across different tissues, especially the top-scoring algorithms, as the different color sections have similar weight.
As an example, squared gradient has equal slots for all tissues.

| DISCUSSION
Our goal was to propose a general methodology to automate the algorithm selection process for the focusing systems used in optical microscopy. With this purpose, we evaluated several features and observed that not all of them are equally relevant, as we discuss in the following paragraphs.

| Threshold
As some of the algorithms have a threshold value, we tested three possible thresholds for this study: 10%, 20%, and 50% of the maximum gray level possible in each bit-depth configuration. The presented results use the 20% value for all the algorithms, as we observed it to perform best.
If we wanted to add an extra level of accuracy to the analysis, the algorithms with threshold values could be examined to a greater extent in a dedicated preliminary analysis, testing the spectrum of possible thresholds to determine the best performing value. Then, only the best version of these algorithms would be compared with the algorithms without a threshold, resulting in a more balanced outcome.
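Such a dedicated threshold analysis could be sketched as follows, using the distance between each curve's maximum and the expert focus position as the selection signal; all names and the error measure are hypothetical:

```python
import numpy as np

def best_threshold(algorithm, stacks, true_foci, candidates):
    # For each candidate threshold, sum over all stacks the distance
    # between the contrast curve's maximum and the expert focus
    # position, and keep the threshold with the smallest total error.
    def total_error(theta):
        return sum(
            abs(int(np.argmax([algorithm(img, theta) for img in stack]))
                - focus)
            for stack, focus in zip(stacks, true_foci))
    return min(candidates, key=total_error)
```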

| Criteria
If we closely examine the 150 criteria tables obtained after applying the four criteria to all the calculated contrast functions (Table 2 shows one such table, corresponding to the curves featured in Figure 7), we can see that the scoring of the algorithms correlates with the qualitative features of the curves: the curves that meet the original goal of a fast and clear identification of the correct maximum tend to have the lowest scores, while those that are clearly not a good fit tend to have higher scores.
However, this tendency is not always met for all the criteria at once, meaning that no single criterion can be excluded from the study, as none of them gives redundant information. Moreover, in cases where the first three criteria are equal, the fourth one acts as the tiebreaker, as for (9), (11), and (12). Another consideration is that we deliberately ruled out one criterion commonly used in other studies: the execution time. We did not evaluate this criterion because our goal was to develop a general methodology to select the best fitting algorithm, prioritizing performance over execution time.
Yet, in further studies, it might be an interesting criterion to consider.

| Analysis
When comparing the results of both analyses with a qualitative evaluation of all the curves obtained, we can appreciate that both analyses are effective and have similar results.
The slight deviations between the two analyses can be explained by the fact that the semiquantitative analysis allows two or more algorithms to share the same score, although this makes no difference in most cases.
Nevertheless, in most of the studied cases, the kind of analysis used was found not to be relevant, as the results were the same whichever one was implemented.

| Bit depth
Although some variations can be seen in the overall ranking for all the categories when switching between 8- and 16-bit depth, those deviations amount to only one ranking place, up or down.
In the results presented, there is only one case that does not follow this rule, turning a fourth position into a seventh, and even that is not relevant enough to indicate a better performance when using 16-bit images.
Consequently, none of the variations observed is significant enough to justify the use of 16-bit images, that is, images of double the size, when the results obtained with 8-bit images are almost identical.

| Magnification
As expected, there are some differences between the 5× and 10× magnifications, since this setting directly affects the content of the images and the size of the cells.
However, those differences are not as wide as predicted: in most cases, the algorithms switch only a few places in the ranking, and almost the same algorithms remain at the top positions as the most effective ones.
Yet, as opposed to the bit depth, this factor is significant on account of the differences across tissues: in stomach tissue, the ranking is practically identical for both magnifications, but radical changes appear in other tissues, as for (14) in adipose tissue or (7) in intestine tissue.
Therefore, as seen in the results, there is no assurance of the effectiveness of an algorithm at a given magnification, even if its performance at a different magnification is known.

| Tissue
This category has also shown fewer variations than expected, even though cell size, shape, and distribution depend on the tissue and thus directly affect the content and density of information in the image.
That being the case, there are three factors that might have played a part in the results obtained. First, for each of the studied tissues, the stacks were taken from both the perimeter and the inner part of the sample, balancing the density of information among the stacks.
Second, all the images studied were taken in a bright-field microscopy setting, for which some algorithms perform better, just as other algorithms do for fluorescence microscopy (Osibote et al., 2010; Shah et al., 2017). It is therefore likely that the same algorithms stand out in all tissues, always near the top of the ranking.
Finally, a routine microscopy dye was used, meaning no special areas or cells were highlighted, as would be done in a pathology exam of a tissue; this makes all the cells in the sample visible and increases the level of content in the images.
However, the variations in the ranking between the different tissues cannot be dismissed lightly, as there are many tissues to be studied, and for the purpose of this work, only four have been considered.
In any case, these results can be used to distinguish the best algorithms to consider against those that could be directly discarded for bright-field microscopy.

| CONCLUSION
Fast and reliable autofocus systems are crucial for microscopy automation. Our goal was to propose a general algorithm selection methodology for automated focusing. In this paper, we have reviewed the methods used in similar studies and developed and implemented in Python a general methodology applicable to any field. The proposed methodology allows the user to objectively compare and grade the performance of a set of algorithms across different stacks.

FIGURE 8 Ranking of algorithms summary
This work has been validated with 150 stacks of images from four different types of tissue, applying two magnifications and using two bit-depth quality configurations. It has been shown that some factors are more relevant than others: the type of tissue and the magnification matter more than the bit depth or the type of analysis used, which give almost the same results whichever option is selected.
The best performing algorithms can be identified in Figure 8, and a more detailed scoring can be found in Tables 3, 4, 5, and 6. This scoring shows no significant differences between bit depths but slight deviations across magnifications and tissues. Therefore, whenever the type of tissue or the magnification changes, a new analysis must be made to select the best algorithm.
However, this process can now be carried out with a reduced list, testing only about half of the original 15 algorithms, the ones proved to be more effective in bright-field microscopy; that is, not all 15 algorithms must be tested again.
Additionally, the top performing algorithms have proved to be quite accurate at defining the focus position, that is, the point with the highest contrast value. This feature can therefore be used to create a self-contained analysis method that independently determines the optimal focus point of a given stack by selecting the most repeated maximum point, and this result could be used as the reference to assess the first criterion: the accuracy.
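A minimal sketch of this idea, assuming the most repeated maximum position across the contrast curves is taken as the reference focus:

```python
import numpy as np

def consensus_focus(curves):
    # Take the maximum position of each contrast curve and return the
    # most repeated one as the reference focus position of the stack.
    maxima = [int(np.argmax(c)) for c in curves]
    return max(set(maxima), key=maxima.count)
```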
Thus, if the positioning system of the samples were to be automated, and the number of images taken per stack was always the same, only the threshold value would be needed as an input. This parameter could be fixed, as we used a 20% value for all algorithms, or even included as a variable to study as previously mentioned in the discussion.
Consequently, with this methodology and our implementation, we could provide the system with the capability to analyze, select algorithms, and focus with no supervision required.