Vehicle license plate recognition for fog-haze environments

The technique of vehicle license plate recognition can recognise and count vehicles automatically, and thus many vehicle-related applications are greatly facilitated. However, the recognition of vehicle license plates is extremely difficult in fog-haze environments, because the fog and haze blur the boundaries and characters of license plates significantly, which makes the license plates hard to detect or recognise. To this end, this paper proposes a vehicle License Plate Recognition method for Fog-Haze environments (LPRFH). In LPRFH, a dark channel prior algorithm based on the local estimation of the atmospheric light value is applied to dehaze the blurred images preliminarily. Then, the images are further dehazed, and the license plate regions are detected through a Joint Further-dehazing and Region-extracting Model built on the basis of an object detection convolution neural network. Finally, the image super-resolution is accomplished with a convolution-enhanced super-resolution convolutional neural network, and hence the characters of license plates can be recognised successfully. Extensive experiments have been conducted, and the results indicate that LPRFH can recognise the license plates accurately even in some severe fog-haze environments.

complex backgrounds) seriously affect the quality of vehicle images captured by monitoring cameras. Especially under fog-haze weather, which has become progressively more serious [3], the fog and haze blur the boundaries and characters of license plates significantly, which makes the license plates hard to detect and recognise. Most of the existing license plate detection methods are based on feature extraction, and they cannot detect the license plates accurately in fog-haze environments, because the fog-haze interference and image ambiguity make the features very difficult to extract. Likewise, the character recognition is also affected significantly by the fog-haze interference. This paper proposes a vehicle License Plate Recognition method for Fog-Haze environments (LPRFH), and as shown in Figure 2, this method provides a framework which includes an image-dehazing technique based on an Object Detection Convolution Neural Network (ODCNN) [4] and a super-resolution technique based on a convolution-enhanced Super-Resolution Convolutional Neural Network (SRCNN) [5]. Firstly, a dark channel prior algorithm based on the local estimation of the atmospheric light value is applied to dehaze the images preliminarily. Then, the images are further dehazed, and the license plates are extracted with an ODCNN to reduce the distortion of image restoration caused by the cumulative errors of multiple parameters, and thus the license plates can be accurately extracted from these improved images. Finally, the image super-resolution is realised by a six-layer convolution-enhanced SRCNN to improve the accuracy of the character recognition of license plates. Note that in our work, the ODCNN is trained on both real images and synthesised images, while the SRCNN is trained on synthesised images, because blurred images of moving vehicles are difficult to obtain from real scenes.
The main innovations of this work are listed as follows:
• An image-dehazing technique based on an ODCNN which takes the fog concentrations into account. The images are divided into several regions according to the fog concentrations, a finer transmission map [6] is generated from the local atmospheric light values rather than a global atmospheric light value, and thus the fog-haze interference can be filtered from the original images as much as possible.
• A Joint Further-dehazing and Region-extracting Model (JFRM) constructed on the basis of an ODCNN. JFRM can dramatically reduce the distortion of image restoration caused by the cumulative errors and accurately detect the license plate regions of the preliminarily dehazed images.
• A convolution-enhanced SRCNN in which the number of convolution layers is increased from three to six, so that many more image features can be extracted and the image quality can be significantly improved.
The rest of this paper is organized as follows: Section 2 gives some related works. Section 3 introduces the method of vehicle license plate recognition in fog-haze environments. Section 4 reports the performance evaluation and simulation results. Finally, Section 5 concludes this paper.

Image dehazing and object detection
The problem of image dehazing has been extensively studied [6][7][8][9]. In [7], McCartney et al. propose an atmospheric light scattering model based on the Mie scattering theory, in which an incident light attenuation model and an atmospheric light imaging model describe the imaging mechanism. According to the atmospheric light scattering model and the dark-object subtraction technique, He et al. [6] propose a dehazing method based on the dark channel, i.e. the observation that at least one color channel has some pixels with very low intensities. Reference [8] formulates a haze removal model which integrates a random forest to estimate the transmissions. This model combines the dark channel, local max contrast, hue disparity and local max saturation. In [9], an efficient image haze removal algorithm is proposed, where the color attenuation prior is considered, and a linear model is constructed for the scene depth of the hazy image under this prior. Convolution Neural Networks (CNNs) have witnessed prevailing success in computer vision tasks and can be introduced for haze removal [10,11]. Reference [10] proposes a lightweight CNN, which is designed based on a re-formulated atmospheric scattering model, instead of estimating the transmission matrix and the atmospheric light separately. Reference [11] formulates an end-to-end generative method for the image dehazing issue. However, this method does not take the atmospheric scattering model into account. CNN-based models have also exhibited great progress in terms of object detection accuracy. In recent years, some light-weight deep learning models for object detection have yielded preferable results. Most of these models are constructed based on SSD [12], SqueezeNet [13], AlexNet [14] or GoogleNet [15]. In these models, the object detection pipelines comprise many components, such as pre-processing, a large number of convolution layers and post-processing.
The sliding window approach or the region proposal method can be taken to evaluate the classifiers. These complex pipelines are computationally intensive and consequently slow. To overcome this shortcoming, the You Only Look Once (YOLO) model is proposed in [16], where detection is framed as a single regression problem. The outstanding advantage of YOLO is that it encodes the contextual information, so it makes fewer mistakes in terms of confusing the background patches of images.

Image super resolution and character recognition
Single image Super-Resolution (SR) is a process that obtains a High-Resolution (HR) image from a single Low-Resolution (LR) image. With regard to this vision problem, state-of-the-art super-resolution results can be produced by deep convolutional neural networks (DCNNs). Dong et al. [17] establish a deep learning approach with SRCNN. This method learns an end-to-end mapping between the low-/high-resolution images. SRCNN enhances the spatial resolution of an input LR image by some hand-crafted upsampling filters, and thus this method optimises all layers jointly instead of handling each component separately. In [18], a highly accurate single-image super-resolution method is presented. The authors create a deeper network with 20 layers to learn a residual image based on SRCNN, and this method exhibits a preferable performance. Reference [19] presents SRGAN, a Generative Adversarial Network (GAN) for image Super-Resolution (SR), to recover the finer texture details. It is a framework capable of inferring photo-realistic natural images for 4× upscaling factors.
Character segmentation is the foundation of character recognition. Reference [20] presents a character segmentation algorithm which makes use of priori knowledge and realises the character segmentation based on connected domains. Besides license plates, character segmentation is applied in handwriting text segmentation. Jun et al. [21] propose a new handwriting character segmentation method based on non-linear clustering. The whole text line is first divided into some strokes, and the similarity matrix is calculated. Then, the cluster labels of the strokes are obtained by a non-linear clustering method. Based on the cluster labels, the strokes are combined to form the characters. In [22], Roy et al. present an end-to-end real-time text localisation and recognition method to handle digital documents in real scenes. Reference [23] proposes a segmentation-free license plate recognition algorithm that utilises deep learning techniques in the character detection and recognition process. It improves the recognition results of low-resolution images.
Automatic License Plate detection and Recognition (ALPR) is a quite popular and active research topic in the field of computer vision. Reference [24] presents an automatic framework for License Plate (LP) detection and recognition from complex scenes, which is based on mask region convolutional neural networks used for LP detection, segmentation and recognition. An end-to-end ALPR method based on a hierarchical Convolutional Neural Network (CNN) is proposed in [25]. The core idea of the proposed method is to identify the vehicle and the license plate region using two passes on the same CNN, and then to recognise the characters using a second CNN. Wu et al. [26] propose a novel PixTextGAN, which leverages a controllable architecture that generates specific character structures for different text regions, to generate synthetic license plate images with reasonable text details. Specifically, a comprehensive structure-aware loss function is presented to preserve the key characteristic of each character region and thus to achieve appearance adaption for better recognition. In [27], a robust framework for license plate recognition in the wild is designed to solve the problem of significantly reduced license plate recognition performance under rotation, distortion, occlusion, blur, shadow or extremely dark or bright conditions. It is composed of a tailored CycleGAN model for license plate image generation and an elaborately designed image-to-sequence network for plate recognition.

Motivation of our work
Under the influence of fog-haze weather, the detection and recognition of license plates may become invalid. Considering the aforementioned fact that the fog and haze blur the boundaries and characters of license plates significantly, it is impossible to achieve an ideal recognition accuracy. An intuitive idea to eliminate the effects of fog and haze is to employ image dehazing technology and improve the feature extraction of license plate regions. However, much extra cost is produced in the image dehazing process, while the dehazing effect still cannot be guaranteed. CNNs are widely used in computer vision tasks and can be introduced for the image dehazing issue. This paper attempts to improve the dehazing effect, especially in the license plate region. To this end, after a preliminary dehazing process, the images are further dehazed, and the license plate regions are detected through a convolution neural network, that is, a joint optimisation is implemented to improve the dehazing effect. Finally, a super-resolution network is exploited to improve the image resolution, and hence the character recognition accuracy.

Improved dark channel prior algorithm
In the existing image dehazing works, the atmospheric scattering model is widely used. According to this model, the physical description of the degraded image is approximated as:

$$I(x) = J(x)\,t(x) + A\,(1 - t(x)), \qquad t(x) = e^{-\beta d(x)} \tag{1}$$

where $t(x)$ represents the permeability of the atmospheric medium, $\beta$ represents the atmospheric scattering coefficient, $d$ is the scene depth, $x$ is the pixel position, $I(x)$ is the fogged degraded image, $J(x)$ is the restored image, and $A$ is the atmospheric light value. Obviously, the solution of $J(x)$ requires prior knowledge of $t(x)$ and $A$.
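As a minimal per-pixel numeric sketch of this model (single channel; the function and variable names are ours, not from the paper), degradation and restoration can be expressed as:

```python
import math

def transmission(depth, beta):
    """t(x) = exp(-beta * d(x)): permeability of the atmospheric medium."""
    return math.exp(-beta * depth)

def degrade(j, depth, beta, a):
    """I(x) = J(x) t(x) + A (1 - t(x)): fogged intensity from clear intensity."""
    t = transmission(depth, beta)
    return j * t + a * (1.0 - t)

def restore(i, depth, beta, a, t0=0.1):
    """J(x) = (I(x) - A) / max(t(x), t0) + A: invert the model given t and A."""
    t = max(transmission(depth, beta), t0)
    return (i - a) / t + a
```

Restoring a degraded pixel recovers the original intensity exactly whenever the true transmission stays above the lower bound `t0`, which is the same role `t0` plays in formula (7) below.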
In most local areas of an image, the minimum intensity value of the color channels of pixels is extremely close to 0, and this minimum channel is referred to as the dark channel. With regard to an input image $J$, its dark channel can be expressed as:

$$J^{dark}(x) = \min_{y \in \Omega(x)} \Big( \min_{c \in \{r,g,b\}} J^{c}(y) \Big) \tag{2}$$

where $J^{c}(y)$ represents a color channel of the image, and $\Omega(x)$ represents a window centered on pixel $x$. According to the dark channel prior [6]:

$$J^{dark}(x) \to 0 \tag{3}$$

Besides, by (1) we obtain that:

$$\frac{I^{c}(x)}{A^{c}} = t(x)\,\frac{J^{c}(x)}{A^{c}} + 1 - t(x) \tag{4}$$

Assume that the transmittance $t(x)$ in each window is a constant, and thus there is:

$$\min_{y \in \Omega(x)} \Big( \min_{c} \frac{I^{c}(y)}{A^{c}} \Big) = t(x) \min_{y \in \Omega(x)} \Big( \min_{c} \frac{J^{c}(y)}{A^{c}} \Big) + 1 - t(x) \tag{5}$$

Combining (1), (2) and (5), we can obtain the following formula:

$$t(x) = 1 - \omega \min_{y \in \Omega(x)} \Big( \min_{c} \frac{I^{c}(y)}{A^{c}} \Big) \tag{6}$$

where $\omega \in (0, 1]$ is a parameter to adjust the dehazing degree. By (1) and (6), we can get that:

$$J(x) = \frac{I(x) - A}{\max(t(x), t_0)} + A \tag{7}$$

where $t_0$ is a lower bound that prevents division by a near-zero transmittance. The atmospheric scattering model given in (1) indicates that the image dehazing is mainly dependent on the hypothesis or prior knowledge. Thus, the atmospheric light value $A$ will have an impact on the transmittance and the image dehazing performance. In a foggy image, the distributions of the fog and the brightness are typically not uniform. With an estimated global atmospheric light value, the recovered image will exhibit a halo phenomenon in the areas of fog concentration transition, while some local areas will appear over-bright. To this end, this paper proposes a fog concentration method, which divides each image into several regions with different fog concentrations, and then processes these regions respectively. Note that in this method, the local atmospheric light value is taken to estimate the transmittance rather than the global value.
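The dark channel of (2) and the transmittance estimate of (6) can be sketched in pure Python as a toy implementation over a small RGB image stored as nested lists (the names and the window radius are our assumptions; a real implementation would use a minimum filter over larger windows):

```python
def dark_channel(img, radius=1):
    """J_dark(x) = min over window Omega(x) of min over channels of J^c(y).
    img: 2-D list of (r, g, b) tuples with values in [0, 1]."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for x in range(h):
        for y in range(w):
            m = 1.0
            for u in range(max(0, x - radius), min(h, x + radius + 1)):
                for v in range(max(0, y - radius), min(w, y + radius + 1)):
                    m = min(m, min(img[u][v]))
            out[x][y] = m
    return out

def transmission_map(img, a, omega=0.95, radius=1):
    """t(x) = 1 - omega * min_window min_c (I^c / A^c), per formula (6).
    a: per-channel atmospheric light (A^r, A^g, A^b)."""
    norm = [[tuple(c / ac for c, ac in zip(px, a)) for px in row] for row in img]
    dark = dark_channel(norm, radius)
    return [[1.0 - omega * d for d in row] for row in dark]
```

On a fog-free patch the window minimum approaches 0, so the estimated transmittance approaches 1, matching the prior of (3).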

Regional division
Due to the added atmospheric light in fog-haze weather, the captured foggy image is usually brighter than the fog-free image. Hence, the brightnesses of regions with different fog concentrations are quite different. Besides, the dark primary color in the dark channel of a fog-free image is close to 0, while in the foggy case, the value of the dark channel becomes larger due to the fog interference. Therefore, the image brightness and the dark channel value are taken as the feature variables to divide the pixels of each foggy image into three regions. Firstly, we gray-process the image, and the grayscale value represents the luminance of each pixel. Then, we calculate the dark channel value for each pixel, and the luminance value and dark channel value are vectorised. Finally, the k-means clustering algorithm is taken to automatically divide the pixels into regions of different fog concentrations. For each fog concentration, the atmospheric light value and transmittance are estimated respectively.
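The clustering step can be sketched as a plain Lloyd's k-means over per-pixel (luminance, dark-channel) feature pairs; this toy version (names ours, not the paper's implementation) illustrates the regional division:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means over 2-D feature vectors (luminance, dark-channel
    value), one per pixel. Returns a cluster label per point and the centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest center by squared Euclidean distance
        for i, p in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda c: (p[0] - centers[c][0]) ** 2
                                        + (p[1] - centers[c][1]) ** 2)
        # update step: move each center to the mean of its members
        for c in range(k):
            members = [p for i, p in enumerate(points) if labels[i] == c]
            if members:
                centers[c] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return labels, centers
```

With k = 3, the three resulting clusters correspond to the three fog concentration regions described above; each region then gets its own atmospheric light estimate.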

Local estimation
After the division of the fog concentration regions, the local atmospheric light value is independently estimated for each region. First, the dark primary color of the foggy image is estimated, and in each region, we select the pixels with the top-0.1% brightnesses of the dark primary colors. Then, among the selected pixels, the maximum brightness is taken as the atmospheric light value of the region:

$$A_l = \max_{y \in \Omega(l)} I(y) \tag{8}$$

where $\Omega(l)$ represents the set of top-0.1% pixels of region $l$. According to formula (8), the local atmospheric light values in the different concentration regions are calculated, so the transmittance can be calculated by (6), and then the reconstructed image can be obtained according to (7).
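A sketch of the local estimation of (8), assuming flat per-region lists of pixel brightnesses and dark-channel values (the helper name and interface are ours):

```python
def local_atmospheric_light(intensity, dark, top=0.001):
    """Per-region estimate of A (sketch of formula (8)): among the top 0.1%
    brightest dark-channel pixels of the region, take the maximum image
    brightness. intensity, dark: flat, index-aligned per-pixel lists."""
    n = max(1, int(len(dark) * top))
    # indices of the n pixels with the largest dark-channel values
    idx = sorted(range(len(dark)), key=lambda i: dark[i], reverse=True)[:n]
    return max(intensity[i] for i in idx)
```

Note that a bright pixel with a low dark-channel value (e.g. a headlight) is ignored, which is the usual motivation for selecting candidates through the dark channel rather than through raw brightness.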

Joint optimisation of further dehazing and detection
Under the influence of fog environment factors, in order to detect the license plate more accurately and to overcome the shortcomings of the traditional dehazing methods based on parameter estimation, we introduce the idea of joint optimisation of further dehazing and detection. The image refinement and dehazing processes are embedded into the YOLOv3 detection model, so the dehazing result of the previous step can be improved. Specifically, we provide a JFRM with a convolution neural network, which is composed of three modules: (i) a multi-scale densely connected dehazing optimisation network; (ii) a feature extraction backbone network; (iii) a feature interaction detection network. The JFRM network structure is shown in Figure 4.

JFRM network structure
(i) In the multi-scale dehazing optimisation network, we add some filters to form the multi-scale features, while the convolution layers and connection layers are combined to compensate for the information missing in the convolution process. Moreover, we add some residual blocks to deepen the perception of the haze structure in the images, so as to accurately capture the boundary details of objects at different depths, while the perceptual boundary details are expected to be more reliable than the pixel-level details. (ii) In the feature extraction backbone network, the Darknet-53 network of YOLOv3 is taken as the main structure for feature extraction. This network combines convolution layers and residual blocks. With regard to a resized image with the size of 416×416×3, the first convolution layer uses 32 different 3×3 convolution kernels, and slides on this image to perform a convolution operation and extract the features. Note that the residual blocks yield the residual results without changing the sizes of the input and output feature maps. The residual blocks can control the propagation of the gradient, avoid the problems of gradient dispersion and gradient explosion, and speed up the training. (iii) In the feature interaction detection network, we use the yolo layers to interact with the features extracted by the multiple downsamplings of Darknet-53. The last three of the five downsampled features are mapped into three yolo layers with different scales. The feature map size of the minimum scale is 13×13, that of the mesoscale is 26×26, and that of the maximum scale is 52×52. Thus, the yolo layers of the three different scales can apply a 1×1 convolution kernel to produce an output similar to a fully connected layer.
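The three detection scales follow directly from the downsampling strides: five /2 steps give strides of 32, 16 and 8 for the last three feature maps. A one-line sketch (the function name is ours):

```python
def yolo_grid_sizes(input_size=416, strides=(32, 16, 8)):
    """Feature-map sizes of the three yolo detection scales: the 416x416
    input is downsampled by strides 32, 16 and 8, the last three of the
    five /2 downsampling steps in Darknet-53."""
    return [input_size // s for s in strides]
```

For a 416×416 input this yields the 13×13, 26×26 and 52×52 grids quoted above.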

JFRM loss function
We give a loss function to measure the performance of JFRM. For the detection module, we improve the prediction according to the coordinates of the bounding boxes, the confidences and the conditional class probabilities. In the further dehazing process, we measure the difference between the output image and the original fog-free image. When bounding boxes with different sizes are predicted, the same absolute error may produce completely different effects. In order to solve this problem, we use the square root of the width and height instead of the original width and height. The prediction error function regarding the coordinates of the bounding boxes is written as follows:

$$L_{coord} = \lambda_{t} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \Big[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + \big(\sqrt{w_i} - \sqrt{\hat{w}_i}\big)^2 + \big(\sqrt{h_i} - \sqrt{\hat{h}_i}\big)^2 \Big] \tag{9}$$

With regard to the confidence error, the cross entropy is taken to calculate the loss function under two cases: with an object and without an object. This is because YOLOv3 divides each image by a grid, and most grid cells do not include the object to be detected. If the two cases are treated identically, the detection network may become unstable or even divergent. Thus, the confidence loss weights of the bounding boxes without the object are reduced, and the loss weights are set to $\lambda_t = 5$ and $\lambda_f = 0.5$, respectively. Under such settings, the bounding boxes in each grid will be more standardised. The prediction error function of bounding box confidence is given as follows:

$$L_{conf} = -\sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \Big[ \hat{C}_i^j \log C_i^j + (1 - \hat{C}_i^j) \log(1 - C_i^j) \Big] - \lambda_{f} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \Big[ \hat{C}_i^j \log C_i^j + (1 - \hat{C}_i^j) \log(1 - C_i^j) \Big] \tag{10}$$

where $\hat{C}_i^j$ is the confidence, and $C_i^j$ is the IOU (Intersection Over Union) value between the obtained bounding box and the ground truth.
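A toy version of the coordinate error with the square-root trick (the names are ours; the weight of 5.0 follows the $\lambda_t = 5$ setting stated above) can be written as:

```python
import math

def coord_loss(preds, truths, responsible, lam=5.0):
    """Sum-squared bounding-box coordinate error using sqrt(w) and sqrt(h),
    so that the same absolute size error costs less on large boxes than on
    small ones (sketch of the coordinate term of the JFRM loss).
    preds/truths: lists of (x, y, w, h); responsible: 1 if the box owns an object."""
    total = 0.0
    for (x, y, w, h), (tx, ty, tw, th), r in zip(preds, truths, responsible):
        if r:
            total += (x - tx) ** 2 + (y - ty) ** 2
            total += (math.sqrt(w) - math.sqrt(tw)) ** 2
            total += (math.sqrt(h) - math.sqrt(th)) ** 2
    return lam * total
```

A width error of 3 on a 4-wide box and a width error of 3 on a 100-wide box contribute very differently under the square root, which is exactly the imbalance the trick addresses.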
Then, with regard to the classification error, which measures the determination of the class to which the bounding box belongs, we also take the cross entropy as the loss function. When the j-th anchor box of the i-th grid is responsible for an object, the bounding box generated by that anchor box contributes to the classification loss. The prediction error function of classification is given as follows:

$$L_{cls} = -\sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \sum_{c \in classes} \Big[ \hat{P}_i^j(c) \log P_i^j(c) + \big(1 - \hat{P}_i^j(c)\big) \log\big(1 - P_i^j(c)\big) \Big] \tag{11}$$

where $\hat{P}_i^j$ represents the true set of class probabilities of the j-th anchor box belonging to the i-th grid, and $P_i^j$ represents the set of class probabilities of the bounding box generated by the anchor box.
Finally, with regard to the error of the further dehazing module, the PSNR (Peak Signal to Noise Ratio) is applied to measure the difference between the obtained image and the original fog-free image. Maximising the PSNR can be achieved by minimising the pixel-level MSE (Mean Squared Error), which is expressed as:

$$L_{mse} = \frac{1}{N} \sum_{i=1}^{N} \big( \hat{J}(x_i) - J(x_i) \big)^2 \tag{12}$$

where $\hat{J}(x_i)$ is the original fog-free image, $J(x_i)$ is the obtained image, $i$ is the pixel index, and $N$ is the total number of pixels. Therefore, the loss function combines the loss of the detection module and the loss of the further-dehazing module. Besides, in the detection module, the $L_2$ regularisation is added into the loss function for the normalisation. The loss function is expressed as:

$$L = L_{coord} + L_{conf} + L_{cls} + L_{mse} + \lambda \lVert w \rVert_2^2 \tag{13}$$
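The PSNR/MSE relation used by the further-dehazing loss can be sketched as follows (flattened single-channel images in [0, 1]; names ours):

```python
import math

def mse(ref, img):
    """Pixel-level mean squared error between the fog-free reference
    and the dehazed output."""
    return sum((a - b) ** 2 for a, b in zip(ref, img)) / len(ref)

def psnr(ref, img, peak=1.0):
    """PSNR = 10 log10(peak^2 / MSE); larger is better, so minimising the
    MSE term of the loss is equivalent to maximising the PSNR."""
    e = mse(ref, img)
    return float('inf') if e == 0 else 10.0 * math.log10(peak ** 2 / e)
```

This makes explicit why the loss only needs the MSE term: PSNR is a monotone decreasing function of the MSE.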

Improved detection algorithm
In the JFRM detection module, the multi-scale features are introduced to generate bounding boxes with different scales. However, all the bounding boxes share a unified coordinate system, and thus an object may have many overlapping bounding boxes. To solve this problem, a non-maximum suppression (NMS) algorithm is implemented to eliminate the overlapping bounding boxes. Firstly, the NMS algorithm sorts the bounding boxes in a descending order according to their confidences. Then, within the same class, the IOU values between the high-confidence bounding boxes and the low-confidence bounding boxes are calculated. If an IOU value is larger than a preset threshold, the bounding box with the lower confidence is suppressed. Furthermore, a cross-class suppression method is given to suppress the low-confidence bounding boxes, no matter whether their classes are consistent with those of the high-confidence bounding boxes. The IOU threshold is set to 0.55. In the cross-class suppression algorithm, the dehazed images and the bounding boxes extracted from each image are traversed. For each image, the IOU values between any two of its bounding boxes are calculated, and their confidence scores are compared. The label flag is initialised to 0. Suppose b_i and b_j are two obtained bounding boxes; when IOU(b_i, b_j) ≥ thresh and Score(b_i) ≥ Score(b_j), b_j and its confidence score are deleted; otherwise, b_i and its confidence score are deleted, and the label flag is set to 1. Then, for the next pair of bounding boxes, the IOU value is calculated and the confidence scores are compared again. Finally, the optimal bounding boxes and their confidence scores are output.
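The cross-class suppression described above can be sketched as follows (boxes as (x1, y1, x2, y2) tuples; this greedy keep-list formulation is equivalent to the pairwise deletion loop, and the names are ours):

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def cross_class_nms(boxes, scores, thresh=0.55):
    """Cross-class suppression: for any pair of boxes with IOU >= thresh,
    keep only the higher-scoring one, regardless of class."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < thresh for j in keep):
            keep.append(i)
    return [boxes[i] for i in keep], [scores[i] for i in keep]
```

Two heavily overlapping detections collapse to the single highest-confidence box, while well-separated boxes survive untouched.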
For the detection module of the JFRM network, the following problems may occur during the training process: (i) network over-fitting; (ii) very low accuracy on the test set; (iii) very slow model training. To this end, the following improvements are made in this paper: 1) In order to avoid the over-fitting problem, the regularisation is added into the loss function for the normalisation; 2) The network configuration file of JFRM is modified, and the random parameter is set to 1, so that the image size can be modified randomly at each iteration, which can improve the accuracy of license plate detection; 3) In order to accelerate the training process, a momentum update mechanism is adopted.

Bounding box position correction
FIGURE 5 Position correction of a bounding box

With regard to the tilted license plates, the bounding boxes are inaccurate. This is because the JFRM network does not consider spatial transformations, and it only generates rectangular bounding boxes. Hence, WPOD-NET [28] is introduced to correct the positions of the bounding boxes, as shown in Figure 5. The characteristics of the Spatial Transformer Network (STN) are introduced into WPOD-NET, which can be used for detecting non-rectangular regions. For a tilted license plate, we first consider a rectangular box with a fixed size around the center of a pixel (m, n). If the object probability of this pixel is larger than a preset detection threshold, some of the regressed parameters are utilised to build an affine matrix that transforms the rectangular box into the tilted license plate region.
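The effect of the regressed affine matrix can be illustrated by warping the corners of a rectangular box (a toy sketch; WPOD-NET itself regresses the matrix entries, which we simply assume here):

```python
def warp_corners(corners, affine):
    """Apply a 2x3 affine matrix [[a, b, tx], [c, d, ty]] to the corner
    points of a rectangular box, yielding a (possibly tilted) quadrilateral."""
    (a, b, tx), (c, d, ty) = affine
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in corners]
```

An identity matrix leaves the rectangle unchanged, while a shear component tilts it, which is how a rectangular proposal is fitted onto a tilted plate.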

Super-resolution reconstruction
After the accurate license plate regions have been obtained, many license plate images may still suffer from blur, noise pollution and so on, due to the environmental interference, which leads to a poor performance of the character recognition. Hence, the convolution-enhanced SRCNN is proposed to obtain high-resolution images. With regard to a low-resolution image, the SRCNN first resizes the image to the target size by bicubic interpolation, and then the non-linear mapping is accomplished through the convolution layers. The structure of the convolution-enhanced SRCNN has six layers, in which the kernel size of the first convolution is increased. In the third to fifth layers, convolution kernels with different sizes are executed continuously to further extract features, and the sixth layer reconstructs the mapped features and generates the high-resolution images by 5×5×1 convolution kernels. The parameters of the convolution-enhanced SRCNN are shown in Table 1.

Character recognition
After the high-resolution license plate images have been obtained by the convolution-enhanced SRCNN, the characters of the license plate can be accurately recognised. In our work, the connected domain method and the template matching method are exploited to realise the character segmentation, and then an ANN (Artificial Neural Network) is adopted to achieve the character recognition. In the character recognition step, we focus on the region which contains the alphanumerics to be recognised and discard the other regions. Thus, we perform some post-processing. Firstly, the gray-processing and Otsu's binarisation are performed, in addition to a Gaussian blur, to remove possible noisy artifacts. Secondly, the connected domain method is utilised to obtain a list of possible characters and the connected domains of the alphanumerics. Then, these possible characters are matched with the license plate characters, and the overlapping characters inside the outline are removed. Finally, the characters are split to get a series of single character blocks, and an existing OCR network is relied on to identify the license plate characters.
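The connected domain step can be sketched as a standard 4-connected component labelling on the binarised image (a toy implementation; names ours):

```python
from collections import deque

def connected_components(bitmap):
    """4-connected component labelling on a binarised character image;
    each component is a candidate character block.
    bitmap: 2-D list of 0/1; returns a list of components, each a list of (row, col)."""
    h, w = len(bitmap), len(bitmap[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if bitmap[r][c] and not seen[r][c]:
                comp, q = [], deque([(r, c)])
                seen[r][c] = True
                while q:  # breadth-first flood fill from the seed pixel
                    y, x = q.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and bitmap[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                comps.append(comp)
    return comps
```

Each returned component corresponds to one candidate character block, which is then filtered and matched against the expected plate layout as described above.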

SIMULATIONS
In this section, the training datasets of the JFRM network and the convolution-enhanced SRCNN are first described, and then the simulation results are presented and analysed.

Dataset
Real vehicle images are taken as the datasets. These images include various natural scenes (such as daytime and nighttime), with a total of 3000 images. By adding noise, rotating and other image processing, the number of the images is increased to 12,000. Then the XML files of the license plate features are obtained by manual annotation. Besides, 300 images with motion blur are taken as the dataset of the convolution-enhanced SRCNN.

A fog concentration dehazing method
In this simulation, the fog concentration method is used to calculate the local atmospheric light value and obtain the transmittance, which can produce better restored images. In order to verify the performance of the fog concentration method, a comparison is made among the Soft-matting method [6], the Guided filtering method [29] and the fog concentration method, as shown in Figure 9. From Figure 9, the Soft-matting method exhibits a partial distortion phenomenon, and the colors of the restored images obtained by the Guided filtering method are darker than those of the fog concentration method. Besides, the visible gradient method [30] is also utilised to evaluate the image dehazing results. Table 2 provides the evaluation indexes of the restored images, where e is the ratio of the visible edges, r is the ratio of the visible edge gradients, and σ is the percentage of the saturated black or white pixels. Larger values of e and r, or a smaller value of σ, indicate that the restored images are clearer. As shown in Table 2, the quality of the restored images produced by the fog concentration method is higher than that of the others.

Joint optimisation of further-dehazing and detection
As aforementioned, we embed the further-dehazing into the detection network to form a joint optimisation. The JFRM network is trained end-to-end by minimising the loss function, which is expressed as the sum of (9), (10), (11) and (12). With regard to the further dehazing module of JFRM, the weights are initialised by Gaussian random variables, and the ReLU neuron is utilised instead of the BReLU neuron. To make the JFRM network detect license plates, the voc.data file, the number of the classes and the number of the filters are modified. The data of the 1000 ImageNet classes are used to train the first 20 layers of the detection module, which also provides the initial weights of YOLOv3. The JFRM network takes around 10 training epochs to converge, and usually performs sufficiently well after 10 epochs. The loss, IOU, Class and Recall curves reflect the preferable accuracy of the JFRM network: Figure 10 indicates that the curves converge to preferable values as the number of the batches increases.
In order to avoid the over-fitting problem during the training process, the regularisation parameters L_1 and L_2 are added into the loss function. Besides, Dropout layers are added after the fully connected layers. The L_1 regularisation can produce a sparse matrix, which implies that many optimal weights are set to 0 or close to 0. The L_2 regularisation can produce some small and scattered weights, which indicates that the network can take advantage of more features during the training process, and this mechanism can enhance the generalisation ability of the network.
It can be seen from Table 4 that the accuracy of the test set is 87.11% after adding L_1 into the loss function. In contrast, with L_2 and the Dropout layers, the accuracy of the test set can reach 91.27%, which indicates that the regularisation method of L_2 + Dropout layers is much better. In Figure 11, the accuracy of the object detection becomes higher after our joint further-dehazing optimisation, and this is because the quality of the images is enhanced after dehazing, which makes the objects easier to detect.

Super-resolution and character recognition
In order to improve the accuracy of the character recognition, the convolution-enhanced SRCNN is utilised to obtain the high-resolution images. Compared with the traditional SRCNN, which has three convolution layers, the number of the convolution layers is sequentially increased to four, five and six. The accuracies of the test set with different numbers of convolution layers are given in Table 5. Table 5 indicates that the network structures with more convolution layers can improve the character recognition accuracy. When the number of convolution layers is four or five, the accuracies of the character recognition are almost the same. The structures with six and seven convolutional layers achieve almost the same character recognition accuracy, but the complexity of seven convolutional layers is higher. Therefore, the convolution-enhanced SRCNN with six convolution layers is adopted in this paper. Table 6 indicates that the convolution-enhanced SRCNN produces the preferable results, and Figure 12 indicates that all license plates can be correctly recognised in the fog-haze environments. Besides, with regard to a tilted license plate, the non-rectangular bounding box can be output. Meanwhile, the accuracy of the license plate recognition can be guaranteed for license plates with stains as well.

CONCLUSIONS
This study explores the vehicle license plate recognition problem in fog-haze environments. To reduce the fog-haze interference, a fog concentration dehazing method is first investigated, and in the proposed JFRM network, the further-dehazing and detection are optimised jointly, which dramatically reduces the distortion of image reconstruction caused by the cumulative errors and accurately extracts the license plates from the preliminarily dehazed images. Finally, the image super-resolution is accomplished with a convolution-enhanced SRCNN, and hence the characters of license plates can be recognised successfully. Simulation results demonstrate that our method can improve the accuracy of the vehicle license plate recognition in fog-haze environments. Future research will focus on the optimisation of transmission maps in the dehazed images. Besides, we will also study an adaptive setting of the IOU threshold in the non-maximum suppression algorithm to obtain the optimal bounding boxes.

FIGURE 12 The output of LPRFH