Image defogging based on amended dark channel prior and 4-directional L1 regularisation

The dark channel prior (DCP) algorithm has been widely used in the field of image defogging because of its simple theory and clear restoration results. However, the DCP algorithm has significant limitations. This study clarifies the relationship between halo artifacts and the size of the dark channel patch of the DCP algorithm and analyses why the colour of close-range white objects appears distorted in the restored images. An amended DCP method is then proposed to solve these problems, utilising a locally variable weighted 4-directional L1 regularisation and a corresponding parallel algorithm to optimise the transmission. A deep neural network, 4DL1R-net, is then trained to further enhance the processing speed. Extensive experiments demonstrate that this method is effective: the proposed method obtains clear details, maintains the natural appearance of images, and achieves significant improvements over state-of-the-art methods.


INTRODUCTION
In recent years, the quality of outdoor digital images has been seriously affected because the occurrence of foggy or hazy weather has increased. In many fields, including highway traffic monitoring, intelligent driving, and surveillance systems of train stations, this problem is more obvious and affects the safety of human lives and property. Moreover, foggy images degrade the performance of many high-level computer vision tasks, such as object detection and recognition. Defogging is thus becoming an increasingly desirable technique for both computational photography and computer vision tasks, and it has been extensively studied. Many studies have made significant progress in the single-image defogging field. The early approaches focused on image enhancement [6-9], whereas the later approaches developed handcrafted features based on the statistics of clear images, such as the Colour-Lines [15], DCP [16], and boundary constraint [29] methods.
In the field of computer vision, the model most widely used to describe the formation of a hazy image is as follows:

I(x) = J(x)·t(x) + A·(1 − t(x)),  (1)

where I(x) is the foggy image and J(x) is the theoretically clear image. t(x) = e^(−βd(x)) represents the transmission under foggy conditions, which attenuates exponentially with the depth of field d(x) of the scene. A is the value of atmospheric light. Equation (1) indicates that image defogging is an ill-posed inverse problem without knowledge of J(x), A, and t(x). Thus, various image priors have been explored to estimate the transmission for single-image defogging tasks [15-32]. Each of these approaches has advantages and disadvantages. Among them, the dark channel prior (DCP) method is simple and effective and can obtain high-quality restored images; thus, it is widely used in practical applications. The theory is as follows. Dividing Equation (1) by A and taking the channel-wise and patch-wise minimum gives

min_{y∈Ω(x)} min_c (I^c(y)/A^c) = t(x) · min_{y∈Ω(x)} min_c (J^c(y)/A^c) + 1 − t(x),  (2)

where the two minima are, respectively, the minimum values of the dark channels of I(x) and J(x) in the local patch Ω(x). In most local image regions that do not cover the sky, some pixels very often have considerably low intensity in at least one colour (RGB) channel [16]. In other words, the dark channel of the clear image, min_{y∈Ω(x)} min_c J^c(y), tends to zero, and substituting this into Equation (2) yields the transmission estimate

t(x) = 1 − min_{y∈Ω(x)} min_c (I^c(y)/A^c).  (3)

Although the DCP algorithm is undoubtedly successful, we found through several experiments that it has several shortcomings. First, halo artifacts occur, and their severity depends on the size of the patch Ω(x); completely eliminating this phenomenon using soft matting, guided filters, or other transmission-optimising methods is difficult. Second, this method fails if there are close-range white objects in the foggy image. Third, the recovered images are usually much darker than the original images. We analyse the causes of these problems and the relevant improvement methods in Section 3.
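As a concrete illustration (not the authors' implementation), the dark channel, the transmission estimate of Equation (3), and the inversion of Equation (1) can be sketched in a few lines of Python. The patch size, the ω retention factor, and the lower bound t0 are conventional choices from He et al. [16]:

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img, patch=15):
    """Dark channel: min over RGB, then min over a patch x patch window (Eq. (2))."""
    min_rgb = img.min(axis=2)
    return minimum_filter(min_rgb, size=patch)

def estimate_transmission(img, A, patch=15, omega=0.95):
    """t(x) = 1 - omega * dark_channel(I / A), following Eq. (3)."""
    norm = img / A                     # normalise each colour channel by A
    return 1.0 - omega * dark_channel(norm, patch)

def recover(img, t, A, t0=0.1):
    """Invert the haze model of Eq. (1): J = (I - A) / max(t, t0) + A."""
    t = np.clip(t, t0, 1.0)[..., None]  # broadcast over the 3 channels
    return (img - A) / t + A
```

The ω factor deliberately leaves a small amount of haze so distant scenes look natural; setting ω = 1 removes fog completely but can look artificial.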
Recently, some scholars proposed the use of convolutional neural networks (CNNs) [31-37] to automatically learn feature representations for foggy images. These algorithms [31-37] have shown exciting results because of CNNs' strong learning ability. However, these scholars focused on improving the structure of CNNs and neglected the fact that a significant number of natural foggy images with corresponding clear samples are almost impossible to obtain. During training, a large number of artificial foggy images were used. These images are synthesised from clear images taken in sunny weather; their atmospheric light value is far from the actual atmospheric light found in foggy weather. Some scholars used indoor datasets [31, 32]; however, the illumination of indoor images is uneven because of local light sources and is completely different from the atmospheric light in outdoor foggy weather. Moreover, owing to the limitations of current sensor technology, there is no reasonable way to simulate fog concentration distribution through the image depth of field in any outdoor dataset. These problems lead to the unsatisfactory generalisation performance of current CNNs on natural foggy images, especially dense foggy images.
Based on this analysis, we integrate the advantages of conventional methods and of CNNs to solve these problems. We propose an amended dark channel prior (ADCP) to improve the DCP algorithm, a locally variable weighted 4-directional L1 regularisation to optimise the transmission, and a lightweight CNN, named 4DL1R-net (4-directional L1 regularisation network), to further enhance the processing speed. Extensive experiments show that our method is effective and achieves significant improvements over state-of-the-art methods. Our contributions are summarised as follows:
1. We propose an ADCP algorithm that amends the dark channel value pixel-wise. This method is efficient and novel.
2. We utilise 4-directional L1 regularisation to optimise the transmission map and propose an approximation algorithm and a parallel method to solve this complex variational problem. To our knowledge, this is the first time that 4-directional L1 regularisation has been applied to edge-preserving image filtering, and the first study in which a parallel method is effectively employed to solve the 4-directional variational problem.
3. We train a deep learning network (4DL1R-net) to further enhance the computation speed; as a result, our processing speed is the highest among current defogging algorithms (without using GPUs).

RELATED WORK
In this section, we briefly review some important defogging algorithms. At present, the mainstream image defogging algorithms can be roughly divided into three categories. The first type of method is based on image enhancement [6-9, 18, 38-41] and can be represented by the Retinex method [6-8, 38-41]. Retinex has a fast processing speed and a good overall enhancement effect on dark original images, but its restored images suffer from colour distortion and desaturation. Tan et al. [9] proposed a local contrast-maximising method based on a Markov random field, under the prior condition that the local contrast of a clear image is much higher than that of a hazy image. This method is essentially a contrast-stretching enhancement algorithm, and its results are prone to over-saturation.
The second type of method is based on the Atmospheric Scattering Model [4, 5, 13-16], which has been adopted by many scholars. The ideas of Narasimhan and Nayar [10-12] are ground-breaking here and primarily based on this model. Their restored images are bright and have higher contrast, but the method requires images of the same scene under different weather conditions for comparison, and such images are very difficult to acquire. Tarel et al. [13] used a median filter to estimate the dissipation function; however, the median filter cannot preserve object edges well, so a small amount of fog remains at depth discontinuities in the restored image. Fattal [14, 15] proposed the theory of defogging by Colour-Lines, which requires images with high colour contrast; hence, it is only suitable for mist images. He et al. [16] derived the dark channel rule from the statistics of a large number of outdoor images and proposed the dark channel prior (DCP) method. Meng et al. [29] proposed an effective regularisation defogging method that restores defogged images by exploiting an inherent boundary constraint, but this approach is time-consuming because it requires multiple iterations.
The third type of method is based on data-driven approaches [31-43]. Some scholars have proposed algorithms based on machine learning [42, 43] or deep learning (CNNs) [31-37], and these algorithms have been highly successful. In [42], Zhu et al. adopted the colour attenuation prior and used a supervised learning method to learn depth information. They then estimated the transmission map using this depth information and finally obtained the restored image. In [43], Tang et al. investigated a variety of haze-relevant features in a regression framework based on the random forest algorithm and used a large number of synthetic hazy image patches to train this framework. Recently, deep learning methods [31-37] have been adopted in image defogging. Inspired by haze-relevant features, Cai et al. [31] proposed a trainable end-to-end CNN named DehazeNet for estimating the transmission map, with specially designed feature extraction layers. Ren et al. [32] proposed a multi-scale convolutional neural network (MSCNN) for the transmission estimation of foggy images. This work includes two parts: a coarse-scale network to learn a preliminary transmission map and a fine-scale network to refine this map. Li et al. [35] proposed AOD-Net to avoid separately estimating the transmission and atmospheric light values; AOD-Net directly generates a restored image through a lightweight CNN. More recently, Yang et al. [33] proposed a deep learning network based on a dark channel prior and a multi-iteration algorithm. It can be applied to many situations, such as underwater image enhancement and night-time image defogging. However, a guided filter is still used in post-processing, which leads to low processing efficiency. Ren et al.
[36] proposed a gated fusion network, which derives three inputs from an original hazy image by applying white balance (WB), contrast enhancement (CE), and gamma correction (GC). They adopted a multi-scale approach to avoid halo artifacts, but the training process is time-consuming and prone to overfitting. Liu et al. [37] proposed GridDehazeNet, a mesh-like structure consisting of pre-processing, backbone, and post-processing modules. This structure has an excessive number of layers and blocks and has difficulty achieving real-time processing. These methods [32, 34-37] use CNNs to learn a mapping from an input hazy image to a transmission map or haze-free image, but, unlike traditional methods, they do not use haze-related priors to constrain the mapping space.

ADCP AND 4-DIRECTIONAL L1 REGULARISATION METHOD
In this section, we analyse the problems existing in DCP and propose the amended dark channel prior. We propose a locally variable weighted 4-directional L1 regularisation to optimise the transmission and utilise a parallel method to solve this complex variational problem.

Halo artifacts
The causes of halo artifacts differ among defogging methods, and thus there are different ways to eliminate them. Regarding the DCP algorithm, we know that the halo artifacts are related to the size of the patch in the dark channel map. After many experiments, we found that a guided filter cannot eliminate these halo artifacts in many foggy scenes, especially where the colour difference between close-range objects and the background is obvious and the size of the patch Ω(x) is larger than 5 × 5. Samples are shown below in Figures 1 and 2.
The halo artifacts of the DCP algorithm can be divided into two types. (1) Images dominated by close-range objects. If the patch Ω(x) in the dark channel map (the Ω(x) region in Equation (2)) is large, the background portion, which should have a high brightness value, takes the minimum brightness value of the Ω(x) region near the edges of the objects. The background is therefore treated as part of the object, and the result is an expansion phenomenon around the close-range objects in the dark channel map; the degree of expansion depends on the size of the patch Ω(x). Because the background or sky portions are regarded as objects, the high values of these portions are replaced by the low values of the objects in the dark channel map. Consequently, the pixel values of these portions exceed the normal range for standard (RGB) images when the image restoration calculation is executed. The colour around the objects is thus destroyed and white (no colour) appears, namely halo artifacts. Specific examples are shown in Figure 1.
(2) Many images contain both close-range and distant views. When the colour difference between the close-range objects and the distant view is obvious and the patch Ω(x) is large, the recovered images can be expected to produce halo artifacts; to be exact, the image regions around close-range objects that should show residual fog appear as halo artifacts. If the patch is larger than 1 × 1 when the dark channel map is calculated, the patches near the edges of an object will contain an object boundary, that is, two different objects in the same patch. As shown in Figure 2, if the close-range view is a dark object, its grey value will be very low in the dark channel map, and the edge of the distant object in the same patch Ω(x) will also take this grey value (far lower than its own dark channel value). The distant object is therefore mistaken for a close-range object in the dark channel map: as shown in Figure 2, the d(x) value of the distant object or background near the boundary equals the d(x) value of the close-range darker object in the same patch Ω(x). This d(x) value of the distant object is far less than the real value, or even close to 0, so the transmission t(x) of the distant object or background near the boundary is close to 1. In this case, according to Equation (1), J(x) ≈ I(x), which means that the portion near the edges is not processed, or is processed insufficiently. Thus, residual fog remains around the boundary of the close-range objects, appearing like halo artifacts. In Figure 2, a fringe of halo artifacts can easily be seen around the edges of the close-range leaves in columns (a) and (b); it is actually unprocessed fog. This phenomenon disappears completely as the size of Ω(x) decreases to 1 × 1. Figure 3 compares local magnifications of the two different halo artifacts; both patch sizes are 20 × 20.
We can observe that the outer edge of Tiananmen has no colour in the first picture: the values of this part exceed the normal RGB range and are ultimately shown as white. The second picture shows that the edge of the leaves has a pale halo of barely processed fog.

Invalidation of white close-range objects
When the scene objects are inherently similar to the atmospheric light and no shadow is cast on them, the dark channel prior is invalid [16]. When a white object appears at long range, such as the sky region in a foggy image, the dark channel theory is invalid. However, the transmission of the sky, t(x) = e^(−βd(x)), still tends to zero because of the long distance (d(x) → ∞), so the first term on the right of Equation (2) is still close to zero and the approximate value of t(x) can be obtained from Equation (3). This is why the DCP method can still be used in the sky area even though the prior itself fails there. In Figure 4(a), the sky region of the restored image is very clear even though the dark channel value does not tend to zero in the sky region. If a white object appears at close range in the foggy image, however, the DCP fails seriously and the colour of the restored image is greatly distorted, because the first term on the right of Equation (2) does not tend to zero and Equation (3) no longer holds. In Figure 4(b), we can see that the colour of the marble is seriously distorted by this effect. This is also the reason why images restored by DCP are darker than those processed by other methods.

ADCP algorithm
Based on the above analysis, we know that the dark channel values of close-range white objects are not close to zero but take relatively large values, and that the halo artifacts are related to the patch size of the dark channel map. We therefore propose the amended dark channel prior to solve these problems in a pixel-wise fashion. For convenience, we denote the dark channel of the clear image in Equation (2) by J∇ and that of the foggy image by I∇, and we set the size of the patch Ω(x) to 1 × 1 in Equation (4); this alone eliminates the halo artifacts in our restored images. The specific steps are as follows.

1. We establish an energy functional to amend the values of J∇ (the dark channel), as given in Equation (5). In Equation (5), when λ(I∇) → ∞, J∇ tends to the theoretical dark channel; when λ(I∇) → 0, Equation (5) reduces to Equation (4). The weight λ(I∇) can be adaptively adjusted to align the value of J∇ with the actual situation. If the value of I∇ is very small, the pixel accords with the dark channel prior; λ(I∇) is then very large, and the dark channel prior is valid and can be expected to be effective. If I∇ is large, the pixel may belong to a close-range white object or another high-brightness object, because I∇ is the dark channel value of the original foggy image and all three RGB channel values of white objects are high, so the dark channel J∇ will not equal zero. In other words, the dark channel prior is invalid for these pixels. The value of λ(I∇) is then automatically adjusted to a small value, and J∇ is amended greatly. Therefore, the dark channel value J∇ can be amended pixel-wise through Equation (5), and the final result is more in line with the actual situation. In our experiments, we designed several functions λ(I∇) to adjust the weights in Equation (5); the form based on −0.7 − 0.1·I∇, with the associated parameter set to 5, proved easy and suitable.

2. Take the derivative of Equation (5) with respect to J∇ and set it to zero; Equation (6) follows.

3. The transmission obtained by the DCP can be used as the initial transmission t(x). To decrease the computational complexity, we can instead select a random matrix of size h × w (the size of the foggy image) as the initial transmission. Substituting the initial transmission t(x) into Equation (6) yields the new J∇; putting this J∇ into Equation (4) then updates the amended t(x).
This method is efficient and does not require iteration. Figure 5(d) shows noise because the initial value of t(x) is a random matrix; after the subsequent optimisation of the transmission map, this noise does not affect the final restored images. In our experiments, we used many different initial t(x) maps, but there was almost no difference among the final restored images. This phenomenon demonstrates that our method is robust to the initialisation.
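The pixel-wise amendment of step 1 can be illustrated with a small sketch. We assume here, for illustration only, a quadratic penalty λ(I∇)·J∇² added to a fidelity term (J∇ − J_raw)², whose minimiser is the closed form J∇ = J_raw / (1 + λ); the weight function below is hypothetical, not the paper's exact choice:

```python
import numpy as np

def amend_dark_channel(j_raw, i_dark, weight_fn):
    """Pixel-wise amendment: minimiser of (J - j_raw)^2 + w(I_dark) * J^2,
    i.e. J = j_raw / (1 + w).  A large weight pulls J toward the theoretical
    dark channel value 0 (prior valid); a small weight leaves the raw 1x1
    dark channel almost untouched (prior invalid, e.g. white objects)."""
    w = weight_fn(i_dark)
    return j_raw / (1.0 + w)

# hypothetical weight: large where the input dark channel I_dark is small
weight = lambda d: 50.0 * np.exp(-8.0 * d)
```

Note how the behaviour matches the text: pixels with small I∇ (prior valid) receive a large weight and are pushed toward zero, while bright pixels keep a large amended dark channel.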

Optimisation of transmission
For image defogging, the edge information is critical because it distinguishes objects with different depths of field. We argue that the various parts of the same object have the same depth of field in an image. Therefore, the relatively weak internal texture and other details of objects are noise and interference for the purpose of calculating the transmission map. In the transmission map, we want the main edges of objects to be clear, the useless details to be suppressed, and the transmission values to be kept as consistent as possible within each object. In [16, 17], soft matting and guided filtering are used mainly to smooth away weaker details while preserving the main edges of an image, but their ability to remove residual fog is insufficient, and it is difficult for guided filtering or soft matting to balance texture smoothing against preserving the main edges. At present, many algorithms have similar functions, such as Tikhonov regularisation, TV regularisation [44], gradient L0 optimisation [45], WLS filtering [46], RTV filtering [47], and bilateral filtering [48]. Most of these algorithms are used in image denoising or other fields. Our experiments showed that a strong texture-removal capability is necessary for transmission optimisation: the filter or optimisation algorithm requires a stronger smoothing ability while preserving the main edges. The algorithms above do not satisfy these requirements, and they have high computational complexity.
This paper proposes a locally variable weighted 4-directional L1 regularisation with wavelet fusion to optimise the transmission, and adopts an approximate solution for the 4-directional variational problem. Figure 6(b) shows the four gradient directions x, y, x45, and y45. Our energy equation is regularised along the four directions simultaneously. Compared with previous methods [16, 17, 44-48], both our smoothing ability and our edge-preserving ability are doubled within the same patch: our method smooths and preserves edges along the x, y, x45, and y45 axes at the same time, which is conducive to processing the irregular natural objects in foggy images. Our computational complexity is significantly lower than that of ordinary variational problems [44, 47] because an approximate algorithm is used. Finally, more clearly restored images can be obtained. To our knowledge, this is the first time that 4-directional L1 regularisation has been applied to edge-preserving image filtering and the image defogging field, and the first time that an approximate method has been effectively used to solve the 4-directional variational problem.
Sakurai et al. [49] first proposed the 4-directional total variation for the anisotropic case, but a complete mathematical proof was not provided (as admitted in [49]). Liao et al. [50] provided the complete mathematical proof for both the anisotropic and isotropic cases of the 4-TV model and adopted fast gradient projection (FGP) methods [51] to solve it. This method is still time-consuming because it requires many iterations, and its convergence slows as the iterations increase, a common phenomenon in gradient projection methods.
Unlike TV [44, 49-51] or context regularisation [29], our method does not need multiple iterations and adopts parallel computation, so it has a fast computational speed.

4-directional L1 regularisation
First, we build the energy functional for the transmission map as follows:

t̂ = arg min_t̂ Σ_P { (t̂_P − t_P)² + λ [ W_P (|∂_x t̂_P| + |∂_y t̂_P|) + W_45P (|∂_x45 t̂_P| + |∂_y45 t̂_P|) ] },  (7)

where the subscript P denotes the location of one pixel in the transmission map, t̂_P is the final optimised transmission value, and t_P is the input transmission. The data term, (t̂_P − t_P)², is a fidelity term preventing the input and output from deviating wildly. The second term is the regularisation term, which smooths the result by minimising the partial derivatives of t̂_P along the x, y, x45, and y45 directions. The parameter λ controls the relative weight between the fidelity and regularisation terms. To optimise the transmission, we seek the minimum of Equation (7). In Equation (7), we use wavelet decomposition and fusion for the input transmission.
The input t_P is then expressed as

t_P = IWT(l_P, β·h_P),  (8)

where IWT is the inverse wavelet transform, and l_P and h_P are, respectively, the low-frequency and high-frequency coefficients obtained by applying the wavelet transform to the transmission map. Here, β is the wavelet fusion coefficient, which we refer to as the residual fog factor and introduce later. This wavelet fusion clarifies the whole restored image and makes the colours appear more natural. W_P and W_45P are locally varying weight functions:

W_P = 1 / ( Σ_{r∈R(P)} g_{P,r} · ‖∇t(r)‖₂^α + ε ),  (9)
W_45P = 1 / ( Σ_{r∈R(P)} g_{P,r} · ‖∇t_45(r)‖₂^α + ε ),  (10)

where R(P) is the region centred at pixel P; r is any pixel other than the centre P in R(P); ‖∇t(r)‖₂ and ‖∇t_45(r)‖₂ are, respectively, the L2 norms of the gradients along the x, y directions and the x45, y45 directions at point r; α controls the smoothing level of the processed transmission map and is explained in the next part; g_{P,r} is a Gaussian filter with standard deviation σ = 1; and ε is a small constant that prevents division by zero.
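The behaviour of the locally varying weights can be sketched numerically. The exact functional form of Equations (9) and (10) is not reproduced in this text, so the sketch below assumes a reciprocal of the Gaussian-averaged gradient magnitude, which matches the described behaviour (weights much larger than 1 in flat or textured regions, small on main edges):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_weight(t, alpha=2.0, eps=1e-4, sigma=1.0):
    """Locally varying weight for the x/y directions: large where the
    Gaussian-averaged gradient magnitude is small (texture is smoothed
    harder), small on main edges (which are preserved)."""
    gx, gy = np.gradient(t)
    grad = np.sqrt(gx ** 2 + gy ** 2)
    # gaussian_filter plays the role of the Gaussian kernel g_{P,r}
    return 1.0 / (gaussian_filter(grad ** alpha, sigma) + eps)
```

On a step edge, the weight collapses near the discontinuity and stays very large in the flat regions on either side, which is exactly the edge-preserving/texture-smoothing trade-off the regulariser needs.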
In Equation (7), our regularisation term has three characteristics.
1. It smooths images heavily. Because the transmission map is normalised so that pixel values lie in [0, 1], the gradient values are much less than 1, and W_P or W_45P is much larger than 1 in most cases by Equations (9) and (10). That is, the regularisation term of Equation (7) dominates in textured regions.

2. It preserves the main edges. When the partial derivatives (|∂t|_P) are small, there are no main edges in the local region R(P); the weight W_P or W_45P is then large, and the regularisation strengthens the smoothing to remove textures in this region. When the partial derivatives are large, the local region is considered to lie on a main edge; the weights W_P and W_45P become small, and the regularisation tends to preserve edges.

3. Smoothing and edge preservation are performed simultaneously in the x, y directions and the x45, y45 directions, so the smoothing and edge-preserving ability of our approach is greater than those of other filters [17, 44-48]. Our method is especially suitable for natural scene defogging.

Equation (7) is a typical anisotropic TV problem. This paper takes a new, fast approach to solve it. The regularisation term in Equation (7) can be expressed as

W_P (|t̂_x|_P + |t̂_y|_P) + W_45P (|t̂_x45|_P + |t̂_y45|_P).  (12)

The first term in Equation (12) can be approximated by a Gaussian-weighted quadratic,

|t̂_x|_P ≈ G_{P,r} * ( w_x (∂_x t̂_P)² ),  with  w_x = (|∂_x t_P| + ε)^(−1),  (13)

where G_{P,r} is the Gaussian filter and * is the convolution operator. In the same way, the other terms in Equation (12) can be approximated (Equations (14)-(16)). Thus, Equation (7) can be written as

t̂ = arg min_t̂ Σ_P { (t̂_P − t_P)² + λ [ w_p1 ( w_x (∂_x t̂_P)² + w_y (∂_y t̂_P)² ) + w_p2 ( w_x45 (∂_x45 t̂_P)² + w_y45 (∂_y45 t̂_P)² ) ] },  (17)

while Equation (17) can be written in matrix form as

t̂ = arg min_t̂ (t̂ − t)ᵀ(t̂ − t) + λ t̂ᵀ ( D_xᵀ W_p1 W_x D_x + D_yᵀ W_p1 W_y D_y + D_x45ᵀ W_p2 W_x45 D_x45 + D_y45ᵀ W_p2 W_y45 D_y45 ) t̂,  (18)

where t̂ and t are the vector forms of t̂_P and t_P, respectively; D_x, D_y, D_x45, and D_y45 are the Toeplitz matrices of the discrete gradients; and W_p1, W_p2, W_x, W_y, W_x45, and W_y45 are diagonal matrices whose diagonal elements are the weights w_p1, w_p2, w_x, w_y, w_x45, and w_y45. The minimum of the discrete Equation (18) is found by setting its derivative to zero, which yields the linear system

( I + λ L_t ) t̂ = t,  L_t = D_xᵀ W_p1 W_x D_x + D_yᵀ W_p1 W_y D_y + D_x45ᵀ W_p2 W_x45 D_x45 + D_y45ᵀ W_p2 W_y45 D_y45.  (19)
The coefficient matrix in Equation (19) is a 9-point spatially inhomogeneous Laplacian, a large sparse matrix of size N × N (N = h × w, where h and w are the height and width of the input image). Several methods [46, 52, 53] are available to solve this linear system, but they are all complicated and time-consuming. Therefore, we replace the 9-point Laplacian matrix with four 3-point Laplacian matrices. Because most current video devices are equipped with multiple CPUs or even multiple GPUs, we utilise parallel algorithms to further enhance the processing speed considerably.
We decompose Equation (17) into four equations along the x, y, x45, and y45 directions.
t̂_x = arg min Σ_P (t̂_Px − t_P)² + λ w_p1 w_x (∂_x t̂_Px)²,  (20)
t̂_y = arg min Σ_P (t̂_Py − t_P)² + λ w_p1 w_y (∂_y t̂_Py)²,  (21)
t̂_x45 = arg min Σ_P (t̂_Px45 − t_P)² + λ w_p2 w_x45 (∂_x45 t̂_Px45)²,  (22)
t̂_y45 = arg min Σ_P (t̂_Py45 − t_P)² + λ w_p2 w_y45 (∂_y45 t̂_Py45)².  (23)

Minimising Equations (20)-(23) amounts to solving the following matrix equations:

( I + λ L_tx ) t̂_x = t,  (24)
( I + λ L_ty ) t̂_y = t,  (25)
( I + λ L_tx45 ) t̂_x45 = t,  (26)
( I + λ L_ty45 ) t̂_y45 = t,  (27)

where L_tx = D_xᵀ W_p1 W_x D_x, L_ty = D_yᵀ W_p1 W_y D_y, L_tx45 = D_x45ᵀ W_p2 W_x45 D_x45, L_ty45 = D_y45ᵀ W_p2 W_y45 D_y45, and t̂_x, t̂_y, t̂_x45, t̂_y45, and t are the vector forms of t̂_Px, t̂_Py, t̂_Px45, t̂_Py45, and t_P, respectively. The matrices L_tx, L_ty, L_tx45, and L_ty45 are 3-point Laplacians: tridiagonal matrices whose non-zero elements lie only on the main diagonal and the two adjacent diagonals. For example, Equation (24) can be written row by row as

a_n t̂_{n−1} + b_n t̂_n + c_n t̂_{n+1} = t_n,  (28)

where t̂_n and t_n are, respectively, the nth elements of t̂_x and of t in Equation (24), and a_n = λ L_tx(n, n−1), b_n = 1 + λ L_tx(n, n), and c_n = λ L_tx(n, n+1). Such a system can be solved easily by Gaussian elimination with O(N) complexity, which is much less than that of solving the large sparse 9-point Laplacian system; even the total time to solve the four tridiagonal systems is much less than that for the 9-point Laplacian matrix.
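The O(N) Gaussian elimination for tridiagonal systems of the form of Equation (28) is the classical Thomas algorithm. A generic sketch (variable names are ours, not the paper's):

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve a tridiagonal system in O(N).
    a: sub-diagonal (a[0] unused), b: main diagonal, c: super-diagonal
    (c[-1] unused), d: right-hand side."""
    n = len(d)
    cp = np.zeros(n)
    dp = np.zeros(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

The systems in Equations (24)-(27) are diagonally dominant (identity plus a weighted Laplacian), so this sweep is numerically stable without pivoting.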
To further enhance the computational speed and make use of the redundant CPUs or GPUs on current imaging devices, we adopt a parallel algorithm on this basis. In other words, Equations (24)-(27) are computed on different processors at the same time, and t̂_x, t̂_y, t̂_x45, t̂_y45 are obtained simultaneously. These t̂_x, t̂_y, t̂_x45, t̂_y45 are then used to establish a new energy equation.
t* = arg min_{t*} Σ_P { (t*_P − t_P)² + λ [ (∂_x t*_P − ∂_x t̂_Px)² + (∂_y t*_P − ∂_y t̂_Py)² + (∂_x45 t*_P − ∂_x45 t̂_Px45)² + (∂_y45 t*_P − ∂_y45 t̂_Py45)² ] }.  (29)

In Equation (29), t*_P is the final result and t_P is the same as in Equation (7). The minimum of Equation (29) satisfies the Euler-Lagrange equation

( 1 + λ Σ_d ∂_dᵀ ∂_d ) t* = t + λ Σ_d ∂_dᵀ (∂_d t̂_d),  d ∈ {x, y, x45, y45},  (30)

and Equation (30) can then be rewritten in a discrete form as Equation (31).
In Equation (31), d_x, d_y, d_x45, and d_y45 are the first-order differential operators in the x, y, x45, and y45 directions, respectively. We take the fast Fourier transform (FFT) of Equation (31) and obtain the result T*_P in the frequency domain. Finally, we perform the inverse FFT to obtain the final result t*_P. Figure 7 compares the transmissions produced by different optimisation methods. It can be seen that the edge-preserving ability of guided filtering [17] is limited and the edges of different objects are fuzzy; when the depths of field of two objects differ greatly, halo artifacts (residual fog) easily appear along the edges in the restored images. Meng's context regularisation [29] has a certain edge-preserving ability, but it is not strong enough for our purposes here; residual fog remains where the depths of field of the two objects are far apart. Our method has a strong edge-preserving ability, and the transmission within each object is evenly distributed. The intensity of the optimised transmission is close to the original transmission and does not cause colour deviation.
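Under our reading of Equations (29)-(31) as a screened least-squares system with gradient targets (and assuming periodic boundaries, which the paper's exact discretisation may not use), the frequency-domain solve can be sketched as follows; `fft_fuse` and `fwd_diff` are our illustrative names:

```python
import numpy as np

def fft_fuse(t, grad_targets, lam=0.05):
    """Solve (1 + lam * sum_d |D_d|^2) T* = T + lam * sum_d conj(D_d) G_d
    in the Fourier domain, where D_d is the frequency response of the
    circular forward difference along direction d = (dy, dx)."""
    h, w = t.shape
    ky = np.fft.fftfreq(h)[:, None]
    kx = np.fft.fftfreq(w)[None, :]
    num = np.fft.fft2(t).astype(complex)
    den = np.ones((h, w), dtype=complex)
    for (dy, dx), g in grad_targets:
        D = np.exp(2j * np.pi * (ky * dy + kx * dx)) - 1.0
        num += lam * np.conj(D) * np.fft.fft2(g)
        den += lam * np.abs(D) ** 2
    return np.real(np.fft.ifft2(num / den))

def fwd_diff(t, dy, dx):
    """Circular forward difference along (dy, dx)."""
    return np.roll(t, (-dy, -dx), axis=(0, 1)) - t
```

A sanity check of the formulation: when the four gradient targets are computed from the input itself, the fused output reproduces the input exactly, because the data and gradient terms agree.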

Effects of parameters
Our method can be expressed as a complete equation. The other important parameter is the smoothing factor in Equations (9) and (10): the degree of smoothing of the transmission map depends on its value. Some examples are shown in Figure 9, where it can be seen that the defogging result is affected by this factor. These examples demonstrate that dense textures in the transmission map interfere with the final restored images, especially for foggy images with rich textures. This also confirms that our idea of optimising the transmission is correct.
In Equations (24)-(27), the regularisation parameter is set to 0.05, and the restored images are clear.

Calculation of atmospheric light value
For the atmospheric light value, we adopt the method in [16]. We pick the top 0.1% brightest pixels in the dark channel map; these pixels are usually the most haze-opaque. Among them, the pixel with the highest intensity in the input image is selected as the atmospheric light value.
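This selection rule translates directly into code (a minimal sketch; the 0.1% fraction follows [16]):

```python
import numpy as np

def atmospheric_light(img, dark, top=0.001):
    """Pick the brightest input pixel among the top 0.1% of dark-channel pixels."""
    h, w = dark.shape
    n = max(1, int(h * w * top))
    idx = np.argsort(dark.reshape(-1))[-n:]   # most haze-opaque candidates
    candidates = img.reshape(-1, 3)[idx]
    # among the candidates, choose the one with the highest total intensity
    return candidates[candidates.sum(axis=1).argmax()]
```

Restricting the search to the dark-channel candidates avoids the classic failure mode of simply taking the brightest image pixel, which may belong to a white object or a light source rather than the haze.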

4DL1R-NET
To further enhance the computational speed, we propose a deep learning network (4DL1R-net) to replace the 4-directional L1 regularisation. The 4DL1R-net is a learning-based neural network architecture, derived from the 4-directional L1 regularisation algorithm, that smooths the transmission map while preserving the main edges.

Architecture of 4DL1R-net
The function of the 4DL1R-net is the same as that of the 4-directional L1 regularisation. The architecture of our proposed network is shown in Figure 10. The framework is roughly similar to U-net [54], with improvements in detail that suit transmission optimisation for image defogging. It is an encoder-decoder architecture with skip connections. The encoder consists of five blocks (yellow blocks from left to right in Figure 10). Each block contains two or three convolution layers; the red layer is a convolution layer whose output size is half of its input size. After the bottleneck convolution layer (block 5), the output size is reduced to 1/16 of the input size. The latter part is the decoder: the image size is gradually increased by deconvolution until it equals the original size. The blue arrows and lines represent the skip structure, which concatenates the deconvolution layers of the decoder with the corresponding convolution layers of the encoder. The 4DL1R-net has no pooling layers, so it performs pixel-wise regression well. During training, we found that eliminating batch normalisation greatly improved the results.

Loss function
The loss function for this problem considers the difference between the ground truth t and the prediction t̂. The loss L is the pixel-wise L1 loss, defined on the intensity difference between t and t̂.
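Assuming the loss is averaged over the N pixels of the transmission map (a sum-form variant differs only by a constant factor), the pixel-wise L1 loss can be written as

```latex
L\left(t, \hat{t}\right) = \frac{1}{N} \sum_{i=1}^{N} \left| t_i - \hat{t}_i \right|
```

where $t_i$ and $\hat{t}_i$ are the ground-truth and predicted transmission values at pixel $i$.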

Dataset
First, we randomly collected 398 pictures from other datasets (not the RESIDE dataset [34] or D-HAZY dataset [55]) and the Internet, the subjects of which were natural scenery such as streets, persons, cats, dogs, and other images with rich textures. Our dataset did not contain any of the defogging examples in this study. We used 79 of these images as a validation set. Then,

Implementation details
We implemented the 4DL1R-net using TensorFlow 2.0 and trained it on an NVIDIA GeForce GTX 1080 Ti GPU with 11 GB of memory. In all experiments, we used the ADAM [56] optimiser with a learning rate of 0.001. The size of the training samples was 256 × 256, and the batch size was set to 8. The total number of trainable parameters in the network was approximately 329 M. No data augmentation was required during training. As shown in Figure 11, the loss function reaches a stable value on the validation set after approximately 130 epochs, which takes only about 30 min. The mean square error (MSE) values and the loss function show similar trends on the training and validation sets. The mean absolute percentage errors (MAPEs) on the training and validation datasets were approximately 30% and 85%, respectively, after 300 epochs.
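The ADAM update used in training can be sketched for a single scalar parameter. This is the standard Adam rule from [56] with the paper's learning rate of 0.001, written out for illustration; it is not the TensorFlow optimiser's internal code.

```python
def adam_step(param, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update step [56] for a scalar parameter; m and v are the
    running first- and second-moment estimates, t is the 1-based step count."""
    m = b1 * m + (1 - b1) * grad            # biased first-moment estimate
    v = b2 * v + (1 - b2) * grad * grad     # biased second-moment estimate
    m_hat = m / (1 - b1 ** t)               # bias-corrected moments
    v_hat = v / (1 - b2 ** t)
    param -= lr * m_hat / (v_hat ** 0.5 + eps)
    return param, m, v
```

On the first step with a unit gradient, the bias corrections cancel the (1 - b) factors, so the parameter moves by almost exactly the learning rate.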

Processing result
For any image, the processing result of the 4DL1R-net is as effective as that of the 4-directional L1 regularisation algorithm, but its processing speed is further improved; the specific experimental data are shown below. Comparisons between the 4-directional L1 regularisation algorithm and the 4DL1R-net are shown in Figure 12.

ANALYSIS OF DEFOGGING EFFECT
In this study, the experimental environment is a PC running Windows 7 (flagship edition) with an Intel i7-9700 eight-core CPU and 16.0 GB of memory. The languages used are MATLAB R2014a and Python 3.7.

Subjective comparative analysis of defogging effects
Our method is compared with several classic algorithms. In Figure 13, we show the restored images of these algorithms on typical outdoor natural foggy images.
Rows (b), (f), and (g) are DehazeNet [31], MSCNN [32], and AOD-Net [35], respectively, all based on deep learning. As we can see, none of these methods works very well on natural foggy images. These algorithms share a common deficiency: defogging is not thorough, and for many dense foggy images there is no obvious difference before and after defogging. This phenomenon should be related to the training dataset; as mentioned in the first section, it is almost impossible to build a dataset with a large number of natural foggy images and corresponding clear ones. Therefore, we believe that the current defogging methods based on deep learning cannot be applied in practical engineering, and a method combining traditional feature extraction and deep learning is a good choice. In Figure 13(c), the restored images of Meng's algorithm [29] show that this algorithm cannot handle areas of high brightness well: the train lights in the second column, the geese in the third column, and the sky in the fourth column are too bright, and there are halo artifacts around close-range regions with abrupt depth changes (note the close-up leaves in the first column, row (c)). In row (d), the images restored by the Retinex algorithm [7] have higher brightness. This algorithm works well on late-afternoon and dim images; however, it cannot defog distant views thoroughly, residual fog remains, and the colour is seriously distorted. In row (e), the restored images of the DCP algorithm [17] generally look dark, although their details are clear. It is obvious that our restored images (row (h), ADCP + 4-directional L1 regularisation) are more natural than those of the other algorithms: the overall brightness is moderate, and the important details are very clear.
We also compared the various algorithms on an artificial dataset (the RESIDE dataset [34]), where our algorithm likewise produces clear restored images. In particular, for images with distinct object edges and obvious textures, our restored images have the clearest details. Figure 14 shows eight ground-truth images and the corresponding foggy samples processed by the various algorithms. For example, in the first column, our result shows the most highlighted building texture; the other methods cannot achieve such an effect. Across these artificial foggy images, our restored images have moderate brightness and the most natural colour: see rows (c) and (l), which are close to the ground truth (rows (b) and (k)). In Figure 14, we select outdoor images as examples because they simulate foggy weather better than indoor images.

Validation of the robustness of proposed method
In this section, we validate the robustness of our method. In the experiments, we found that the highest brightness of some artificial foggy images is close to 1; such high brightness is rare in natural foggy images. Therefore, the brightness of several restored images was adjusted while the experiments were conducted.
In Figure 15, we show some natural foggy images and restored images. These images are often used by other researchers in this field. We can see that our restored images are all clear and natural, even for some dense foggy images.
As shown in Figures 16 and 17, we performed generalisation tests on the RESIDE dataset [34] and the D-HAZY dataset [55]. After our processing, many details that would otherwise be invisible because of darkness become obvious and clear; see, for example, the detailed comparisons in the red rectangle in Figure 16.
The foggy images in the D-HAZY dataset were synthesised from indoor images. In Figure 17, we show the testing results on this dataset. In general, our results are clear and natural. Because of the uneven indoor illumination, there is a slight colour distortion between our results and the ground truth, perhaps because our algorithm is based on natural-image feature extraction. We believe this evidence clearly indicates that our method is better suited to processing natural foggy images and outdoor artificial foggy images.

Objective analysis on artificial foggy images
We also performed an objective analysis on these artificial foggy images, selecting three parameters to compare the performance of the algorithms mentioned above. The first is entropy, the most common measure of the overall information content of an image. In addition, PSNR (peak signal-to-noise ratio) and SSIM (structural similarity) [23] were adopted in this study.
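For reference, the three metrics can be sketched as follows on flattened intensity lists. These are illustrative pure-Python definitions, not the evaluation code used in the experiments; in particular, the SSIM shown is the single-window (global) simplification, whereas SSIM [23] as normally reported averages this expression over local windows.

```python
import math

def entropy(pixels):
    """Shannon entropy in bits of a list of intensities (0-255)."""
    n = len(pixels)
    counts = {}
    for p in pixels:
        counts[p] = counts.get(p, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two intensity lists."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return float('inf') if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def ssim_global(x, y, peak=255.0):
    """Single-window SSIM (illustrative simplification of [23])."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

The behaviour discussed in the text follows from these definitions: entropy rewards a spread-out intensity histogram (hence the high scores of the more colourful Retinex results), while PSNR and SSIM reward closeness to the ground truth (hence the high scores of data-driven methods trained on such pairs).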
As shown in Table 1, we used three images from Figure 14: Street (the first column, rows a-i), Building (the first column, rows j-r), and Trees (the second column, rows j-r). Regarding entropy, the Retinex algorithm has the highest value, which should be related to the colour distortion of its restored images: from the instances in Figures 13 and 14, we can see that the results of the Retinex algorithm are more colourful than the ground truth. Regarding the SSIM and PSNR values, the algorithms based on CNNs score significantly higher than the traditional methods. This is characteristic of data-driven algorithms because their restored images are close to the ground truth. The values of our method are lower than those of the deep learning algorithms but higher than those of traditional algorithms such as DCP, Meng's, and Retinex. Therefore, we argue that it is unsuitable to compare traditional algorithms with deep learning algorithms by these objective parameters alone: the deep learning algorithms cannot effectively process many natural foggy images, yet their SSIM and PSNR values on artificial images are higher.

Comparison of processing speed
For an image of size 640 × 479, the processing time of the ADCP+4DL1R-net is approximately 0.146 s, whereas that of AOD-Net* (without GPUs) is 0.65 s [35]. Ours is the fastest among all current algorithms that do not use GPUs. As shown in Figure 18, the ADCP+4DL1R-net is only slightly slower than AOD-Net, and that is because AOD-Net uses GPUs during defogging. As shown in Figure 19, the effect of the parallel algorithm is not as expected: its speed is even lower than that of the non-parallel algorithm when the image size is smaller than 800 × 523. The reason for this phenomenon is that such input images are not large enough to reveal the advantage of the parallel algorithm in MATLAB. However, the time expended by the parallel algorithm does not change significantly as the image size increases, so its advantage can be observed when processing large-size images. Deep learning networks demonstrate their great power in Figure 19: the 4DL1R-net is much faster than the other two algorithms. We believe that applying deep learning networks is an important development trend for image defogging in the future.

CONCLUSION
In this study, we analysed the shortcomings of the DCP algorithm and their causes and proposed improved methods. We use the ADCP algorithm to eliminate halo artifacts and overly dark restored images, thereby improving the colour of close-range white objects in the restored images. We use wavelet fusion, a locally variable weighted 4-directional L1 regularisation method, and a corresponding parallel algorithm to optimise the transmission map, and we trained the 4DL1R-net to further enhance the processing speed. Several experiments validate the effectiveness of our method: our restored images have high brightness and clear details and maintain natural image colours. Several disadvantages remain. For images with low RGB intensity, if the low-frequency coefficient of the wavelet fusion is suppressed during transmission optimisation, the restored images become darker. In addition, because we do not use GPUs during defogging, our processing speed is lower than that of AOD-Net; we therefore plan to develop an algorithm for mobile GPUs and mobile operating systems in the future.