Unsupervised change detection method in SAR images based on deep belief network using an improved fuzzy C-means clustering algorithm

Deep learning methods have recently shown ground-breaking results on the synthetic aperture radar (SAR) image change detection problem. However, they still face the challenges of intrinsic noise and the difficulty of acquiring labeled data. To address these issues, we develop a change detection approach specifically designed for SAR images, built on a Deep Belief Network (DBN) that combines unsupervised feature learning with supervised network fine-tuning. The deep network produces the final change map directly from the two original images. A pre-classification based on morphological reconstruction and membership filtering is employed to minimize the effect of noise, and a virtual sample generation method provides suitably diverse samples to mitigate the overfitting caused by limited SAR data. Visual and quantitative analyses, as well as comparisons with advanced algorithms, show that our algorithm not only achieves better results but also requires less implementation time.


INTRODUCTION
New challenges have risen in the analysis and interpretation of digital images due to the rapid progression of sensors and computing technologies. In the electronics field, image processing and machine vision techniques can help to identify and measure different targets in an image in order to extract useful information according to specific needs. Important applications in several fields have emerged where machine vision techniques overlap with image processing techniques, including video surveillance, remote sensing, medical diagnosis, military, and change detection.
Change detection is a key technology that aims to identify any changes in the same scene acquired at different times; it is still one of the most crucial and distinguished topics of research in the remote sensing field, as it is widely employed in many earth observation applications, including urban planning [1], agricultural survey, and disaster monitoring [2].
Synthetic aperture radar (SAR) has become a powerful remote sensing technology thanks to its ability to observe the earth's surface at all times and in any weather condition [2,3]. Recent advances in SAR imaging acquisition technologies have substantially increased the SAR monitoring potential; images are available covering a large area with a brief revisit time.
However, speckle noise impedes the interpretation of SAR data and degrades change detection performance, making change detection in SAR imagery a more challenging problem than in optical satellite imagery and calling for improved methods.
In general, change detection methods follow one of two approaches: post-classification comparison [4] or difference image analysis [5].
In the first approach, the change map is identified by comparing classification maps independently derived from the two temporally different SAR images of the same area. This method has the advantage of providing information about the categories that changed and also avoids the radiometric normalization of remotely sensed images, thus reducing the external impact of atmospheric, sensor, and environmental differences between multi-temporal remote sensing images. However, it requires high classification accuracy for each of the two images.
The second approach is the more prominent. It first obtains a difference image (DI) through a comparative analysis of the two noise-contaminated multi-temporal images, then analyzes the DI to obtain the change detection result. However, the validity of the result depends on the quality of the generated DI, which must account for noise suppression.
Ratio operators are usually employed to generate the difference image, such as the log-ratio operator [6], which is considered robust to speckle noise since it transforms multiplicative noise into additive noise, and the mean-ratio operator. Gong et al. [7] proposed a neighborhood-based ratio operator that acquires the difference image by combining spatial and gray-level information.
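As a rough sketch of these ratio operators (the `eps` guard and the 3 × 3 local mean are our own illustrative choices, not taken from the cited works):

```python
import numpy as np

def log_ratio_di(img1, img2, eps=1.0):
    """Log-ratio difference image: |log((img2+eps)/(img1+eps))|.
    The logarithm turns multiplicative speckle into additive noise."""
    img1 = np.asarray(img1, dtype=np.float64)
    img2 = np.asarray(img2, dtype=np.float64)
    return np.abs(np.log((img2 + eps) / (img1 + eps)))

def mean_ratio_di(img1, img2, eps=1e-6):
    """Mean-ratio difference image: 1 - min(m1,m2)/max(m1,m2),
    computed on 3x3 local box means to suppress isolated speckle."""
    def box_mean(a, r=1):
        a = np.asarray(a, dtype=np.float64)
        p = np.pad(a, r, mode='edge')
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += p[r + dy:r + dy + a.shape[0], r + dx:r + dx + a.shape[1]]
        return out / (2 * r + 1) ** 2
    m1, m2 = box_mean(img1), box_mean(img2)
    return 1.0 - np.minimum(m1, m2) / (np.maximum(m1, m2) + eps)
```

Both operators return a per-pixel dissimilarity map; higher values indicate more likely change.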
The next step consists of analyzing the DI to extract the change region from the difference images. Two conventional methods have been widely used, which are thresholding and clustering methods.
Among threshold methods, such as Otsu [8] and the Kittler-Illingworth (KI) minimum-error threshold algorithm [9], several approaches automatically find an optimal threshold that minimizes the probability of overall error in the change detection process. Bruzzone et al. [10] proposed a threshold method in which the difference image is modeled by a Gaussian distribution and the Expectation-Maximization (EM) algorithm determines the optimal threshold. Xiong et al. [11] tackled threshold selection by combining the features of two change detection measures through a Markovian approach.
The clustering methods are also increasingly applied in the change detection task and more research is allocated to developing powerful classifiers. Among the most popular clustering methods, we find the Fuzzy C-Means algorithm (FCM).
Although the aforementioned traditional pixel-based change detection algorithms are simple and fast, their accuracy may suffer from the limited discriminative features available at the pixel level, a problem that is aggravated by image noise.
To overcome the limits of traditional techniques, many approaches have been proposed that interpret pixel changes statistically by modeling local texture and contextual relationships. Approaches employing discriminative models have used a series of classifiers, such as the support vector machine (SVM) [12], sparse representation-based classification (SRC) [13], neural networks (NN) [14], and random forests (RF) [15]. Effective feature extraction plays a key role in these classifiers' performance, and many spatial feature extraction methods based on texture descriptors have been proposed for SAR images. In statistical analysis, two statistics-based feature extraction methods have been used, the kurtosis wavelet energy (KWE) and the kurtosis curvelet energy (KCE) [16,17]; KCE produces better results as an efficient feature for texture discrimination. In transform-domain analysis, the wavelet transform and Gabor filters are widely used to extract features from SAR images. For model-based analysis, Markov random field (MRF) models are the most widely used, providing a practical and efficient stochastic model for incorporating spatial contextual information into the classification process.
All the above-mentioned classification algorithms require ingenuity and prior knowledge, and they have not reached the promised classification performance. In recent years, much research has been devoted to developing more powerful classifiers, such as the generalized linear regression model [18], the POLWHE method [19], and the hybrid MLPNN-ICA [20].
Deep learning is now the most widely used intelligent classification approach. It has gained ever-increasing popularity thanks to encouraging results on many problems in speech recognition [21], image recognition [22], and especially classification [23,24], opening a new horizon by extracting discriminative features automatically. Various deep learning architectures exist, such as the deep neural network (DNN), the convolutional neural network (CNN), the deep belief network (DBN), and the convolutional DBN (CDBN).
Deep learning has been extensively applied in SAR image processing, and many deep learning techniques have been proposed. For example, Gong et al. [25] proposed a deep-learning-based change detection method for SAR images in which the final change map is established directly from the original SAR images without generating a difference image: according to initial results obtained by FCM, a training set is built by taking blocks with sliding windows on the two original images, and a DNN is then used to learn high-order features and classify the SAR images. Motivated by the idea of using a DBN to solve the change detection problem, a similar approach [26] strives to train the DBN better by increasing the amount of suitable data using the input images and images obtained by applying morphological operators to them. Hou et al. [27] presented a change detection method that incorporates deep features and low-rank-based saliency computation. Sharifzadeh et al. [28] developed a hybrid algorithm based on a CNN and multilayer perceptron (CNN-MLP) for image classification, with improved performance. Wang et al. [29] proposed a change detection framework for hyperspectral remote sensing images based on an end-to-end 2-D CNN (GETNET) in which a mixed affinity matrix integrating subpixel representation is used to extract discriminative features.
Based on the above, though various methods have shown impressive and promising results, several obstacles still hinder SAR image change detection: computational expense and processing time; speckle noise, which makes SAR images difficult to interpret and can produce false alarms when change information is extracted; heavy dependence on the assumptions used to model the difference image, which directly impacts detection performance; and the difficulty of producing the labeled data required by existing deep networks. We therefore aim to improve SAR change detection based on deep architectures: on the one hand, to improve network training by increasing the diversity of suitable samples, and on the other, to minimize the effect of noise. The flowchart of the proposed method is illustrated in Figure 1.
The proposed algorithm proceeds as follows. First, a pre-classification based on morphological reconstruction and membership filtering is applied to the input images to generate appropriate samples. Because data diversity plays an important role in training neural networks, virtual samples are also generated to improve the training of the DBN. A DBN is then trained in a supervised way with these samples and the labels obtained in the previous step. In the last step, the trained DBN classifies each remaining pixel into the changed or unchanged class to produce the final classification.
This paper is organized as follows. The following section will describe in detail the required knowledge and the proposed algorithm. The experimental results and analyses are reported in Section 3. Finally, the conclusion is provided at the end.

Fast and robust fuzzy C-means (FRFCM) algorithm
Despite the attention the FCM algorithm has gained since its inception [30,31], it has serious shortcomings, including a lack of robustness on images with complex texture or noise-corrupted backgrounds, since it processes single pixels without incorporating spatial information. Several approaches have been suggested to cope with these challenges.
The first improvement concerns the incorporation of spatial information into the objective function. Classic FCM-related algorithms of this kind, such as FCM_S [32], FCM_S1 [33], FCM_S2 [33], EnFCM [34], and FGFCM [35], have been suggested. Notably, Krinidis and Chatzis [36] presented a robust fuzzy local information C-means clustering algorithm (FLICM) that incorporates a novel fuzzy factor into the objective function, aiming to guarantee noise immunity and preserve image detail. Although FLICM performs better than previous algorithms, its drawback is the non-robust Euclidean distance it adopts, which is not efficient for arbitrary spatial information in images.
A second improvement aims to overcome the problem by introducing robust kernel distances into the objective function, which gives rise to a series of kernel-based FCM algorithms as: KWFLICM [37], ARKFCM [38], KGFCM [39], NDFCM [40]. For instance, Zhao et al. [41] report a neighbourhood weighted FCM clustering algorithm (NWFCM) which replaces the usual Euclidean distance in the objective function of FCM with a neighbourhood weighted-distance.
Recently, Lei et al. [42] proposed a new algorithm based on morphological reconstruction and membership filtering aiming to obtain the best compromise between detail preservation and noise suppression.
The clustering FRFCM is performed on the gray level histogram of an image reconstructed by morphological reconstruction (MR). The introduction of the histogram helps to reduce the computational complexity of FCM.
MR is applied before clustering in order to improve the distribution characteristics of the data. Unlike many common filtering operations, MR retains image details and removes noise without any prior knowledge of the noise type.
Generally speaking, MR includes two basic operations, dilation and erosion reconstructions [43]. Their combination yields two further morphological operations, opening and closing by reconstruction. Because morphological closing is more suitable for smoothing texture details, we use it to modify the original image.
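A minimal numpy sketch of closing by reconstruction, using a flat 3 × 3 structuring element for illustration (the experiments later use a radius-3 disk; the iteration scheme is the standard geodesic one, not the authors' exact implementation):

```python
import numpy as np

def _dilate3(a):
    """Grayscale dilation with a flat 3x3 structuring element."""
    h, w = a.shape
    p = np.pad(a, 1, mode='edge')
    return np.max([p[dy:dy + h, dx:dx + w]
                   for dy in range(3) for dx in range(3)], axis=0)

def _erode3(a):
    """Grayscale erosion with a flat 3x3 structuring element."""
    h, w = a.shape
    p = np.pad(a, 1, mode='edge')
    return np.min([p[dy:dy + h, dx:dx + w]
                   for dy in range(3) for dx in range(3)], axis=0)

def closing_by_reconstruction(f):
    """Closing by reconstruction: dilate f to obtain the marker g, then
    iterate the geodesic erosion max(erode(g), f) until stability.
    Removes small dark details (e.g. speckle) while preserving contours."""
    f = np.asarray(f, dtype=np.float64)
    g = _dilate3(f)                        # marker image
    while True:
        nxt = np.maximum(_erode3(g), f)    # geodesic erosion bounded by f
        if np.array_equal(nxt, g):
            return g
        g = nxt
```

A single dark impulse in a flat region is removed, while a large intensity step (an object edge) is left untouched.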
The objective function is defined as

J_m = \sum_{l=1}^{q} \sum_{k=1}^{c} \gamma_l \, u_{kl}^{m} \, \lVert \xi_l - v_k \rVert^2, \qquad (1)

where 1 ≤ l ≤ q and q (generally far smaller than N) is the number of gray levels contained in ξ, the image reconstructed by morphological closing reconstruction:

\xi = R^{C}(f), \qquad (2)

where R^C denotes the morphological closing reconstruction, composed of the dilation reconstruction R^{\delta}_f(g) and the erosion reconstruction R^{\varepsilon}_f(g) of a marker image g; the dilation δ and erosion ε are realized with a flat structuring element. u_{kl} represents the fuzzy membership of gray value l with respect to the k-th cluster, m is a weighting exponent, γ_l is the number of pixels whose gray level equals l, and v_k denotes the prototype value of the k-th cluster.
We have to find the parameter values for which the objective function attains its minimum. This constrained minimization is solved by the Lagrange multiplier technique, which converts it into an unconstrained problem. The Lagrange multiplier λ is used to find the saddle point of the Lagrange function

\tilde{J}_m = \sum_{l=1}^{q} \sum_{k=1}^{c} \gamma_l \, u_{kl}^{m} \, \lVert \xi_l - v_k \rVert^2 - \lambda \Big( \sum_{k=1}^{c} u_{kl} - 1 \Big).

Minimizing the objective function in Equation (1) yields the membership function and cluster centers:

u_{kl} = \frac{\lVert \xi_l - v_k \rVert^{-2/(m-1)}}{\sum_{j=1}^{c} \lVert \xi_l - v_j \rVert^{-2/(m-1)}}, \qquad
v_k = \frac{\sum_{l=1}^{q} \gamma_l \, u_{kl}^{m} \, \xi_l}{\sum_{l=1}^{q} \gamma_l \, u_{kl}^{m}}.

A membership partition matrix U = [u_{kl}]_{c×q} is thus obtained, and the iterative process terminates when the maximum change in U between two successive iterations falls below a given threshold. The incorporation of local spatial information is similar to a membership filter that does not require computing distances between pixels within a local spatial neighbourhood, which accelerates convergence; we use a median filter, Ú = med{U}.
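A minimal numpy sketch of this histogram-based FCM loop (membership update, center update, termination on membership change); parameter names and the initialization are illustrative, not the authors' exact settings:

```python
import numpy as np

def histogram_fcm(image, c=2, m=2.0, eps=1e-9, max_iter=100, tol=1e-5):
    """FCM on the gray-level histogram (FRFCM-style): minimize
    J = sum_l sum_k gamma_l * u_kl^m * (xi_l - v_k)^2,
    where gamma_l is the count of gray level xi_l."""
    levels, gamma = np.unique(np.asarray(image, dtype=np.int64),
                              return_counts=True)
    xi = levels.astype(np.float64)            # q distinct gray levels
    v = np.linspace(xi.min(), xi.max(), c)    # initial prototypes
    u = np.zeros((c, len(xi)))
    for _ in range(max_iter):
        d = (xi[None, :] - v[:, None]) ** 2 + eps   # c x q squared distances
        u_new = d ** (-1.0 / (m - 1.0))
        u_new /= u_new.sum(axis=0, keepdims=True)   # membership update
        w = gamma[None, :] * u_new ** m
        v = (w * xi[None, :]).sum(axis=1) / w.sum(axis=1)  # center update
        done = np.abs(u_new - u).max() < tol
        u = u_new
        if done:
            break
    # label each gray level, then map the labels back onto the image
    lut = dict(zip(levels, u.argmax(axis=0)))
    return np.vectorize(lut.get)(np.asarray(image, dtype=np.int64)), v
```

Because the loop runs over the q gray levels rather than the N pixels, its cost is independent of image size, which is the source of FRFCM's speed.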

Virtual sample
Training a DNN that can generalize well to new data is a challenging problem. The number of available training samples is usually limited, which results in poor performance of the network.
In order to solve this problem, we use a virtual sample method that tries to enrich the training samples, which results in reducing the overfitting and faster optimization of the network.
From two given samples of the same class, a virtual sample can be generated with a proper ratio. Let P_i and P_j be two training samples from the same class; a virtual sample \acute{P} is generated as

\acute{P} = \frac{\alpha_i P_i + \alpha_j P_j}{\alpha_i + \alpha_j} + \beta,

where, for each virtual sample generated, α_i and α_j are uniformly distributed random numbers on the interval [0, 1], and β is random Gaussian noise with mean 0 and variance 0.001. The virtual sample is assigned the same label as the two training samples.
In Chen et al.'s work [44], a similar virtual sample generation method was applied in the hyperspectral image classification.
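A minimal sketch of this virtual-sample generation; normalizing the blend by α_i + α_j is our reading of "proper ratio", and the noise parameters follow the values stated above:

```python
import numpy as np

rng = np.random.default_rng(0)

def virtual_sample(p_i, p_j, noise_var=1e-3, rng=rng):
    """Blend two same-class samples with random ratios alpha_i, alpha_j
    ~ U[0, 1] (normalized so the weights sum to 1), then add Gaussian
    noise beta ~ N(0, noise_var). The virtual sample inherits the
    shared class label of p_i and p_j."""
    a_i, a_j = rng.uniform(0.0, 1.0, size=2)
    s = a_i + a_j + 1e-12                      # guard against a zero sum
    beta = rng.normal(0.0, np.sqrt(noise_var), size=np.shape(p_i))
    return (a_i * np.asarray(p_i, dtype=np.float64)
            + a_j * np.asarray(p_j, dtype=np.float64)) / s + beta
```

Repeated calls on pairs drawn from the minority class produce as many extra training vectors as needed to balance the set.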

Proposed methodology
Consider two co-registered intensity SAR images, I_1 and I_2, each of size M × N, acquired over the same geographical location at two different times t_1 and t_2. The main purpose of change detection is to generate a change map identifying, for each pixel in the scene, whether it changed between t_1 and t_2. This can be formulated as a classification problem whose final output is a binary image of changed and unchanged pixels.

Pre-classification by FR fuzzy C-means clustering
In order to prevent generating a DI, the proposed algorithm begins with a joint classification of the two input SAR images acquired over the same region but at different times.
To this end, we use a joint classification based on the hierarchical FRFCM algorithm. The pre-classification step selects the pixels that are best adapted to train the DNN.
The pre-classification yields three classes. A similarity matrix S is first established from the two original images by a similarity operator, with S_ij ∈ [0, 1] computed from the gray levels I^1_{ij} and I^2_{ij} of the two pixels at corresponding positions (i, j).
Two threshold values of similarity T 1 and T 2 are used to define whether the two pixels studied are identical or show changes in relation to each other.
Then, the local variances σ^1_{ij} and σ^2_{ij} are computed for each pair of corresponding pixels in the two images. According to the principle of minimum variance, if σ^1_{ij} ≤ σ^2_{ij}, I^1_{ij} is classified by the FRFCM algorithm. In the next step, the similarity measure image is compared with the thresholds: if S_ij < T_1, no change took place and Ω^1_{ij} = Ω^2_{ij}; if S_ij > T_2, a change has occurred and the pixel in the second image is classified.
The classification results are represented by three classes: Ω_c contains pixels with a high probability of being changed, Ω_u contains pixels with a high probability of being unchanged, and Ω_i represents the uncertain class, which will subsequently be classified by the DBN.
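The threshold rule can be sketched as follows. The paper does not give the explicit form of S_ij, so the normalized gray-level difference used here is an assumption, chosen so that small values indicate no change, consistent with the T_1/T_2 rule above:

```python
import numpy as np

def preclassify(i1, i2, t1=0.3, t2=0.7, eps=1e-6):
    """Hypothetical sketch of the similarity-based pre-classification.
    S = |I1 - I2| / (I1 + I2) is an ASSUMED operator in [0, 1];
    S < t1 -> unchanged (0), S > t2 -> changed (1), else uncertain (2)."""
    i1 = np.asarray(i1, dtype=np.float64)
    i2 = np.asarray(i2, dtype=np.float64)
    s = np.abs(i1 - i2) / (i1 + i2 + eps)
    labels = np.full(i1.shape, 2, dtype=np.int64)  # uncertain by default
    labels[s < t1] = 0                             # no change took place
    labels[s > t2] = 1                             # a change has occurred
    return labels
```

Pixels labeled 0 and 1 supply the training samples; the remaining uncertain pixels are left for the DBN.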

Sample generation
In SAR change detection tasks, there is a large gap between the numbers of pixels in the changed and unchanged classes, which strongly affects DBN training: training with a large number of samples selected from the unchanged class increases training time and causes overfitting. A high-quality training set with a balanced number of samples from each class is required. To this end, we generate virtual samples with the aforementioned method to balance the training set by enriching the under-represented class.

Classification by using DBN
A DBN is a probabilistic model formed of one input layer of observations, one output layer reconstructing the input data, and several hidden layers. The units of each hidden layer learn feature representations of the input data at different levels of abstraction [46]. A key component of DBN training is that each pair of adjacent layers is treated as a restricted Boltzmann machine (RBM), stacked so that the output of one layer becomes the input of the next.
The training process of a DBN splits into two phases: pre-training (unsupervised) and fine-tuning (supervised).
In the first phase, pre-training approximates the parameters greedily. Each RBM is trained separately: the lower RBM layer learns a set of low-level representation features, and these learned features are then used as input to train a higher-level RBM, which learns more complex features. This process is repeated several times to learn a DBN.

FIGURE 2 The farmland of the Yellow River Estuary dataset: (a1) image acquired at t_1, (b1) image acquired at t_2, and (c1) ground truth.
In the second phase, the fine-tuning process, parameters of all layers are fine-tuned by the backpropagation.
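The greedy layer-wise pre-training can be illustrated with a minimal one-step contrastive-divergence (CD-1) update for a single binary RBM; this is the standard formulation, not the authors' exact implementation, and all hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.1, epochs=50, rng=rng):
    """CD-1 training of one RBM, the building block of DBN pre-training.
    data: (n_samples, n_visible) array of values in [0, 1]."""
    n_visible = data.shape[1]
    w = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        v0 = data
        ph0 = sigmoid(v0 @ w + b_h)                    # positive phase
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = sigmoid(h0 @ w.T + b_v)                  # reconstruction
        ph1 = sigmoid(pv1 @ w + b_h)                   # negative phase
        w += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(data)
        b_v += lr * (v0 - pv1).mean(axis=0)
        b_h += lr * (ph0 - ph1).mean(axis=0)
    return w, b_v, b_h
```

To build a DBN, the hidden activations `sigmoid(data @ w + b_h)` of a trained RBM become the training data for the next RBM in the stack.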
The classification of changed and unchanged pixels starts by extracting image patches centered at the selected pixels (belonging to Ω_u and Ω_c) from the original SAR images. Real and virtual samples are used together as training samples. The neighborhood features of each pixel in image 1 and the corresponding features in image 2 are converted to vectors, which are concatenated into a single vector delivered to the DBN as input.
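The construction of the DBN input vector can be sketched as follows; with a 5 × 5 neighborhood the concatenated vector has 2 × 25 = 50 elements, matching the 50-node input layer used in the experiments (the edge-padding choice is ours):

```python
import numpy as np

def patch_vector(img1, img2, y, x, r=2):
    """Concatenate the (2r+1) x (2r+1) neighborhoods of pixel (y, x)
    from both images into a single DBN input vector. r=2 gives 5x5
    patches, i.e. a 50-element vector for the two images."""
    def patch(img):
        p = np.pad(np.asarray(img, dtype=np.float64), r, mode='edge')
        return p[y:y + 2 * r + 1, x:x + 2 * r + 1].ravel()
    return np.concatenate([patch(img1), patch(img2)])
```

Stacking one such vector per selected pixel yields the training matrix fed to the network.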

Final classification
After training the DBN on reliable samples, pixels belonging to Ω_i are further classified into the changed and unchanged classes. The final change map is obtained by combining the DBN classification result with the pre-classification result.

Experimental setup
In this section, three real SAR image datasets are used to evaluate the performance of our algorithm. Table 1 shows the details of each dataset: spatial size, imaging location, acquisition time, and the main reason for the change. The raw images of the three datasets and their ground truths are shown in Figures 2-4.
The performance evaluation of change detection methods is crucial. FN denotes the number of pixels of the changed class that are classified incorrectly, and FP denotes the number of pixels of the unchanged class that are classified incorrectly. N_c and N_u are the numbers of pixels belonging to the changed and unchanged classes, respectively; the overall error is OE = FP + FN, and the percentage of correct classification is PCC = (N_c + N_u − OE)/(N_c + N_u).
It should be noted that the Kappa index is the most convincing coefficient in change detection, as it measures the agreement between the detection result and the reference image.
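Under these definitions, the criteria can be computed as follows (a sketch; Kappa uses the standard chance-corrected form):

```python
import numpy as np

def change_metrics(pred, ref):
    """FP, FN, OE, PCC and Kappa from binary change maps
    (1 = changed, 0 = unchanged)."""
    pred = np.asarray(pred).ravel().astype(bool)
    ref = np.asarray(ref).ravel().astype(bool)
    n = pred.size
    fp = int(np.sum(pred & ~ref))        # unchanged classified as changed
    fn = int(np.sum(~pred & ref))        # changed classified as unchanged
    oe = fp + fn                         # overall error
    pcc = (n - oe) / n                   # percentage correct classification
    tp = int(np.sum(pred & ref))
    tn = int(np.sum(~pred & ~ref))
    # expected chance agreement, then Cohen's kappa
    pre = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (pcc - pre) / (1 - pre) if pre < 1 else 1.0
    return fp, fn, oe, pcc, kappa
```

A perfect map yields PCC = Kappa = 1; Kappa drops toward 0 when the agreement is no better than chance.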
Simulations were developed in MATLAB R2016a on hardware with 12 GB of RAM and an Intel Core i5 2.50-GHz CPU.

Experimental results
As mentioned earlier, the first step of our implementation is a pre-classification carried out by FRFCM, which requires a morphological closing reconstruction; a disk-shaped structuring element of radius 3 is employed for all datasets. After that, real and virtual samples, randomly selected from Ω_c and Ω_u, are combined as training samples.
The neighborhood size r used for feature extraction from the input SAR images is an important parameter that affects the final change detection results. As shown in Figure 5, results tend to worsen as the neighborhood size increases, so we set r = 5 in our implementation. The selected data are then fed into the network, a 5-layer network with 50, 250, 200, 100, and 1 nodes per layer; the batch size is 100 and the number of training iterations is 50.
After training, the neighborhood features of pixels belonging to Ω_i are further classified into changed and unchanged classes. Finally, we combine the DBN classification result and the pre-classification result to form the final change map.

Analysis of the results

Effect of virtual samples
We first test the effect of virtual samples on the final change detection. As illustrated in Table 2, the Kappa values improve by 0.60%, 0.09%, and 0.38% on the three datasets, respectively, showing that a deep network with virtual samples is a successful tool for change detection tasks. Figure 6 shows the change detection results on the farmland of the Yellow River Estuary dataset achieved by different methods, from which we can conclude that the proposed method resists speckle noise well. This is consistent with Table 3, where its PCC and Kappa are higher and its OE lower than the others; the Kappa value of the proposed method on this dataset is improved by 9.47% and 4.54% over NBRELM and DeepJFCM, respectively. In addition, Figure 5(a) shows some details missed by the detected regions, indicated by the red rectangle.

Comparison of the proposed algorithm with other methods
The final change map obtained by our proposed method is much closer to the ground-truth image. For the Bern dataset, as shown in Figure 7, speckle noise is present in the final change detection images generated by DeepJFCM, while NBRELM is less sensitive to noise but misses some details of the changed regions. From the results in Table 4, our proposed method is the best overall, giving the best PCC and Kappa (PCC = 99.69, Kappa = 99.36). According to the visual analysis of the final change maps generated by the different methods on the Yellow River dataset, shown in Figure 8, the proposed method is less sensitive to noise than the others, which translates into a reduction of the overall error OE/% and improvements in the PCC and Kappa coefficients, as shown in Table 5.

Comparison of the running time
As mentioned above, we used a pre-classification founded on morphological reconstruction and local membership filtering to accelerate the selection of a high-quality training set. A virtual sample method was also used, reducing overfitting and speeding up network optimization. Table 6 shows that the proposed method needs less implementation time.

CONCLUSIONS
This study demonstrates a method for detecting changes directly from the original SAR images; it is an unsupervised method based on a deep network and, unlike existing methods that generate a DI, requires no prior knowledge. Comparison with other related methods shows the proposed method to be the most efficient.
Our two main objectives were to reduce the impact of speckle noise and the processing time. On the one hand, a high-quality training set was selected by introducing a fast and robust pre-classification based on morphological reconstruction and membership filtering, preventing the network from producing many redundant features; on the other hand, a virtual sample generation method enriched the training samples, reducing overfitting and accelerating network optimization.