Sparsity Based Autoencoders for Denoising Cluttered Radar Signatures

Narrowband and broadband indoor radar images significantly deteriorate in the presence of target dependent and independent static and dynamic clutter arising from walls. A stacked and sparse denoising autoencoder (StackedSDAE) is proposed for mitigating wall clutter in indoor radar images. The algorithm relies on the availability of clean images and corresponding noisy images during training and requires no additional information regarding the wall characteristics. The algorithm is evaluated on simulated Doppler-time spectrograms and high range resolution profiles generated for diverse radar frequencies and wall characteristics in around-the-corner radar (ACR) scenarios. Additional experiments are performed on range-enhanced frontal images generated from measurements gathered from a wideband RF imaging sensor. The results from the experiments show that the StackedSDAE successfully reconstructs images that closely resemble those that would be obtained in free space conditions. Further, the incorporation of sparsity and depth in the hidden layer representations within the autoencoder makes the algorithm more robust to low signal to noise ratio (SNR) and label mismatch between clean and corrupt data during training than the conventional single layer DAE. For example, the denoised ACR signatures show a structural similarity above 0.75 to clean free space images at SNR of -10dB and label mismatch error of 50%.


I. INTRODUCTION
Several types of urban radars have been researched and developed for civilian and military applications such as law enforcement, search and rescue, biomedical applications related to elderly monitoring and assisted living [1], [2], [3], [4], [5], [6]. The primary objectives of most indoor radars are human detection and localization. Moving humans are detected based on the Doppler modulations that are introduced to coherent radar transmit waveforms. The movements of the limbs of the human introduce micro-Doppler features which are captured through single dimension Doppler-time spectrograms using joint time-frequency transforms [7], [5], [8], [9], [10]. When the continuous wave radar is augmented with array processing, we obtain Doppler enhanced images along azimuth [11] or both azimuth and elevation [12], [13]. Alternatively, broadband pulse, linear frequency modulated or stepped frequency radars use fine downrange resolution to detect and track human motions along the range dimension [14], [15], [16], [17]. The resulting radar signatures are either single dimension high range resolution profiles (range-time signatures) [18] or range-enhanced higher order plots [19], [20]. Human activities are detected and interpreted on the basis of their micro-Doppler and micro-range features [21], [22], [23], [24], [3].
Authors are with the Indraprastha Institute of Information Technology Delhi, New Delhi 110020 India. E-mail: {shobha@iiitd.ac.in
Indoor radars can be deployed either in line-of-sight (LOS) environments (such as for fall monitoring of the elderly) or in non-line-of-sight (NLOS) environments (for security and surveillance purposes). The two most common NLOS deployments are the through-wall radar (TWR) [1], [2], [25] and the around-the-corner radar (ACR) [26], [27], [28], [29], [30]. However, in both cases, the quality of the radar signatures is greatly impacted by complex propagation artifacts introduced by walls, such as attenuation and multipath clutter [31], [32], [33], [34]. We broadly categorise indoor clutter into two types: target independent, and target dependent static and dynamic clutter.
Target independent static clutter arises from the reflections of the radar signal off the lateral walls, ground, ceiling and furniture in the rooms. Target dependent static clutter, in contrast, is generated through reflections and refraction of signals from the target to the side and back walls and through reverberations within the front wall, resulting in ghost targets and defocusing of targets. Several research efforts have been devoted to mitigating these artifacts in the radar signatures [35], [36]. The authors in [19] and [20] used back-projection and sparsity-based change detection algorithms, respectively, to track slow-moving humans in the range-crossrange space in the presence of target-dependent static clutter. These techniques relied on the availability of prior information on the wall geometry and characteristics in the indoor scenarios. Alternatively, the authors in [37], [38] adopted the multipath exploitation strategy wherein the wall and target returns were projected on higher order subspaces based on their sparsity based representations. Then, the wall effects were removed from the target returns. While this is an effective strategy for removing target independent static clutter, it cannot be used for target dependent clutter since the target and clutter returns are no longer independent of each other. Dynamic clutter, on the other hand, arises due to the presence of other movers in the channel (target independent dynamic clutter) or due to the interactions between the dynamic target and the channel (target dependent dynamic clutter). The former can be separated on the basis of the micro-Doppler returns [39]. The latter is relatively difficult to remove as the target returns are not independent of the complex wall propagation phenomenology.
Besides clutter, indoor radar signatures are also affected by noise and interference. Indoor radars typically operate below X band frequencies in order to enable the radar signal penetration through the wall materials. These radars are federally mandated to transmit low powers to limit the possibility of electromagnetic interference with other wireless devices. On the other hand, these indoor radars may also encounter significant interference from neighboring wireless systems in the environment such as WiFi. Therefore, successful radar detection of targets in these circumstances relies on effective clutter, noise and interference management strategies. In this paper, we propose to use denoising autoencoders (DAE) to recover high quality radar signatures similar to free space signatures even under low signal to clutter and noise ratios (SCNR).
arXiv:2101.12445v1 [eess.SP] 29 Jan 2021
DAE are neural networks with two stages, an encoder and a decoder, as shown in Fig.1a [40]. During training, the encoder in the network is trained to represent a noisy image ($\tilde{x}$) with a hidden layer ($Z$). The decoder is simultaneously trained to recover a clean / denoised image ($\hat{x}$) from the hidden layer that resembles a ground truth clean image ($x$). During test, the encoder is fed with noisy images, and clean images are gathered at the output of the decoder. The DAE has been widely applied in many different fields such as computer vision [41], [42], anomaly detection and natural language processing [43], [44], [41], [45]. In [34], we demonstrated the usefulness of DAE for removing dynamic clutter from through-wall frontal radar images. The algorithm demonstrated robust denoising and clutter mitigation performance for diverse wall and target conditions. However, the good performance of the network was predicated on two assumptions. First, a large volume of correctly labelled clean and noisy images is available during training. Second, the signal to noise ratio (SNR) of the radar system is high. In real world scenarios, both of these assumptions are often violated. For example, it may be nearly impossible to exactly replicate the same motion of the target in free space and then in the indoor channel conditions. Similarly, the channel may be plagued by high noise and interference, as discussed earlier. Therefore, in this paper, we propose to use a stacked DAE with sparsity constraints on the hidden layer representations [46], [40], hereafter referred to as the StackedSDAE. In other words, instead of a single hidden layer, we propose a cascade of hidden layers, as shown in Fig.1b, where the representations are generated from sparsity based constraints. The motivation of the approach is that such deeper representations enable the network to capture higher order abstractions in the clean and noisy images.
The incorporation of these additional hidden layers increases the computational time complexity during training but there is a significant reduction during test due to the feature size reduction within the hidden layers. Preliminary studies with the StackedSDAE were presented in [47]. In this paper, we offer more comprehensive experimental evaluation of the proposed algorithm.
The proposed StackedSDAE can be used for any type of radar signature where corrupted and corresponding clean images are available. In this paper, we have tested the proposed algorithm on different types of narrowband and broadband radar images of a dynamic human. First, ACR signatures are simulated using a combination of electromagnetic modeling using a full wave solver and animation models of humans. Radar images are generated for different carrier frequencies and diverse electrical characteristics of the wall. The narrowband data are processed with short time Fourier transform (STFT) across the time domain to obtain Doppler-time spectrograms. The wideband data are similarly processed with Fourier transform across the carrier frequency bandwidth to obtain high range resolution profiles (HRRP). In the second scenario, we generate range enhanced frontal images of humans using wideband radar data augmented with two-dimensional array processing using an off-the-shelf sensor called the Walabot [48]. All of these images are corrupted by clutter and noise and the StackedSDAE is evaluated on them. Our results, across all three types of images, show that the StackedSDAE can considerably mitigate the clutter and distortions introduced by the walls even with high labelling mismatch error and under low SNR.
To summarize, the main contributions of our paper are the following. First, we propose StackedSDAE for mitigating clutter and distortions introduced by different types of wall phenomenology. The framework of the proposed algorithm can be used to denoise any type of distorted radar signature. Second, we have performed experimental validation on three different types of radar signatures: the Doppler-time spectrogram, the HRRP and the range enhanced frontal images. For this purpose, we have generated a large database of these radar signatures in diverse channel conditions and radar frequencies. The complete database and algorithms are shared with the research community at the following URL https://rb.gy/mmhzf6. Third, we have shown that the proposed algorithm is particularly effective under low SNR conditions and when there are large errors in the labeling of the training data.
Our paper is organized in the following manner. In Section II, we discuss the theory of StackedSDAE for clutter mitigation of radar images. In Section III, we describe the experimental data generation through simulations, followed by the simulation results in Section IV. In Section V, we describe the measurement data collection followed by the denoising results from the autoencoder. We conclude with the analysis of the results in the final section.
Notation: In this paper, we represent scalars and column vectors by lower case letters and matrices by upper case letters.
II. THEORY
When radars are operated in highly cluttered environments, significant distortions arise in the images. The DAE is a neural network that can be trained to remove the clutter artefacts from these images. The autoencoder requires both clean and noisy images while training. In the case of radar, the clean images correspond to radar images of the target in free space or some environment free of the clutter artefacts. These are denoted by $X \in \mathbb{R}^{P \times Q}$, where $P$ is the number of pixels in each of the $Q$ images. In other words, each image, $x$, of $P$ pixels is vectorized and then $Q$ such images are stacked column-wise to generate $X$. The corrupt data, $\tilde{X} \in \mathbb{R}^{P \times Q}$, are the corresponding images of a similar target undertaking a similar motion, gathered in cluttered environments. In this section, we describe the DAE and two variants: the sparsity based DAE, termed SparseDAE, and the sparsity based stacked DAE, termed StackedSDAE. Our hypothesis is that these variants will outperform the conventional DAE in clutter mitigation.
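The construction of the clean and corrupt data matrices can be illustrated with a short NumPy sketch. The image sizes, the random toy data and the helper name `stack_images` are assumptions made purely for illustration; they are not part of the original work.

```python
import numpy as np

def stack_images(images):
    """Vectorize each 2-D image and stack the vectors column-wise into a P x Q matrix."""
    return np.stack([np.asarray(img).reshape(-1) for img in images], axis=1)

rng = np.random.default_rng(0)
# Hypothetical toy data: Q = 3 images of 4 x 5 pixels, so P = 20.
clean_images = [rng.random((4, 5)) for _ in range(3)]
noisy_images = [img + 0.1 * rng.standard_normal(img.shape) for img in clean_images]

X = stack_images(clean_images)        # clean data matrix
X_tilde = stack_images(noisy_images)  # corrupt data matrix, same column order
print(X.shape, X_tilde.shape)
```

Column q of the clean matrix and column q of the corrupt matrix are meant to describe the same target motion, which is the pairing that the training stage relies on.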

A. Denoising Autoencoder (DAE)
We begin with a description of the conventional single layer DAE framework, shown in Fig.1(a). Here, the algorithm learns a weight matrix, $W_1 \in \mathbb{R}^{l \times P}$, in order to represent the noisy training data $\tilde{X}_{tr}$ with a compressed $Z \in \mathbb{R}^{l \times Q}$ through
$$Z = \phi(W_1 \tilde{X}_{tr}). \tag{1}$$
The number of nodes, $l$, in the hidden layer is smaller than the number of pixels, $P$, in the image. In the decoder, $Z$ is mapped back to the reconstructed clean image, $\hat{X}_{tr}$, through
$$\hat{X}_{tr} = \phi(W_2 Z), \tag{2}$$
where $W_2 \in \mathbb{R}^{P \times l}$ is the weighting matrix. In both the encoder and decoder, the same activation function, $\phi$, is used, which could be either linear or non-linear. Some of the popular non-linear activation functions in the literature are the hyperbolic tangent and the sigmoid [49], [50]. The objective of the algorithm is to learn $W_1$ and $W_2$ from the training data such that the normalized mean square error between $X_{tr}$ and $\hat{X}_{tr}$ is minimized, as shown in
$$\min_{W_1, W_2} \left\| X_{tr} - \phi\left(W_2\, \phi(W_1 \tilde{X}_{tr})\right) \right\|_F^2. \tag{3}$$
Equation (3) is a complex optimization problem which is NP hard to solve. Therefore, we introduce a proxy variable, $Z$, as shown in
$$\min_{W_1, W_2, Z} \left\| X_{tr} - \phi(W_2 Z) \right\|_F^2 \quad \text{such that} \quad Z = \phi(W_1 \tilde{X}_{tr}). \tag{4}$$
Then we relax the equality constraint in the formulation using an augmented Lagrangian, $\lambda$, in
$$\min_{W_1, W_2, Z} \left\| X_{tr} - \phi(W_2 Z) \right\|_F^2 + \lambda \left\| Z - \phi(W_1 \tilde{X}_{tr}) \right\|_F^2. \tag{5}$$
The regularization parameter, $\lambda$, in the above expression trades off between the error in the encoder (second term) and the decoder (first term) stages. The above formulation has a closed form solution using an alternating direction method of multipliers (ADMM) [51]. In the following section, we will describe the implementation of the ADMM in greater detail. Once trained, the DAE is ready for test. During test, a denoised radar image $\hat{x}_{test}$ is recovered from the DAE with the test noisy image $\tilde{x}_{test}$ as input using (1) and (2). The denoised image should now resemble the ground reference clean image $x_{test}$.
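As a minimal sketch of the single layer DAE described above, the encoder, the decoder and the relaxed objective can be written in NumPy as follows. The function names, the hyperbolic tangent activation, the random weights and the toy dimensions are assumptions for illustration only; the objective follows the description of a decoder error term plus a $\lambda$-weighted encoder error term.

```python
import numpy as np

def encode(W1, X_noisy, phi=np.tanh):
    """Hidden representation Z = phi(W1 @ X_noisy)."""
    return phi(W1 @ X_noisy)

def decode(W2, Z, phi=np.tanh):
    """Reconstructed image phi(W2 @ Z)."""
    return phi(W2 @ Z)

def dae_objective(X_clean, X_noisy, W1, W2, Z, lam, phi=np.tanh):
    """Decoder error (first term) plus lam times encoder error (second term)."""
    decoder_err = np.linalg.norm(X_clean - decode(W2, Z, phi)) ** 2
    encoder_err = np.linalg.norm(Z - encode(W1, X_noisy, phi)) ** 2
    return decoder_err + lam * encoder_err

rng = np.random.default_rng(0)
P, Q, l = 16, 10, 4                       # hypothetical sizes with l < P
X_clean = rng.random((P, Q))
X_noisy = X_clean + 0.1 * rng.standard_normal((P, Q))
W1 = 0.1 * rng.standard_normal((l, P))    # untrained stand-in weights
W2 = 0.1 * rng.standard_normal((P, l))

Z = encode(W1, X_noisy)
X_hat = decode(W2, Z)
print(dae_objective(X_clean, X_noisy, W1, W2, Z, lam=1.0))
```

When the proxy variable satisfies the encoder relation exactly, as in this example, the encoder term vanishes and the objective reduces to the reconstruction error of the decoder, independent of the value of `lam`.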

B. Sparse Denoising Autoencoder (SparseDAE)
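A modified autoencoder framework can be derived by imposing additional sparsity constraints on the hidden layer representations ($Z$) while learning the weighting matrices in the encoder and decoder. The objective function in (5) is modified to
$$\min_{W_1, W_2, Z} \left\| X_{tr} - \phi(W_2 Z) \right\|_F^2 + \lambda \left\| Z - \phi(W_1 \tilde{X}_{tr}) \right\|_F^2 + \mu \left\| Z \right\|_1, \tag{6}$$
where an $\ell_1$ norm has been imposed on $Z$ through a second regularization parameter $\mu$. The objective function is solved through ADMM by separately solving for $W_1$, $W_2$ and $Z$ through iterations. First, $W_1$ is obtained from the closed form least squares solution for
$$W_1 \leftarrow \min_{W_1} \left\| Z - \phi(W_1 \tilde{X}_{tr}) \right\|_F^2. \tag{7}$$
Then $W_2$ is similarly solved using least squares in
$$W_2 \leftarrow \min_{W_2} \left\| X_{tr} - \phi(W_2 Z) \right\|_F^2. \tag{8}$$
Then both $W_1$ and $W_2$ are used to solve for $Z$ in
$$Z \leftarrow \min_{Z} \left\| X_{tr} - \phi(W_2 Z) \right\|_F^2 + \lambda \left\| Z - \phi(W_1 \tilde{X}_{tr}) \right\|_F^2 + \mu \left\| Z \right\|_1 \tag{9}$$
using the iterative soft thresholding algorithm (ISTA) [52]. We update the network weights $W_1$, $W_2$ and the proxy variable $Z$ iteratively until the algorithm converges. Once the network is trained, the weight matrices $W_1$ and $W_2$ are used to obtain a denoised form $\hat{x}_{test}$ of the corrupted test data $\tilde{x}_{test}$ using
$$\hat{x}_{test} = \phi\left(W_2\, \phi(W_1 \tilde{x}_{test})\right). \tag{10}$$
The hypothesis is that when the autoencoder is properly trained, the error after denoising (AD) between $\hat{x}_{test}$ and $x_{test}$ is lower than the error before denoising (BD) between $\tilde{x}_{test}$ and $x_{test}$, where $x_{test}$ is the corresponding ground truth clean image.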
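The alternating updates described above can be sketched in NumPy as follows. This is a simplified illustration rather than the authors' implementation: it assumes a linear activation so that both weight updates are plain least squares problems solved with a pseudo-inverse, and the values of `lam`, `mu`, the iteration counts, the hidden size and the synthetic low-rank data are all hypothetical.

```python
import numpy as np

def soft_threshold(A, tau):
    """Elementwise soft thresholding, the proximal operator of tau * ||A||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def objective(X, Xn, W1, W2, Z, lam, mu):
    return (np.linalg.norm(X - W2 @ Z) ** 2
            + lam * np.linalg.norm(Z - W1 @ Xn) ** 2
            + mu * np.abs(Z).sum())

def train_sparse_dae(X, Xn, n_hidden, lam=1.0, mu=0.01, n_outer=20, n_ista=10, seed=0):
    """Alternate least squares updates of W1, W2 with ISTA updates of Z (linear activation)."""
    rng = np.random.default_rng(seed)
    W1 = 0.1 * rng.standard_normal((n_hidden, Xn.shape[0]))
    Z = W1 @ Xn
    history = []
    for _ in range(n_outer):
        W1 = Z @ np.linalg.pinv(Xn)   # least squares fit of Z from the noisy data
        W2 = X @ np.linalg.pinv(Z)    # least squares fit of the clean data from Z
        # Lipschitz constant of the gradient of the smooth part, used as the ISTA step size.
        L = 2.0 * (np.linalg.norm(W2, 2) ** 2 + lam)
        for _ in range(n_ista):
            grad = 2.0 * W2.T @ (W2 @ Z - X) + 2.0 * lam * (Z - W1 @ Xn)
            Z = soft_threshold(Z - grad / L, mu / L)
        history.append(objective(X, Xn, W1, W2, Z, lam, mu))
    return W1, W2, history

rng = np.random.default_rng(1)
X = rng.standard_normal((32, 3)) @ rng.standard_normal((3, 100))  # synthetic clean data
Xn = X + 0.3 * rng.standard_normal(X.shape)                        # synthetic noisy data
W1, W2, history = train_sparse_dae(X, Xn, n_hidden=8)
X_hat = W2 @ (W1 @ Xn)                                             # denoised training data
print("error before denoising:", np.linalg.norm(Xn - X) ** 2)
print("error after denoising: ", np.linalg.norm(X_hat - X) ** 2)
```

Each of the three updates can only lower the objective or leave it unchanged, so the recorded `history` is non-increasing; with a non-linear activation the least squares steps would need to be adapted.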

C. Stacked Sparse Denoising Autoencoder (StackedSDAE)
In this framework, the single hidden layer representation within the autoencoder is converted to multiple stacked layers as shown in Fig.1b. There is a vast body of research that has demonstrated that additional deeper layers in a neural network enable the capture of higher order abstractions in the data, resulting in significant improvement in the performance of the algorithms. This is because the successive layers result in the reuse of key features within the images as well as the extraction of higher order features. The number of layers and the number of nodes within each layer are typically chosen heuristically. In our work, we implement the StackedSDAE using three hidden layers. Therefore, instead of learning just two weighting matrices as was the case for the shallow DAE, our objective here is to learn $W_{11}$, $W_{12}$, $W_{21}$ and $W_{22}$. Each succeeding deeper layer is characterized by fewer nodes. As a result, the computational time complexity increases during the training phase since there is a greater number of weighting matrices to learn. However, the complexity during the test phase reduces because of the reduced feature dimensions of the stacked layers.
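Again, we divide our denoising problem into the training and the test stages. During training, the denoising problem can be formulated as
$$\min_{W_{11}, W_{12}, W_{21}, W_{22}} \left\| X_{tr} - \phi\left(W_{22}\, \phi\left(W_{21}\, \phi\left(W_{12}\, \phi(W_{11} \tilde{X}_{tr})\right)\right)\right) \right\|_F^2. \tag{11}$$
Again the problem is NP hard since it is non-convex. Similar to SparseDAE, we use the variable separation technique by introducing proxy variables $Z_2$, $Z_1$ and $Z_0$ such that
$$Z_2 = \phi(W_{21} Z_1), \tag{12}$$
$$Z_1 = \phi(W_{12} Z_0), \tag{13}$$
$$Z_0 = \phi(W_{11} \tilde{X}_{tr}). \tag{14}$$
Upon relaxing these constraints with the augmented Lagrangian, and imposing sparsity on the hidden layer representations, the objective function now becomes
$$\min \left\| X_{tr} - \phi(W_{22} Z_2) \right\|_F^2 + \lambda \left( \left\| Z_2 - \phi(W_{21} Z_1) \right\|_F^2 + \left\| Z_1 - \phi(W_{12} Z_0) \right\|_F^2 + \left\| Z_0 - \phi(W_{11} \tilde{X}_{tr}) \right\|_F^2 \right) + \mu \left( \left\| Z_2 \right\|_1 + \left\| Z_1 \right\|_1 + \left\| Z_0 \right\|_1 \right). \tag{15}$$
Again, we use the ADMM technique for solving the above formulation. We separately solve for $W_{11}$, $W_{12}$, $W_{21}$ and $W_{22}$, using closed form expressions for least squares, as shown below
$$W_{11} \leftarrow \min_{W_{11}} \left\| Z_0 - \phi(W_{11} \tilde{X}_{tr}) \right\|_F^2, \tag{16}$$
$$W_{12} \leftarrow \min_{W_{12}} \left\| Z_1 - \phi(W_{12} Z_0) \right\|_F^2, \tag{17}$$
$$W_{21} \leftarrow \min_{W_{21}} \left\| Z_2 - \phi(W_{21} Z_1) \right\|_F^2, \tag{18}$$
$$W_{22} \leftarrow \min_{W_{22}} \left\| X_{tr} - \phi(W_{22} Z_2) \right\|_F^2. \tag{19}$$
Then using the ISTA algorithm and the weighting matrices, we solve for $Z_0$, $Z_1$ and $Z_2$ based on the following objective functions:
$$Z_0 \leftarrow \min_{Z_0} \lambda \left\| Z_0 - \phi(W_{11} \tilde{X}_{tr}) \right\|_F^2 + \lambda \left\| Z_1 - \phi(W_{12} Z_0) \right\|_F^2 + \mu \left\| Z_0 \right\|_1, \tag{20}$$
$$Z_1 \leftarrow \min_{Z_1} \lambda \left\| Z_1 - \phi(W_{12} Z_0) \right\|_F^2 + \lambda \left\| Z_2 - \phi(W_{21} Z_1) \right\|_F^2 + \mu \left\| Z_1 \right\|_1, \tag{21}$$
$$Z_2 \leftarrow \min_{Z_2} \left\| X_{tr} - \phi(W_{22} Z_2) \right\|_F^2 + \lambda \left\| Z_2 - \phi(W_{21} Z_1) \right\|_F^2 + \mu \left\| Z_2 \right\|_1. \tag{22}$$
Equations (16) to (22) are iterated till the algorithm converges.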
Once the network weights are trained, we use them to reconstruct $\hat{x}_{test}$ from the corrupted $\tilde{x}_{test}$ by
$$\hat{x}_{test} = \phi\left(W_{22}\, \phi\left(W_{21}\, \phi\left(W_{12}\, \phi(W_{11} \tilde{x}_{test})\right)\right)\right). \tag{23}$$
Note that the StackedSDAE algorithm is significantly faster than the DAE and SparseDAE in generating denoised images at test time, as it involves only simple product operations with reduced feature dimensions. This makes the algorithm suitable for real-time applications where training is usually done a priori.
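The test stage of the stacked network amounts to a cascade of matrix products and activations. The sketch below illustrates this under several assumptions: the order in which the four weighting matrices are applied, the hyperbolic tangent activation and the layer sizes are all hypothetical, and the random weights are stand-ins for trained ones, so the output is not a meaningful denoised image.

```python
import numpy as np

def stacked_denoise(x_noisy, weights, phi=np.tanh):
    """Pass a noisy image (or a matrix of images) through the stacked layers in order."""
    h = x_noisy
    for W in weights:
        h = phi(W @ h)
    return h

rng = np.random.default_rng(0)
P, l0, l1, l2 = 64, 32, 16, 8                       # hypothetical sizes, fewer nodes per deeper layer
shapes = [(l0, P), (l1, l0), (l2, l1), (P, l2)]     # assumed shapes of W11, W12, W21, W22
weights = [0.1 * rng.standard_normal(s) for s in shapes]

x_noisy = rng.random(P)
x_hat = stacked_denoise(x_noisy, weights)
print(x_hat.shape)
```

The same function accepts a P x Q matrix of test images, in which case every column is denoised in a single pass.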

D. Evaluation Metrics
The objective of the DAE and its variants is to reconstruct radar images that resemble those that would be obtained if the target were to move in free space conditions. Therefore, one obvious metric is the normalized mean square error (NMSE) between the reconstructed image ($\hat{x}_{test}$), which is obtained after denoising, and the ground truth free space image ($x_{test}$). However, in the image processing community, other metrics are preferred to NMSE since NMSE does not compare the salient features between different images. In this paper, we use the structural similarity index (SSIM), which is a popular metric that assesses luminance, contrast and structural differences between two images [53]. Its value ranges from 0 to 1, where 1 is obtained when the images are identical. We calculate the SSIM between the cluttered image $\tilde{x}_{test}$ and $x_{test}$ before denoising (BD). Then the SSIM is calculated between $\hat{x}_{test}$ and $x_{test}$ after denoising (AD). The hypothesis here is that the SSIM will approach unity after denoising.
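To make the metric concrete, the sketch below computes a simplified SSIM from the mean, variance and covariance of two whole images. This is an assumption made for brevity: the commonly used SSIM evaluates the same expression over local windows and averages the result, so the values produced here will generally differ from those of a windowed implementation. The constants `k1` and `k2` are the customary defaults, and the toy images are hypothetical.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM computed over the whole image (simplified, not locally windowed)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

rng = np.random.default_rng(0)
clean = rng.random((32, 32))                                  # stand-in for a free space image
cluttered = np.clip(clean + 0.3 * rng.standard_normal(clean.shape), 0.0, 1.0)
print("SSIM of image with itself:", global_ssim(clean, clean))
print("SSIM before denoising:    ", global_ssim(cluttered, clean))
```

An image compared with itself gives a value of one, while any distortion of the mean, contrast or structure lowers the score.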

III. SIMULATION METHODOLOGY
We test the denoising algorithms on indoor radar signatures obtained in NLOS scenarios where there is both noise and clutter. We specifically consider the ACR scenario for the simulation study, as described below.

A. Simulation Models
The ACR simulation set up is shown in Fig.2. The electromagnetic wave propagation from the radar is modeled using two-dimensional finite difference time domain (FDTD) simulations in the XZ space. The simulation space consists of two corridors of 2m width arranged in a T-shape as shown in the figure. The walls are assumed to be 20cm thick. The simulation space is discretized to form uniform grid cells that are a tenth of a wavelength ($\lambda_c$) of the carrier frequency ($f_c$). All the regions outside of the walls are assumed to be of free space. Stochasticity is introduced in the electrical characteristics of the walls. Each grid cell within the wall has a dielectric constant that is drawn from a normal distribution, $\mathcal{N}(\epsilon_r, \epsilon_{r,std})$, of mean $\epsilon_r$ and standard deviation $\epsilon_{r,std}$. Similarly, the conductivity of each grid cell is drawn from the normal distribution $\mathcal{N}(\sigma, \sigma_{std})$. Therefore, the walls are not truly homogeneous since each grid cell has slightly different electrical characteristics to model real world conditions. The simulation space is bounded by a perfectly matched layer that is $2\lambda_c$ thick. The source excitation which models the monostatic radar is located at (0.5, 0)m. Two types of source excitation are considered. The first is a narrowband source modelled as a sinusoidal source of frequency $f = f_c$. The second is a broadband source excitation ($f = f_c \pm \beta/2$) which is modelled as a Gaussian signal modulated by the sinusoidal carrier signal at $f_c$. The width of the Gaussian signal determines the bandwidth ($\beta$) of the source excitation. The time-domain simulations are allowed to run long enough to ensure that steady state conditions are reached, and the mean and standard deviation of the time-domain electric field at every point in the simulation space are saved [54]. Based on the normal distribution, multiple realizations ($\eta = 1 : M$) of the time-domain electric field at each two-dimensional grid position, $\vec{\rho}$, are generated. Each of these $M$ electric field realizations is then transformed using the Fourier transform to the frequency domain, and the complex responses, $H(\vec{\rho}, f, \eta)$, at $f = f_c \pm \beta/2$, are saved.
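The discretization and the stochastic wall parameters can be illustrated with a small NumPy sketch. It does not run an FDTD simulation; it only computes the grid cell size and draws per-cell electrical parameters. The mean values are those reported later for the low conductivity wall, the 30% standard deviation is interpreted here as a fraction of the mean, and the wall segment length, the function name and the approximate speed of light are assumptions.

```python
import numpy as np

C = 3e8  # approximate speed of light in m/s

def wall_grid(fc, thickness=0.20, length=2.0, eps_mean=4.0,
              sigma_mean=0.05, rel_std=0.30, seed=0):
    """Grid cell size (a tenth of a wavelength) and random per-cell wall parameters."""
    cell = (C / fc) / 10.0
    nx = int(round(thickness / cell))   # cells across the wall thickness
    nz = int(round(length / cell))      # cells along a hypothetical wall segment
    rng = np.random.default_rng(seed)
    eps_r = rng.normal(eps_mean, rel_std * eps_mean, size=(nx, nz))
    sigma = rng.normal(sigma_mean, rel_std * sigma_mean, size=(nx, nz))
    return cell, eps_r, sigma

cell, eps_r, sigma = wall_grid(fc=2.4e9)
print(cell, eps_r.shape, eps_r.mean(), sigma.mean())
```

At 2.4GHz the wavelength is 12.5cm, so the cells are 1.25cm wide and the 20cm wall spans 16 cells. A normal distribution can produce unphysical draws for a wide spread, and how such draws are handled is not stated in the text.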
Next, we consider the human moving along the tangential trajectory before the radar, as shown in the figure, over a duration of $T$ seconds. The human is a three dimensional figure with the height along the Y axis. The skeleton framework of the human and the animation motion of the body parts are described using motion capture data from Sony [55]. Then the electromagnetic radar scattering off the human is modeled using the techniques described in [56]. We briefly describe the technique here. The human is considered to be a collection of $B$ discrete point scatterers corresponding to different body parts, each of reflectivity $a_b$. Each of the body parts is modelled as an ellipsoid whose radar cross-section ($a_b^2$) is obtained with analytical expressions. The time-domain radar returns from the human, corresponding to each $\eta$ stochastic FDTD realization at frequency $f$, are obtained by
$$s_{rx}(t, f, \eta) = A \sum_{b=1}^{B} a_b\, H^2\left(\vec{\rho}_b(t), f, \eta\right) e^{-j \frac{4 \pi f}{c} \left(r_b(t) - \rho_b(t)\right)}. \tag{24}$$
Here $r_b(t)$ and $\rho_b(t)$ are the time-varying three and two-dimensional Euclidean distances of the $b$-th point scatterer from the radar, respectively. The two way propagation physics from the radar to the point scatterer is captured by the square of the wall response $H$. Since the FDTD is a two-dimensional simulation with an infinite line source excitation, the exponential phase term in (24) corrects the circular phase front from the two-dimensional FDTD propagation physics to the spherical phase front in the three-dimensional scenario. $A$ in the above expression calibrates the amplitude of the FDTD source excitation to the desired radar equivalent isotropic radiated power.
The time-domain radar data, $s_{rx}(t, f, \eta)$, could be narrowband or wideband. In the case of narrowband data (where $\beta = 0$), the short-time Fourier transform is applied on the data to obtain $M$ Doppler-time spectrograms, $x_{DT}$.

B. Simulated Radar Signatures
Narrowband Doppler-time Signatures: The above process is carried out independently for three radar carrier frequencies, $f_c$ = 2.4GHz, 5GHz and 10GHz, with the radar bandwidth set at 0Hz. For each of the above cases, we consider three different types of walls: a wall with low mean conductivity (σ = 0.05S/m), medium conductivity (σ = 100S/m) and high conductivity (σ = 1e5S/m). The mean dielectric constant is fixed at $\epsilon_r = 4$. The standard deviations for both the dielectric constant and the conductivity are fixed at 30%. Twenty stochastic realizations of the time-domain electric field are generated for each of the three cases using the stochastic FDTD solver. These realizations are combined with the human walking motion to generate ACR Doppler-time signatures. A short time window of 0.1s is used to generate the spectrograms. The total duration of the human motion is 6s. This interval is separated into 8 consecutive intervals of 0.75s duration. The sampling frequency of the time domain data is 500Hz, resulting in Doppler frequency axes spanning $f_D$ = −250Hz to +250Hz in all the images. Then complex additive Gaussian noise, $\mathcal{N}(0, N_p)$, of noise power $N_p$ is added to each pixel of the images to realize 200 images. Therefore, we have a total of 1600 images for each of the three wall cases.
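The spectrogram generation with the stated sampling frequency and window duration can be sketched in NumPy as follows. The Hann window, the hop size, the FFT length and the synthetic 60Hz test tone are assumptions for illustration; the text specifies only the 0.1s window, the 500Hz sampling frequency and the 0.75s interval duration.

```python
import numpy as np

def doppler_spectrogram(s, fs=500.0, win_dur=0.1, hop=5, n_fft=128):
    """Short-time Fourier transform of a complex time series, Doppler axis centred at 0Hz."""
    n_win = int(round(win_dur * fs))              # 50 samples for a 0.1s window at 500Hz
    window = np.hanning(n_win)                    # window type is an assumption
    starts = range(0, len(s) - n_win + 1, hop)
    frames = np.stack([s[i:i + n_win] * window for i in starts], axis=1)
    spec = np.fft.fftshift(np.fft.fft(frames, n=n_fft, axis=0), axes=0)
    f_d = np.fft.fftshift(np.fft.fftfreq(n_fft, d=1.0 / fs))
    return f_d, spec

fs = 500.0
t = np.arange(int(0.75 * fs)) / fs                # one 0.75s interval
s = np.exp(2j * np.pi * 60.0 * t)                 # synthetic scatterer with +60Hz Doppler
f_d, spec = doppler_spectrogram(s, fs=fs)
peak = f_d[np.argmax(np.abs(spec[:, 0]))]
print(spec.shape, f_d[0], peak)
```

The Doppler axis starts at −250Hz because the sampling frequency is 500Hz, and the spectral peak of the test tone falls in the bin nearest to 60Hz.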
We show the wall propagation effects on the images at 2.4GHz in Fig.3. The top row of the figure shows the results generated in free space conditions in the absence of clutter and noise. These are generated when the human walks in the presence of the radar but without any walls. The resulting micro-Doppler spectrograms across 8 time intervals from 0 to 6 seconds are shown. We observe that the Dopplers are low since the human motion is tangential with respect to the radar. As the human approaches the radar, from 0s to approximately 3s, the Dopplers are positive. Then, when the human moves away, the Dopplers become negative. We observe weak micro-Doppler returns from the other body parts. The second row shows the returns when the human is walking in the presence of low conductive walls. In this scenario, the walls allow the signal to penetrate with some attenuation. Hence, the strength of the signals is weaker. The ringing of the radar signal within the wall also introduces a lot of multipath. From this figure, it becomes difficult to know whether there are one or more targets moving and whether they are moving towards or away from the radar. The third and fourth rows show the micro-Dopplers from a medium and a high conductive wall. Due to the lossy nature of the walls, the through-wall propagation is blocked. Instead, the dominant phenomenon here is the reflections off the lateral walls, which give rise to high strengths in the radar scattered signal. There is less micro-Doppler spread in these cases. However, we do observe some negative Dopplers due to multipath, even when the target is approaching the radar (the first few columns).
Next, we present the results for the 5GHz carrier in Fig.4. The figures show a slightly lower strength compared to the results from 2.4GHz due to the antenna gain offset between the two frequencies. The higher carrier frequency results in finer Doppler resolution. As a result, we are able to discern distinct micro-Doppler tracks from the different body parts in the free space scenario, shown in the top row. The low conductive walls give rise to significant through-wall propagation, resulting in micro-Doppler spread and radar signal attenuation. This results in the low clarity spectrograms in the second row. Again, the third and fourth rows show that the through-wall propagation has been blocked. However, multipath reflections off the lateral walls give rise to negative Dopplers even when the target Doppler is positive with respect to the radar.
In the Doppler-time spectrograms corresponding to 10GHz, in Fig.5, we observe well resolved micro-Doppler tracks from the different body parts in the free space scenario. Due to the low sampling frequency we also observe some aliasing at negative frequencies even for the free space scenario (first few figures along top row). The wall distortions are again considerable due to both through-wall propagation effects (for low conductive walls) and due to multipath off lateral walls (for the high conductive walls).
Broadband Range-time Signatures: We simulated broadband radar data of 2GHz bandwidth about the three carrier frequencies using the FDTD solver. This results in a range resolution of 0.075m, and the maximum unambiguous range, based on the frequency step size, is 10m. Complex Gaussian noise was added to the pixels of the resulting HRRP images and a total of 1600 images were generated for each wall type. We first discuss the HRRP obtained at 2.4GHz in Fig.6. Again, the total duration of the target motion is 6s, divided into 8 intervals of 0.75s each. The top row shows the target motion in free space conditions (in the absence of walls). We observe the range of the target changing only slightly as the human is moving tangentially across the radar's field-of-view. We are able to observe fine micro-range tracks arising from the motion of the limbs. However, the images significantly deteriorate in the presence of the walls due to multipath. In the case of the low conductive wall (second row), the ringing of the signal through the wall gives rise to multipath but also attenuates the radar signal. For high conductive walls, the multipath arises due to reflections off the lateral walls, which cause the radar received signal strength to increase. We also observe considerable aliasing in these scenarios.
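The quoted range resolution and unambiguous range follow from the standard relations $\Delta R = c / (2\beta)$ and $R_{max} = c / (2 \Delta f)$. The sketch below checks this arithmetic and forms a range profile for a single ideal point target by an inverse Fourier transform across frequency. The 15MHz frequency step is inferred here from the 10m unambiguous range and is not stated in the text; the point target at 3m, the number of frequency samples and the approximate speed of light are likewise assumptions.

```python
import numpy as np

C = 3e8  # approximate speed of light in m/s

def range_resolution(bandwidth):
    return C / (2.0 * bandwidth)

def max_unambiguous_range(freq_step):
    return C / (2.0 * freq_step)

def hrrp(freq_response):
    """Magnitude range profile from a stepped-frequency response via inverse FFT."""
    return np.abs(np.fft.ifft(freq_response))

bandwidth, freq_step, fc = 2e9, 15e6, 2.4e9
n_freq = int(round(bandwidth / freq_step)) + 1            # number of frequency samples
freqs = fc - bandwidth / 2 + freq_step * np.arange(n_freq)
target_range = 3.0                                        # hypothetical point target in metres
response = np.exp(-1j * 4 * np.pi * freqs * target_range / C)

profile = hrrp(response)
range_axis = np.arange(n_freq) * C / (2 * n_freq * freq_step)
estimated = range_axis[np.argmax(profile)]
print(range_resolution(bandwidth), max_unambiguous_range(freq_step), estimated)
```

The peak of the profile falls in the range bin nearest to the target, within one resolution cell of the true range.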
Similar phenomena are observed in the HRRP for 5GHz and 10GHz, which are shown in Fig.7 and Fig.8. Again, the top row in both these figures shows the HRRP when the human is walking in free space conditions where there are no walls and no noise. The second, third and fourth rows show the results when the walls are of low, medium and high conductivity, respectively. The HRRP are very similar across the three carrier frequencies. This is mainly because the HRRP features are a function of the range resolution and the bandwidth of the radar, which are identical across the three cases. The results from 10GHz show the greatest distortions and clutter.

IV. SIMULATION RESULTS AND ANALYSIS
In this section, we compare the similarity of the denoised / reconstructed images obtained from DAE, SparseDAE and StackedSDAE with respect to the clean ground truth images through the SSIM metric. The performances are evaluated for both types of radar signatures, the Doppler-time spectrograms and the HRRP, that were discussed above. We consider two parameters for comparison: the labelling mismatch error and the SNR, which is the ratio of the minimum signal receivable by the radar to the mean noise floor. In real world conditions, it may be impossible to exactly replicate a target motion in free space and ACR conditions during training. Therefore, the training data may have significant mismatch between the clean $X_{tr}$ and the noisy and cluttered $\tilde{X}_{tr}$. We model this training error by shuffling the row entries of each column of $X_{tr}$ such that they no longer exactly correspond to the entries in $\tilde{X}_{tr}$. We used 70% of the total images as the training data set, and the remaining 30% as the test data set. We present the convergence of the objective function (shown in (5)) with the number of iterations in Fig. 9. We then change the labelling mismatch error percentage from 0 to 60% by changing the degree of shuffling. Next, we change the SNR of the images from −15dB to +20dB by changing the Gaussian noise power $N_p$ that is added to each pixel of the image. Fig.10 shows the results of the three algorithms for different labelling mismatch errors across the three carrier frequencies at an SNR of −10dB. In each case, the SSIM after denoising (AD) for all three algorithms, DAE, SparseDAE and StackedSDAE, is significantly improved when compared to the noisy and cluttered images before denoising (BD). As the label mismatch error increases, the SSIM for all three algorithms falls. However, the fall is much lower for the StackedSDAE. Even with 50% labelling mismatch error, the SSIM of the reconstructed images is at 0.8 at 2.4GHz (Fig.10a) and 5GHz (Fig.10b). The 10GHz scenario is a far more challenging case, where we observed a lot of multipath from lateral walls and weaker signals due to the antenna gain offset. Even here, the SSIM of the StackedSDAE is better than that of the SparseDAE and DAE at high labelling errors.
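The two experimental parameters can be emulated on a toy data matrix as sketched below. Both helpers are simplified assumptions rather than the procedure used in the experiments: the label mismatch is modelled by cyclically reassigning a chosen fraction of the clean columns, and the SNR is defined with respect to the mean signal power instead of the minimum receivable signal. The function names and the toy matrix are hypothetical.

```python
import numpy as np

def mismatch_labels(X_clean, fraction, rng):
    """Reassign a fraction of the clean columns so they no longer pair with their noisy columns."""
    Q = X_clean.shape[1]
    n_swap = int(round(fraction * Q))
    idx = rng.choice(Q, size=n_swap, replace=False)
    shuffled = X_clean.copy()
    if n_swap > 1:
        shuffled[:, idx] = X_clean[:, np.roll(idx, 1)]  # cyclic shift, so no column keeps its place
    return shuffled

def add_noise(X, snr_db, rng):
    """Add complex Gaussian noise whose power is set by the mean signal power and the SNR."""
    noise_power = np.mean(np.abs(X) ** 2) / (10.0 ** (snr_db / 10.0))
    noise = np.sqrt(noise_power / 2.0) * (rng.standard_normal(X.shape)
                                          + 1j * rng.standard_normal(X.shape))
    return X + noise

rng = np.random.default_rng(0)
X = rng.random((50, 40))                               # toy clean matrix, Q = 40 images
X_mismatched = mismatch_labels(X, 0.5, rng)            # 50% label mismatch
X_noisy = add_noise(X, -10.0, rng)                     # SNR of -10dB

perm = rng.permutation(X.shape[1])                     # 70% / 30% train and test split
n_train = int(round(0.7 * X.shape[1]))
train_idx, test_idx = perm[:n_train], perm[n_train:]

n_changed = int(np.any(X_mismatched != X, axis=0).sum())
measured_snr = 10 * np.log10(np.mean(np.abs(X) ** 2) / np.mean(np.abs(X_noisy - X) ** 2))
print(n_changed, measured_snr, len(train_idx), len(test_idx))
```

Sweeping `fraction` from 0 to 0.6 and `snr_db` from −15 to +20 would reproduce the ranges of the two parameters considered in this section.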

A. Time-Frequency Spectrograms
Next, we consider the effect of SNR on the three algorithms in Fig.11. Again, there is significant improvement in the SSIM after denoising for all three algorithms across all three frequencies. We observe that both the SparseDAE and the StackedSDAE are more robust to noise than DAE at low SNRs (from −15dB to 5dB) at 2.4GHz (Fig.11a) and 5GHz (Fig.11b). These results indicate that the use of sparsity constraints in the hidden layer representations results in increased robustness of the algorithm to Gaussian noise. Additional stacked layers are also beneficial at slightly higher SNR values. However, at very low SNR values (< −10dB), in Fig.11c corresponding to the 10GHz radar frequency, additional stacking layers may be required to retain the robustness of the algorithm. The performance here is poorer because of the weaker radar signals at 10GHz caused by the gain offset.
Fig. 5. Doppler-time spectrograms of human walking in ACR scenario with monostatic narrowband radar operating at 10GHz. Again, the signals are weaker here due to the antenna gain offset. The y-axis across all figures shows Doppler frequencies spanning from -250Hz to +250Hz. The x-axis across each figure is of 0.75s duration with a total time duration of 6s across the 8 columns. The dynamic range of each figure is from -20 to -70dB. The first, second, third and fourth rows show the radar signature of human walking in free space, low conductive, medium conductive and high conductive wall conditions, respectively. SNR for all figures along bottom three rows is fixed at −20dB.

B. High Resolution Range Profiles
Next, we study the performance of the three algorithms on the HRRPs. First, we consider the label mismatch error in Fig. 12 for the three carrier frequencies when the SNR is −10dB. Again, we observe that all three algorithms increase the SSIM after denoising compared to the SSIM of the cluttered image before denoising. As the labelling mismatch error increases, the SSIM falls for all three algorithms across the three frequencies. However, the fall in SSIM for the StackedSDAE is far smaller than for the DAE and SparseDAE, and the SSIM remains at 0.75 even when the label mismatch is 50% for 2.4GHz (Fig. 12a) and 5GHz (Fig. 12b). The denoising performance of all three algorithms is, however, significantly poorer at 10GHz, possibly because of the greater multipath and weaker signal strength. Similar trends are observed when we compare the performance of the three algorithms for varying SNR in Fig. 13. Both the SparseDAE and StackedSDAE are more robust than the DAE at low SNR values for 2.4GHz (Fig. 13a) and 5GHz (Fig. 13b). However, the StackedSDAE deteriorates significantly at extremely low SNR (below −10dB) for 10GHz (Fig. 13c) due to the weaker signal strength, indicating that we may need greater depth in the hidden layers of the autoencoder.

A. Measurement Data Collection
In the previous sections, we demonstrated the effectiveness of the SparseDAE and StackedSDAE in denoising spectrograms and HRRPs generated in ACR scenarios. However, these algorithms are, in principle, suited for denoising any type of radar signature. To support this claim, we evaluate these algorithms on a third type of radar signature: the range-enhanced frontal images. These images are generated by processing wideband measurement data from 3.3 to 10.3GHz captured using an imaging sensor called Walabot Pro [48]. This is an uncalibrated sensor that consists of a 4 × 4 antenna array with a maximum detectable range of about 4m in line-of-sight conditions and a field-of-view of approximately 90° across azimuth and elevation. The radar data cube is processed through a three-dimensional Fourier transform. Then the peaks across the range domain are superposed to obtain range-enhanced frontal images of targets. The experimental setup for our measurement data collection is shown in Fig. 14. The DAE and its variants are trained with clean images of a slow moving human subject at 2m gathered in line-of-sight environments and the corresponding cluttered images. The subject carries two boxes covered with aluminum tape to enhance the reflectivity from the hands. Fig. 15(a) shows the frontal radar image of the subject in line-of-sight conditions where the torso, legs and arms of the human are noticeable. The experiments are performed on 5 subjects of different heights and girths. For each of these subjects, we captured 90 measurements at different orientations with respect to the sensor.
Due to the low transmitted power and limited dynamic range, the Walabot cannot be used in typical Indian through-wall scenarios (20cm brick walls). Instead, we synthetically corrupted the radar images with three types of distortions: additive Gaussian noise (with SNR from 0 to 30dB), clutter, and labelling errors. We modeled the clutter as a collection of discrete point scatterers randomly distributed across the radar's field-of-view, whose magnitudes were varied to obtain signal to clutter ratios (SCR) spanning from 0 to 30dB. The phase of each scatterer followed a uniform distribution across 360°. Using a Binomial distribution and a probability of false alarm of 0.06, we obtain approximately 5 false alarms for each image. We generated the cluttered images as the complex sum of the measurement and clutter signals. An example of the distortions introduced to a frontal image by clutter signals is shown in Fig. 15(b), where we observe ghost targets.
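The clutter model described above can be illustrated with a short Python sketch. It is not the authors' implementation: the function names are hypothetical, the number of Bernoulli trials `n_cells` is not stated in the text and is left as a parameter, and the SCR is assumed to be the ratio of mean signal power to mean clutter power over the image.

```python
import cmath
import math
import random


def make_clutter(shape, n_cells, p_fa, rng):
    """Complex clutter image made of unit-magnitude point scatterers.

    n_cells (number of Bernoulli trials) is a hypothetical parameter.
    """
    rows, cols = shape
    clutter = [[0j] * cols for _ in range(rows)]
    # Binomial(n_cells, p_fa) draw for the number of false alarms.
    n_scatterers = sum(rng.random() < p_fa for _ in range(n_cells))
    for _ in range(n_scatterers):
        r, c = rng.randrange(rows), rng.randrange(cols)
        phase = rng.uniform(0.0, 2.0 * math.pi)  # uniform over 360 degrees
        clutter[r][c] += cmath.rect(1.0, phase)
    return clutter


def mean_power(image):
    values = [abs(v) ** 2 for row in image for v in row]
    return sum(values) / len(values)


def add_clutter(image, clutter, scr_db):
    """Scale the clutter to the requested SCR (assumed mean-power ratio) and add it."""
    clutter_power = mean_power(clutter)
    if clutter_power == 0.0:  # no scatterers were drawn
        return [list(row) for row in image]
    target_power = mean_power(image) / (10.0 ** (scr_db / 10.0))
    gain = math.sqrt(target_power / clutter_power)
    # Complex sum of the measurement and the scaled clutter.
    return [[s + gain * c for s, c in zip(s_row, c_row)]
            for s_row, c_row in zip(image, clutter)]
```

Because the sum is formed on complex values, the scatterer phases determine whether a ghost target adds to or partially cancels the measured response at a pixel.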
We introduced labelling mismatch errors by shuffling a proportion of the labels of the clean images so that they did not correspond correctly to the corrupted images. Each of the 31 × 31 pixel images was vectorized. The vectorized images were then stacked column-wise to obtain a [961 × 450] matrix. The data was then split into a training set (80%) and a test set (20%).
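The data preparation in this paragraph can be sketched in Python as follows. The function names are hypothetical and the images are held as a list of vectors (one per image) rather than as the columns of a matrix. Mismatch is injected by permuting the clean targets of a randomly chosen subset of pairs, which is one plausible reading of the shuffling described above.

```python
import random


def vectorize(image):
    """Flatten a 2-D image (list of rows) into a single vector."""
    return [p for row in image for p in row]


def mismatch_labels(clean, fraction, rng):
    """Permute the clean targets of a `fraction` of the training pairs.

    A random permutation may leave a few targets in place, so `fraction`
    is an upper bound on the realized mismatch.
    """
    n = len(clean)
    k = int(round(fraction * n))
    chosen = rng.sample(range(n), k)
    permuted = chosen[:]
    rng.shuffle(permuted)
    out = list(clean)
    for dst, src in zip(chosen, permuted):
        out[dst] = clean[src]
    return out


def split(items, train_fraction, rng):
    """Random split into training and test subsets."""
    idx = list(range(len(items)))
    rng.shuffle(idx)
    cut = int(round(train_fraction * len(items)))
    return [items[i] for i in idx[:cut]], [items[i] for i in idx[cut:]]
```

With 450 images and `train_fraction=0.8`, `split` returns 360 training items and 90 test items.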

B. Results and Analyses
The training and testing of the algorithms are carried out in MATLAB 2018b on an Intel(R) Core(TM) i7-5500 processor running at 2.40 GHz with 16 GB of RAM. We present the results of the conventional DAE and its two variants, SparseDAE and StackedSDAE, in Fig. 16. In all the results, we present the SSIM between the corrupted image and the clean ground truth reference image before denoising (BD) and the SSIM between the reconstructed image and the clean image after denoising (AD). We observe a significant improvement in the SSIM after denoising for all three algorithms in all of the cases.
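The BD and AD comparison can be illustrated with a simplified SSIM computation. The sketch below is a single-window (global) form of the standard SSIM expression with the commonly used constants K1 = 0.01 and K2 = 0.03. It is an assumption made for brevity and not necessarily the windowed implementation used to produce the reported figures. The function name `global_ssim` is hypothetical.

```python
def global_ssim(x, y, dynamic_range=1.0):
    """Single-window SSIM between two equally sized, vectorized images.

    Simplification: statistics are computed over the whole image instead of
    over local sliding windows.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2.0 * mean_x * mean_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Under this sketch, the BD value would be `global_ssim(clean, corrupted)` and the AD value `global_ssim(clean, reconstructed)`. Identical images give a value of 1.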
First, we study the effect of SNR on the denoising performance in Fig. 16(a). We observe that as the SNR decreases, the SSIM degrades significantly for the conventional DAE. However, both the SparseDAE and StackedSDAE are robust to noise since their performance does not deteriorate significantly as the SNR falls. The DAE is, however, less sensitive to low SCR values, as observed in Fig. 16(b). The SparseDAE deteriorates slightly [...] Fig. 16(c). However, the SparseDAE and StackedSDAE are less sensitive than the conventional DAE. Finally, we examine the sensitivity of the algorithms' performance to the number of nodes in the hidden layer in Fig. 16(d). We observe that the performance of all three algorithms converges as the number of nodes increases. The StackedSDAE, however, converges with the fewest nodes. The algorithms require a minimum number of nodes in the hidden layers to achieve maximum improvement. The number of nodes is an important parameter that determines the computational complexity (both time and memory) of the algorithm.
We compare the computational time of the three algorithms during training and test in Table I. While the training times of the DAE and SparseDAE are comparable, the StackedSDAE takes more than twice as long to train. This is because the stacked hidden layers require additional weight matrices to be trained. However, due to the feature size reduction in these matrices, the computation time of the StackedSDAE during test is considerably lower. As a result, the StackedSDAE may be more suitable for real time operations.

VI. BENCHMARKING WITH OTHER ALGORITHMS
We compare the performance of our algorithm, qualitatively and quantitatively, with subspace filtering based on singular value decomposition (SVD) and with wavelet filtering.
Subspace Filtering: We have used the subspace filtering methods presented in [57] to denoise the simulated time-domain data used to generate the ACR spectrograms. Using SVD, we identify the singular values of the data. The singular vectors associated with the largest few singular values span the signal subspace, while the remaining vectors constitute the noise subspace. We then reconstruct the denoised spectrograms after removing the contributions of the smaller singular values. Since the multipath clutter in our scenario is directly dependent on the target, it does not occupy a subspace orthogonal to the target subspace. The results presented in Fig. 17 clearly indicate that the SVD based approach removes the noise but not the multipath distortions introduced in the ACR scenario, which are observed at negative Dopplers.
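The rank truncation described above can be written compactly. The symbols are assumptions introduced here for illustration and are not taken from [57]: D is the matrix of time-domain radar samples, r is its rank, σ_i are its singular values in descending order with left and right singular vectors u_i and v_i, and k is the number of components assigned to the signal subspace.

```latex
\mathbf{D} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{H}
           = \sum_{i=1}^{r} \sigma_i\, \mathbf{u}_i \mathbf{v}_i^{H},
\qquad
\hat{\mathbf{D}} = \sum_{i=1}^{k} \sigma_i\, \mathbf{u}_i \mathbf{v}_i^{H},
\quad k < r .
```

Because the truncation separates components only by energy along mutually orthogonal directions, target-dependent multipath that overlaps the target subspace is retained in the reconstruction, which is consistent with the residual clutter in Fig. 17.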
Wavelet Filtering: Next, we use wavelet based techniques for denoising the images. We apply the discrete wavelet transform to the raw time-domain data. Again, we assume that the signal returns occupy the largest wavelet coefficients while the remaining coefficients correspond to the noise and distortions. Therefore, we set the remaining coefficients to zero and then apply the inverse discrete wavelet transform to reconstruct the denoised images. The resulting images in Fig. 17 again show that the algorithm is successful in removing the independent noise but not the clutter.
In Table II, we quantitatively compare the performance of the three algorithms. We observe that before denoising, the average SSIM between the noisy ACR signatures and the free space signatures is 0.05 and the normalized mean square error (NMSE) is 0.22. After denoising, the SSIM for SVD and wavelet filtering improves slightly because the noise is removed. However, the images still look different from the free space images due to the presence of the clutter. On the other hand, the proposed methods using autoencoders result in a high SSIM (above 0.9) and a low NMSE (0.01) since they succeed in removing both noise and clutter based distortions.
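The coefficient-zeroing procedure and the NMSE metric can be sketched in Python as follows. The text does not state the wavelet family, the number of retained coefficients, or the exact NMSE normalization. The sketch assumes an orthonormal Haar wavelet, a hypothetical parameter `k` for the number of retained coefficients, and an NMSE normalized by the energy of the reference signal.

```python
import math


def haar_forward(signal):
    """Full multi-level orthonormal Haar DWT; len(signal) must be a power of two."""
    coeffs = list(signal)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        coeffs[:n] = approx + detail  # approximations first, then details
        n = half
    return coeffs


def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)]
        out[:2 * n] = merged
        n *= 2
    return out


def keep_top_k(coeffs, k):
    """Zero every coefficient except the k largest in magnitude."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


def nmse(reference, estimate):
    """Squared error normalized by the reference energy (assumed definition)."""
    num = sum(abs(r - e) ** 2 for r, e in zip(reference, estimate))
    den = sum(abs(r) ** 2 for r in reference)
    return num / den
```

A denoised signal is then obtained as `haar_inverse(keep_top_k(haar_forward(x), k))`. As with the SVD approach, the selection is made purely on coefficient magnitude, so strong target-dependent clutter would survive the thresholding.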

VII. CONCLUSION
Indoor radar signatures of dynamic human motions are corrupted by target dependent and independent static and dynamic clutter introduced by the presence of walls and other reflecting surfaces. We have used a variant of the DAE, called the StackedSDAE, that incorporates both sparsity and depth in the hidden layer representations of the noisy images for clutter mitigation. The encoder and decoder stages of the algorithm are trained with labelled clean and noisy data. No additional information about the wall geometry or characteristics is required for the algorithm. Due to the additional stacked layers within the hidden layers, the training time for this algorithm is greater than that of the conventional DAE. However, each additional stacked layer has fewer nodes than the previous layer. This results in a smaller feature size for the final representation and a shorter test time. The resulting denoised images are structurally similar to the radar images of the target that would have been obtained in free space scenarios. The StackedSDAE is more robust than the conventional DAE to labelling mismatch error between clean and noisy images during training. The algorithm is also more robust to low SNR (−10 to +5dB) than the conventional DAE. For example, in the case of the simulated ACR signatures, the denoised images have an SSIM above 0.75 even when the SNR is −10dB and the label mismatch error is 50%. At extremely low SNR (below −10dB), we observe some deterioration in the performance, possibly indicating that a greater number of hidden layers is required at these SNRs. The complete database of radar signatures and algorithms is shared with the research community at the following URL: https://rb.gy/mmhzf6.