An efficient way to use MS-CLEAN associated with Shannon's entropy

This article covers deconvolution methods in the context of radio astronomical images. A new formulation is proposed to deal with negative brightness, deconvolving the positive and negative brightness of the sky separately. Positive brightness is physically possible, but negative brightness is a degradation product. At the same time, the paper presents the behaviour of Shannon's entropy in the context of the Multi-Scale CLEAN (MS-CLEAN) algorithm, defining the measured brightness as information in the scope of Shannon's entropy. The knowledge acquired is used in an example of information monitoring at scales, which automatically reduces the search space of MS-CLEAN and reduces the computational cost. The proposed algorithm, called Relevant Component Multi-Scale CLEAN (RC-CLEAN), can be up to 4 times faster than the classic MS-CLEAN without prejudice to the identification of structures and noise reduction. Here, the Structural Similarity Index (SSIM) and the Peak Signal to Noise Ratio (PSNR), used to quantify the results, showed respectively the same quality for the SSIM and gains of up to 11 dB for the PSNR. RC-CLEAN also shows a result similar to that obtained by the standard software of large astronomical laboratories on real data.

The CLEAN algorithms have been the standard solution in radio astronomy for almost 50 years [4]. No other method has the same popularity and success in radio astronomy [4,5], and due to its robustness and efficiency, CLEAN has applications in other areas [6,7].
It is possible to split the CLEAN algorithms into two broad categories: scale-free CLEAN [8,9] and multi-scale CLEAN [10][11][12][13]. The scale-free CLEAN algorithms deconvolve sky objects as a set of points. This approach has low efficiency and struggles to reconstruct sky objects with extensive and complex emissions, a limitation that became apparent with the advance of radio telescopes as early as the 1980s.
To address this deficiency of the scale-free CLEAN algorithms, astronomers applied the Maximum Entropy Method (MEM) to radio sources [14,15]. It was the first competitive convex-optimization method for radio astronomy. The traditional MEM struggles to solve point sources and has a significant dependence on the input model [3]. In fact, it is recommended to use the CLEAN output as the MEM input model [1]. With the emergence and rapid consolidation of multi-scale CLEAN methods in the 2000s, the adoption of the MEM method has significantly decreased [3]. However, it is still possible to find essential articles that combine the CLEAN algorithm with the MEM approach [2].
The multi-scale CLEAN has two variants: one uses a fixed set of scales [11,12], and the other chooses scales automatically [10,13]. However, the automatic scale adjustment methods are still little used by the radio astronomy community.
Until now, CLEAN methods, even the multi-scale ones, remain very dependent on human choices, such as sets of masks, different weighting schemes, the choice of the scale dictionary, and mainly the selection of the values of multiple stopping criteria [16].
Concomitantly, in recent years, new convex optimization methods have emerged, such as sparse regularization methods [5,17,18], Bayesian methods [16,19], and smooth regularization methods [2]. Their use is currently limited to test scenarios and particular applications, as these methods present instabilities in some applications on real data [3,12]. Besides, even very modern implementations have a high computational cost, being between 60 and 180 times slower than CLEAN-based algorithms [16].
Our article is positioned as an improvement to CLEAN algorithms, presenting Shannon's entropy as a monitoring criterion for scales. Here, the aim is to obtain competitive results using a much smaller set of adjustment parameters that depend on human intervention. To mitigate the computational cost of Shannon's entropy, we propose a strategy for multi-scale CLEAN algorithms called Relevant Component Multi-Scale CLEAN (RC-CLEAN).
There are two ways to improve the performance of CLEAN algorithms: (a) process only relevant information and (b) do not search in places where it is known that there is no relevant signal. Both tasks require decision-making, whether to prevent unwanted information or to reduce the search space. The most elementary way to reduce the search space in CLEAN is by selecting areas of interest. In the CLEAN context, it is common to use masking, usually called a CLEAN BOX. In this work, besides using a CLEAN BOX, we automatically reduce the search space by using Shannon entropy and by treating the regions of positive and negative brightness differently in the deconvolution process.
The authors in [20] use a toy model to suggest that entropy is an attractive stopping criterion for single scale CLEAN. We consider that this approach is very preliminary and that tests of entropy behaviour in more complex algorithms and more complex simulations are interesting contributions.
The multi-scale CLEAN is a Matching Pursuit algorithm combined with a simple scale-space approach. The most common stopping criteria for these deconvolution algorithms are the number of iterations and an intensity threshold, where both are used to prevent noise deconvolution; consequently, the residue by scale of this process exhibits noise characteristics.
The shape of the entropy of the residue component of deconvolution is similar to Figure 1. The entropy of the residue grows faster with higher brightness output. We will call this region the foreground sector. If we do not stop the algorithm, it enters the noise-dominated region, where the entropy varies slowly. We will call this region the background sector. In this sense, we propose that the entropy of the residual image may be an attractive way to stop processing some scales and reduce the search space.

FIGURE 1 The expected shape of the entropy of residues in the CLEAN algorithms
The signal to be reconstructed in radio interferometry is the luminous brightness of the sky. Mathematically, it is the flux of light from the light sources in the sky, and it is a positive quantity. There is no negative brightness; it is a spurious product of the degradation process, and measured negative values are a problem in some contexts. In classic Hogbom's CLEAN [8], there is no mention of a distinction between positive and negative brightness values; on the other hand, Schwarz [9] proves that classic CLEAN converges provided the PSF is positive-semidefinite. In [11], the negative values are separated from the positive values and discarded after the deconvolution. The other alternative, stopping CLEAN when a negative brightness is detected, is well established by the astronomical community, for example in MIRIAD and CASA. If the deconvolution algorithm stops when a negative brightness is detected, relevant information may be lost. For this situation, in this article, the entropy analysis of the residual image is used. On the other hand, the disposal of collected points without proper analysis seems to be an inefficient procedure. In the context where negative brightness is not desired, we see a third possibility: ignoring the negative brightness in the first moment of the deconvolution. This work analyzes this possibility, showing its consequences. The entropy monitoring procedure helps with this analysis.
Given the scenario presented, we formulate the following conjecture: avoiding the search for negative brightness values in the first moment has a positive impact on the method's performance, and entropy can be used as a measure to reduce the search space, avoiding noise deconvolution automatically and improving the quality of the deconvolution process while shortening the execution time.
Therefore, this article has two objectives: (i) investigate the Shannon entropy's behaviour in the scales of the Multi-Scale CLEAN (MS-CLEAN) algorithm [11], and use entropy to automate the reduction of the MS-CLEAN search space, seeking to reduce the computational cost; (ii) investigate the consequences of the deconvolution of positive and negative brightness separately in astronomical images.
The object of study in this article is radio-astronomical images because this area has a high demand for process automation due to the large volume of data. However, it is important to note that other areas of human knowledge also make use of the CLEAN deconvolution methods [6].
This article evaluates the above-mentioned conjecture using two simulated astronomical data sets and one real data set. The number of experiments is compatible with other articles about interferometry using the CLEAN methods [11,13,21,22]. The traditional MS-CLEAN [11] algorithm is compared with the algorithm proposed here, RC-CLEAN. Also, we make comparisons, using real data, between RC-CLEAN and other available methods to corroborate our conclusions.
The main contributions of this work are: (i) showing the Shannon entropy behaviour in residuals of the MS-CLEAN scales, with an application to reduce the search space; and (ii) showing the result of separately deconvolving positive and negative brightness.
The organization of this work is as follows: Section 2 shows a review of synthesis imaging for astronomical sources and deconvolution using the CLEAN methods. Section 3 defines the entropy of residue in the scale, proposes innovations and optimizations in the algorithm. Section 4 shows the experiments, results, and discussions. Finally, Section 5 shows the conclusions of this study.

The image formation
The radio astronomers use a set of antennas to measure the Fourier modes of sky objects. Each antenna in an interferometric array records the incoming complex electric field as a function of time, frequency, and polarization. The i-th pair of antennas gives the interferometric visibility V_i. The van Cittert-Zernike theorem states that the spatial coherence function of the electric field is the connection between the sky brightness function and the frequency domain [1]. Therefore, every pair of antennas tracks a specific sky object and gives a single Fourier component of the observed object.
For a complete sampling of the Fourier space, the Fourier inversion defines the image of the sky object,

I = F⁻¹{V},   (1)

where F⁻¹ is the inverse Fourier transform and V is the set of visibilities. In the real case, a sampling operation and noise contamination affect the signal V. For a sky object located at (x, y), the pair (u, v) is the location of the measurement in the sky's frame. Ignoring for the moment the effects of noise, the sampled visibility S(u, v)V(u, v) defines a dirty image (ID), as follows,

ID = F⁻¹{S V} = B * I,   (2)

where B is the PSF, * is the convolution operator, and S is the sampling function defined as

S(u, v) = Σ_{k=1}^{N} w_k δ(u − u_k, v − v_k),   (3)

where (u_k, v_k) are the measurement coordinates generated by the set of antennas in the sky's frame, N is the total number of observations, and w_k is a weight value applied to the measurement. In this case, there are three weight types: natural, uniform, and robust [1]. Finally, adding the noise in (2), the inverse problem becomes

ID = B * I + ε,   (4)

where ε is white Gaussian noise [1]. The astronomical laboratories provide information about the experiment design, atmospheric conditions, procedures, and software with which it is possible to know ID and B from the measurements of V. It is important to note that I is estimated by observing ID and using B and ε, which characterizes an inverse problem.
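The measurement equation above can be sketched with numpy's FFT. This is a minimal illustration under stated assumptions (a binary sampling mask, unit weights, periodic image), not the pipeline of any real observatory; the function names are ours.

```python
import numpy as np

def psf(uv_mask):
    """PSF B = F^-1(S): the response of the sampling to a point source."""
    return np.fft.ifft2(uv_mask).real

def dirty_image(sky, uv_mask):
    """ID = F^-1(S V) = B * I: sample the ideal visibilities, then invert.
    (Real parts are taken; a physical sampling function is
    conjugate-symmetric, so the image is real.)"""
    vis = np.fft.fft2(sky)               # ideal visibilities V = F(I)
    return np.fft.ifft2(vis * uv_mask).real
```

With complete sampling (mask of ones), the dirty image equals the sky; with partial sampling, a point source reproduces a shifted copy of the PSF, which is exactly the convolution in Equation (2).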

2.2
The image restoration using the CLEAN paradigm

All CLEAN algorithms collect maximum points in a search space, and such points are called the CLEAN components (CC). Each iteration of the algorithm updates the search space by removing the CC convolved with the PSF, and searches for a new CC. The set of CC forms the model image IM.
The restored image (IR) also follows a standard formulation,

IR = CB * IM + R,   (5)

where CB is a smoothing kernel, called the clean beam, IM is the main output of any CLEAN algorithm, and R is the residual image, the remainder of the deconvolution process and the elementary search space for the CLEAN methods.
The restored image has units of Jy∕beam, where the beam is CB. The unit Jy∕beam is the standard unit of spectral flux density in radio astronomy. Jy is the symbol for the jansky, equivalent to 10⁻²⁶ watts per square meter per hertz. To convert Jy∕beam to Jy∕pixel, we divide the image by the beam area.
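As a sketch of this unit conversion, assuming an elliptical Gaussian clean beam whose area in pixels is π·bmaj·bmin/(4 ln 2), with the FWHM axes given in pixels (function names are ours):

```python
import math

def beam_area_pixels(bmaj_px, bmin_px):
    """Solid angle of an elliptical Gaussian clean beam, in pixels.
    bmaj_px and bmin_px are the FWHM of the major/minor axes in pixels."""
    return math.pi * bmaj_px * bmin_px / (4.0 * math.log(2.0))

def jy_beam_to_jy_pixel(value_jy_beam, bmaj_px, bmin_px):
    """Convert a brightness from Jy/beam to Jy/pixel by dividing
    by the beam area."""
    return value_jy_beam / beam_area_pixels(bmaj_px, bmin_px)
```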
Different implementations of CLEAN effectively change the structure of IM and R. For the classic Hogbom's CLEAN [8], the CC are points, and the search space R is updated at each iteration. However, for the Multi-Scale CLEAN (MS-CLEAN) [11], the CC are smoothed points at different scales, and the search space is R represented at different scales.
The Hogbom's CLEAN can be described in a few steps using the input images B and ID and four control parameters: the loop gain (g), the component threshold (T̂), the number of iterations (n), and the clean beam (CB).
Generally, the researchers choose the n and g values empirically [1]. It is important to mention that the g value impacts the convergence speed of the algorithm and the separation of CC from the same source in IM.
The traditional method to choose CB fits an analytic elliptical Gaussian function to the PSF points between the maximum value of the PSF and half of its maximum value [1].
Thêis a stop criterion, and to estimate its value, a patch is chosen from ID without brightness sources. In such a case, the RMS value ( ) on the patch is calculated and̂is chosen in the range of ≤̂≤ 5 .
In [9], it was shown that Hogbom's CLEAN converges if the PSF is positive-semidefinite, but this proof ignores the noise. Even when convergence is not guaranteed, CLEAN can yield consistent results with a suitable set of control parameters.
The Cornwell's Multi-Scale CLEAN (MS-CLEAN) [11] keeps the essence of Hogbom's CLEAN intact but searches for source brightness at various scales using the scale-space approach [23]. In the scale-space method, each signal can be represented in a new space by

F_α = f * K_α,   (6)

where f is the original signal, K_α is a smoothing kernel for scale α, and F_α is the resulting signal at scale α.
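A minimal sketch of this representation, with a wrap-around Gaussian standing in for the tapered kernel of [11] (the kernel choice and function names are our assumptions):

```python
import numpy as np

def smoothing_kernel(shape, scale):
    """Unit-sum isotropic kernel K_alpha with its peak at the origin
    (wrap-around layout, convenient for FFT-based convolution)."""
    if scale == 0:                       # scale 0 is a delta function
        k = np.zeros(shape)
        k[0, 0] = 1.0
        return k
    dy = np.minimum(np.arange(shape[0]), shape[0] - np.arange(shape[0]))
    dx = np.minimum(np.arange(shape[1]), shape[1] - np.arange(shape[1]))
    Y, X = np.meshgrid(dy, dx, indexing='ij')
    k = np.exp(-(X**2 + Y**2) / (2.0 * scale**2))
    return k / k.sum()

def to_scale_space(f, scales):
    """F_alpha = f * K_alpha for every scale (circular convolution)."""
    F = np.fft.fft2(f)
    return [np.fft.ifft2(F * np.fft.fft2(smoothing_kernel(f.shape, a))).real
            for a in scales]
```

Because each kernel has unit sum, the total flux of the signal is preserved at every scale while the peak brightness is spread out.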
In [11], there is no formal demonstration of the requirements for multi-scale algorithm convergence. The authors argue that the demonstration can be obtained from [9]; then the PSF should be positive-semidefinite.
In [11], the authors show through experiments that the MS-CLEAN improves convergence, runtime, and stability when compared to any single-scale CLEAN, even when the PSF is not positive-semidefinite. The MS-CLEAN became the reference algorithm when the solution to the deconvolution uses inverse modeling.

OUR PROPOSAL
In this section, we present the details of our proposal. This proposal has two foundations: (a) the alternative treatment of negative brightness, and (b) the entropy of residuals by scale to decide when to stop a less informative scale.

The treatment of negative brightness
Negative components can appear for two reasons: insufficient sampling, which generates a PSF with negative values, and the addition of instrumental or thermal noise. Only the first case can produce negative components with significant intensity.
Negative components are undesirable in many contexts, and established CLEAN software commonly provides high-level mechanisms to interrupt the updating of the search space. Other implementations of CLEAN eliminate the negative components after the search process [11].
Our proposal does not process the negative components in the first moment, being an intermediate solution between stopping the algorithm when it collects a negative component and eliminating the negative components after the search process.
The negative components are later deconvolved using the same algorithm as the positive components.

Information theory and multi-scale CLEAN
For the MS-CLEAN algorithm, the deconvolution residue at scale α is R_α, and before the algorithm starts, this image is the dirty image ID written in scale space. For a set of scales Φ_q with q elements, the search space is the set of R_α with q residual images. For each round i, the information collected is the maximum point in the current search space. This information is the current CLEAN component CC_i, defined as

CC_i = g max_α R_α^i = g R_α̃^i(x̃, ỹ),   (7)

where g is the loop gain and α̃ is the scale at which the maximum occurs. The set R_α is updated as follows,

R_α^{i+1} = R_α^i − CC_i (B_α̃ * K_α),   (8)

where B_α̃ is the PSF in the scale space of the current maximum and K_α is the smoothing kernel for scale α. Therefore, each R_α loses luminosity as the deconvolution progresses. We define luminosity as the representation of the information in this problem. The loss of information is measured using Shannon's entropy [24], as follows,

H_α^i = − Σ_j p_j log₂ p_j,   (9)

where p_j is the probability mass function of R_α^i. The entropy calculation uses 8-bit quantization of R_α^i. At each iteration of the algorithm, the calculation of (9) implies a high computational cost. Therefore, we propose calculating (9) in batches, a trade-off between the information estimate of R and the computational cost.
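The entropy of Eq. (9) over an 8-bit quantized residual can be sketched as follows; the min-max scaling used for binning is our assumption, as the paper does not specify the quantizer.

```python
import numpy as np

def residual_entropy(R, bits=8):
    """Shannon entropy of a residual image after quantisation,
    H = -sum_j p_j log2(p_j), with 8-bit bins by default."""
    lo, hi = float(R.min()), float(R.max())
    if hi == lo:
        return 0.0                       # a constant image carries no information
    levels = 2 ** bits
    q = np.round((R - lo) / (hi - lo) * (levels - 1)).astype(int)
    p = np.bincount(q.ravel(), minlength=levels) / q.size
    p = p[p > 0]                         # convention: 0 log 0 := 0
    return float(-(p * np.log2(p)).sum())
```

A two-valued image split half and half gives exactly 1 bit, a quick check that the histogram and logarithm are wired correctly.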

The behaviour of entropy of residue in the scales
This work does an exploratory survey in two groups. In the first group, the entropy of residues in the MS-CLEAN scales is computed. In the second group, the entropy of residues is computed when we avoid the search for negative brightness.

FIGURE 3 The shape of the entropy using 30Dor residuals for the six scales of the MS-CLEAN
In the first group of experiments, we use the traditional MS-CLEAN [11] in six configurations. We use simulated interferometric data from two astronomical objects, an HII region in the M31 galaxy and 30 Doradus in the Large Magellanic Cloud, respectively called M31 and 30Dor. We vary the loop gain g and the component threshold T̂. More details about the simulation and other parameters are described in Section 4.2.
We test T̂ sequentially. We choose a sufficiently low value of σ, and higher values are covered in the experiment. For each simulation, we study the value of T̂ in multiples from 1 to 5 of the reference value σ defined in Section 2.2. The algorithm stops when there is no component in the search space more intense than T̂, that is, when all scales reach the component threshold.
As discussed in Section 2.2, g is related to the separation between components of the model image IM. The parameter g controls the number of iterations needed to reach a given T̂. For each simulation, the value of g varies around the default value suggested by astronomical laboratories, g = 0.1 [1]. Figures 2 and 3 show that the entropy curves of the residue for the scales in the MS-CLEAN have two sectors: (a) the foreground sector, with a fast and positive variation of entropy, and (b) the background sector, where the entropy remains almost constant. The scales used are those suggested in [11].
Also, in Figures 2 and 3, the vertical lines (blue and green) show the relationship between the entropy and the maximum intensity of the residue by scale. This maximum intensity is the measure associated with the stopping criterion T̂. For example, when we use T̂ = 2σ to stop the MS-CLEAN, the narrowest scale in M31 increases the number of iterations in the background sector. In Section 4, it is possible to see that, for this example, this implies noise deconvolution.
In Figure 3, for T̂ = 4σ in the 30Dor case, the entropy values do not reach the flat region. In Section 4, it is possible to see that this situation implies lower deconvolution quality. Figure 4 shows the impact of the g value on the behaviour of the entropy of the residue for the scales in MS-CLEAN. First, there is no change in the qualitative behaviour of the curves for different values of g. However, g dictates how fast the flat sector is reached. Table 1 shows that this behaviour is independent of the final deconvolution quality.
The first vertical line from left to right in Figures 2 and 3 shows that using the first negative component to stop the algorithm or its scales causes evasion of relevant information from the model image IM. Figures 5, 6, and 7 show the effects of avoiding the search for negative components on the entropy of the residue in the scales. The preservation of negative components in the residues adds a highlighted maximum to the graphic shape of some scales, and the flat sector changes to a downward sector. Figures 6 and 7 show that the entropy curves of the residue have three sectors in some scales. The first sector is a foreground sector, like the MS-CLEAN graphic; the second sector is a background or downward sector. For narrow scales, the respective curves are almost flat, and the third sector is a transition sector, which is more pronounced on narrow scales. The impact of the value of g is shown in Figure 5: for higher g values, transitions are faster, but in these cases, the downward sector is almost flat.

FIGURE 5 The shape evolution, while g values vary, of the entropy using M31 residuals for two scales of the MS-CLEAN when we avoid searching for the negative components

FIGURE 6 The shape of the entropy using M31 residuals for six scales of the MS-CLEAN when we avoid searching for the negative components

FIGURE 7
The shape of the entropy using 30Dor residuals for the six scales in the MS-CLEAN when we avoid searching for the negative components

Separation between foreground and background components
We divide the graphic of entropy of the scale residue into sectors. The first sector, where the entropy grows fast, presents relevant information. This sector is present both in the MS-CLEAN and in the proposal of this article. The CLEAN components (CC) in the first sector will be called foreground components. The second sector is a flat region for the MS-CLEAN, and a downward region when we reject the negative components before deconvolution. The second sector contains the background components, and noise is dominant in the background. Only when we avoid the search for negative components is there, on some scales, a transition sector between the previous two.
Therefore, we propose to divide the deconvolution into two stages. The first one looks for the positive components and builds the model image (IM). This stage is called the foreground deconvolution. The second stage is called the background deconvolution. In this paper, we use the same framework to deconvolve foreground and background. Any negative component goes to the background deconvolution.
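The two-stage idea can be sketched as follows, with a deliberately trivial delta-PSF stage standing in for a full CLEAN loop (the stage routine and its parameters are ours, not the paper's implementation): the foreground pass collects only positive peaks, and the background pass reruns the same routine on the sign-flipped residual.

```python
import numpy as np

def stage(R, g=1.0, thr=1e-6, n=100):
    """A trivial positive-only stage with a delta PSF: it collects
    positive peaks directly (a stand-in for a full CLEAN loop)."""
    IM = np.zeros_like(R)
    for _ in range(n):
        y, x = np.unravel_index(np.argmax(R), R.shape)
        if R[y, x] <= thr:               # no positive component left
            break
        cc = g * R[y, x]
        IM[y, x] += cc
        R[y, x] -= cc
    return IM, R

def two_stage_clean(ID, **kw):
    """Foreground pass over the positives, then a background pass that
    deconvolves the negatives of the residual with the same routine."""
    IM_fg, R = stage(ID.copy(), **kw)    # stage 1: positive components
    IM_bg, Rn = stage(-R, **kw)          # stage 2: the negatives, sign-flipped
    return IM_fg - IM_bg, -Rn            # combined model, final residual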
This separation creates flexibility in parameter choices, which may differ between control parameters for the foreground and background deconvolution. It also enables the use of totally different architectures for each stage.

Stopping irrelevant scales using Shannon entropy information
Avoiding a search on a scale without relevant information is a way to reduce the search space. The processing of a scale can stop when it meets a breakdown criterion, but this is complicated because the scales are not independent. The entropy of the residue exhibits behavioural regularity, and this helps in decision-making.
In this subsection, we discuss strategies to automatically remove a scale that has become irrelevant.
Based on the entropy of the residue at each scale, we propose:
1. For scales with two sectors, the unwanted information is in the almost flat region, as can be observed from Figures 2 to 6.
2. For scales with three sectors, the separation between desired and unwanted information occurs in the transition sector, as can be seen in Figures 6 and 7.
For scales with two sectors, the most prominent strategy is the detection of the almost flat region. The most elementary operators in this context are the standard deviation or the slope. Identifying this type of behaviour is relatively costly because more data observations are needed to detect the almost flat region. On the other hand, for scales with a transition sector in the entropy, it is possible to use the behaviour of this region to remove the low-informative scales, consequently reducing the search space.
For scales with a transition sector, we use the difference between the maximum and the current entropy of the residue at the same scale, respectively H̄_α and H_α^i, to design a fast and efficient algorithm to reduce the search space. The maximum difference occurs between H̄_α and H_α^0, because H_α^0 has the lowest entropy, since it has all the information available at scale α.
Therefore, we define the actual declination, de, relative to this maximum difference,

de = (H̄_α − H_α^i) / (H̄_α − H_α^0),   (10)

where H_α^0 is the initial entropy at scale α. The algorithm starts without knowing the maximum point. At each round, the current value is compared with the current maximum value; before the maximum is identified, de returns to zero. When the maximum value is identified, the value of de begins to grow. Our experiments have shown that a declination of 1% can remove a scale during a fluctuation of the entropy value. We observed that a good choice is 2% ≤ de ≤ 5%.
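As a sketch, the running-maximum bookkeeping for one scale could look like the class below. This follows our reading of the declination as a fraction of the maximum difference; the class and parameter names are ours.

```python
class ScaleMonitor:
    """Tracks the entropy of one scale's residual and flags the scale
    for removal once the declination from the running maximum,
    relative to the maximum difference, exceeds de_limit
    (2% <= de <= 5% is the recommended range)."""

    def __init__(self, de_limit=0.025):
        self.h0 = None          # initial entropy H_alpha^0
        self.h_max = None       # running maximum, the estimate of H_bar
        self.de_limit = de_limit

    def update(self, h):
        """Feed one entropy value; returns True when the scale should stop."""
        if self.h0 is None:
            self.h0 = self.h_max = h
            return False
        if h >= self.h_max:
            self.h_max = h      # still rising: de stays at zero
            return False
        if self.h_max == self.h0:
            return False        # no dynamic range yet, nothing to measure
        de = (self.h_max - h) / (self.h_max - self.h0)
        return de >= self.de_limit
```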
To mitigate the effects of entropy value fluctuations, we recommend using the average of the entropy under some iterations. We discuss details of this implementation in Section 3.6.
The automatic decision-making should take into account more than one stopping criterion, and we recommend using the component threshold T̂ and the maximum number of iterations n in association with the strategies involving entropy.

Automatic monitoring of entropy implementation in a multi-scale framework
This subsection presents the algorithm for efficient search-space reduction using entropy monitoring of the residual in the scales of the MS-CLEAN.
Algorithm 1 refers to the current work. The global stopping criteria are: (1) the subset of active scales is empty; (2) there are no components more intense than the component threshold T̂; or (3) the maximum number of iterations n is reached.
The algorithms of [8,11] and [25] are easily adapted using Algorithm 1. The most significant change in the code is the inclusion of entropy, but instantaneous monitoring of entropy presents two problems: the increase in runtime and the fluctuations of the entropy value.
For an image with 256 × 256 pixels, the runtime of one round of 8-bit entropy is close to 70 ms on one core of an Intel(R) Core(TM) i5 CPU @ 3.30 GHz. On the other hand, using the same processing power and the same image, the runtime of each entropy-free iteration of the MS-CLEAN with six scales is close to 25 ms. Even if the entropy in the scales were parallelized, the runtime of one iteration would go from 25 ms without entropy monitoring to almost 100 ms with it.
We use two procedures: (a) a trigger to start the entropy monitoring, and (b) batch monitoring every Q iterations. In Section 2.2, we explain how to define the component threshold (T̂). We use the same heuristic to set a trigger to start the monitoring. There, T̂ is chosen in the range σ ≤ T̂ ≤ 5σ, where σ is the RMS value over a patch without sources in the dirty image. The trigger value must be greater than the upper limit of this range, and for this work, a trigger of 6σ was enough.
We suggest calculating the average value of R_α over Q iterations to mitigate the fluctuating effects of the entropy. The entropy monitoring can then be done every Q iterations over the mean value ⟨R_α⟩. This procedure has a positive impact on the runtime. For this article, we choose Q = 50 iterations as a compromise between runtime and monitoring frequency.
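The batching just described can be sketched as follows; the class name and interface are our assumptions, with the entropy function and Q supplied by the caller.

```python
import numpy as np

class BatchEntropyMonitor:
    """Averages the residual over Q iterations and evaluates the
    entropy once per batch, trading monitoring frequency for runtime."""

    def __init__(self, entropy_fn, Q=50):
        self.entropy_fn = entropy_fn
        self.Q = Q
        self._acc = None
        self._count = 0

    def push(self, R):
        """Accumulate one residual; returns the batch entropy every Q
        calls, or None in between."""
        self._acc = R.copy() if self._acc is None else self._acc + R
        self._count += 1
        if self._count < self.Q:
            return None
        mean_R = self._acc / self.Q      # <R_alpha> over the batch
        self._acc, self._count = None, 0 # start a fresh batch
        return self.entropy_fn(mean_R)
```

One entropy evaluation per Q iterations keeps the ~70 ms cost amortized over many ~25 ms CLEAN iterations.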
We use the same choices and architecture for the background deconvolution, but looking for the negative components in the loop, without entropy monitoring or model image updating. After performing the background deconvolution, the model and residual images make up the restored image using Equation (5).

DECONVOLUTION EXPERIMENTS AND DISCUSSIONS
We run experiments using simulated data and real data. The simulated data are the sets used to investigate the entropy curves of Section 3.3. We explain the real data in Section 4.5.

Overview of experiments with simulated data
We divide these experiments into two sets. The first set uses the default MS-CLEAN with only the component threshold (T̂) to stop the MS-CLEAN. The loop gain (g) is fixed, and the model image (IM) does not use negative components. We test several configurations of the MS-CLEAN to investigate how the framework reacts to the parameters. The second set uses our hypotheses, performing the deconvolution in two stages. The first stage, called foreground deconvolution, is made only with positive components, turning off scales using entropy. The second stage, called background deconvolution, uses the same algorithm as the foreground deconvolution, but now for negative components and without monitoring the entropy.

Simulated data
This paper follows the test guidelines presented in [11] and [21]. We make simulations with two different sky objects. In our experiments, we use M31, an object of medium complexity, based on an HII region in the M31 galaxy, and a more complex object, 30 Doradus in the Large Magellanic Cloud, called 30Dor. Both are narrow-band objects. The astronomical laboratories provide the input images M31 and 30Dor, and we apply zero padding to make the size 512 × 512 pixels. We make the simulation using the CASA [26] simobserve module. The set of antennas is the Expanded Very Large Array Project (EVLA) in B configuration. The observation is made in the L band. The scale bias factor is b = 0.6 (default). All deconvolutions use this design, and the CLEAN BOX was centered with half the size of the dirty image (default). Figure 8A shows the overall view of the PSF, which has many scattered negative values (blue in the frame). Figure 8B shows the core detail, with an almost symmetrical main lobe. Figure 9A shows the dirty image of M31; this is an object of medium difficulty, with concentrated structure and many areas without bright sources. Figure 9B shows the dirty image of 30Dor; this is a complex object with many fringes and diffuse brightness. Both figures are cropped to 256 × 256 pixels, which corresponds to the clean box region, and this clean box defines the initial search space without the scale-space transform (6). The beam, in this case, is the area of the PSF where the spectral flux density is measured.

Performance metrics
The quality of the results is measured objectively with two full-reference (FR) metrics. These measurements are made on the restored images IR, the final product of the deconvolution process. The first FR metric is the Peak Signal to Noise Ratio (PSNR), defined as

PSNR = 20 log₁₀ ( |I_sky|_∞ / rmse ),   (11)

where I_sky is the sky model of the simulation in the same unit as IR, |…|_∞ is the l_∞-norm, and rmse is the Root Mean Square Error. The second FR metric is the Structural Similarity Index (SSIM), defined as

SSIM = ((2 μ_S μ_R + C₁)(2 σ_SR + C₂)) / ((μ_S² + μ_R² + C₁)(σ_S² + σ_R² + C₂)),   (12)

where μ_S and σ_S are, respectively, the mean and standard deviation of I_sky, μ_R and σ_R are the mean and standard deviation of IR, σ_SR is the covariance between I_sky and IR, and C₁, C₂ are small positive constants.
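The two metrics as written in the text can be sketched directly; note that this global, single-window SSIM follows the formula above, whereas production implementations usually compute SSIM over sliding windows, and the constants C1, C2 here are illustrative.

```python
import numpy as np

def psnr(I_sky, IR):
    """PSNR = 20 log10(|I_sky|_inf / rmse), in dB."""
    rmse = np.sqrt(np.mean((I_sky - IR) ** 2))
    return 20.0 * np.log10(np.max(np.abs(I_sky)) / rmse)

def ssim_global(I_sky, IR, C1=1e-8, C2=1e-8):
    """Single-window (global) SSIM, directly from the formula."""
    mu_s, mu_r = I_sky.mean(), IR.mean()
    var_s, var_r = I_sky.var(), IR.var()
    cov = ((I_sky - mu_s) * (IR - mu_r)).mean()
    num = (2.0 * mu_s * mu_r + C1) * (2.0 * cov + C2)
    den = (mu_s**2 + mu_r**2 + C1) * (var_s + var_r + C2)
    return num / den
```

An image compared with itself gives SSIM = 1, and a constant 0.1 offset against a unit-peak model gives PSNR = 20 dB, two quick checks on the formulas.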
The PSNR and SSIM are complementary metrics in our context. The PSNR has higher sensitivity for measuring noise presence than SSIM [27]. On the other hand, the SSIM is better for measuring structure detection than PSNR. The algorithms are written in python 3 and running on an Intel(R) Core(TM) i5 CPU @ 3.30GHz.

4.4
Performance on data

Table 1 has two sectors, one for M31 and the other for 30Dor, and compares the quality and runtime among different MS-CLEAN configurations. The M31 sector of Table 1 shows that lower g values yield better FR metrics and higher runtime. The 30Dor sector of Table 1 shows that lower g values provide a slight improvement in PSNR but severe deficiencies in SSIM. The M31 experiment presents several competitive configurations in Table 1, but visually M31 has concentrated brightness, with large areas without relevant luminosity. Table 1 shows that low T̂ and high g values lead to bogus components in the model image IM and low PSNR values. Because of the characteristics presented by M31, we believe the set of parameters with a good compromise between both FR metrics and computational time is the configuration in row 3. We use the configuration in row 3 of Table 1 under our paradigm, avoiding negative values and stopping scale processing, to compose Figure 11.
On the other hand, 30Dor has diffuse brightness and scattered structures. In this case, we choose to prioritize structure detection, and the most suitable configuration is the one in row 8. We use this configuration to compose Figure 12 and to test our paradigm of avoiding the search over negative values and stopping scale processing.
Row 1 of Table 2 shows the results of testing scale removal using de = 2.5%, ̂ = 4, and g = 0.2. The PSNR is raised by 11 dB, while SSIM is virtually constant, and the runtime also decreases. Table 2 compares the quality and runtime of Relevant Component Multi-Scale CLEAN (RC-CLEAN), our proposal. Row 2 shows a marginal change in quality when compared to the row 1 configuration. We choose row 2 for comparison with the MS-CLEAN configuration in row 3 of Table 1; the results of this comparison can be observed in Figure 11. Direct comparison among the different configurations shows that the RC-CLEAN method presents a PSNR 2.9 dB better than the best MS-CLEAN result while spending only 44% of its runtime.
After the deconvolution performed with the characteristics described in row 2 of Table 2, only one scale is still active. For row 3, when using ̂ = 2, it is necessary to change de to 5.0%. Compared with row 4 of Table 1, the results in row 3 of Table 2 show the PSNR raised by 10.9 dB, the SSIM grown by 0.06 units, and the runtime decreased from 1260 s to 308 s. Figure 11A shows that MS-CLEAN overestimates the most prominent brightness when compared to the ground truth in Figure 11B. On the other hand, Figure 11C shows that our proposal underestimates the image brightness. In the regions without bright sources, Figures 11A and 11C are similar.
The 30Dor experiment using MS-CLEAN shows that an adequate configuration to detect structures is the one with g = 0.20 and ̂ = 2. The results of this choice are in row 8 of Table 1, and we use this configuration to test our conjecture. The results in row 4 of Table 2 show that using de = 2.5% to remove irrelevant scales raises the PSNR by 5.6 dB and the SSIM by 0.043 units, while the runtime decreases from 380 s to 234 s. Figures 12A and 12B show that MS-CLEAN overestimates the high positive values. Comparing Figures 12B and 12C shows that our proposal underestimates the brighter region.

Real data
In 2013, the Atacama Large Millimeter Array (ALMA) telescope, observing at 372 GHz, obtained images of a star in the Hydra constellation [28], called TW Hydra. The CASA team prepared this data for use in the CASA software 3, as a set previously flagged and calibrated, and we use the same set for all approaches. We test the CLEAN-based algorithms, the classic MS-CLEAN [11], the modern WS-CLEAN [12], and RC-CLEAN, and the convex optimization algorithms, the classic MEM [1,15] (entropy regularization) and MORESANE [4] (sparse regularization). The experiment consists of deconvolving the TW Hydra set using RC-CLEAN and the others. Figures 10A and 10B show the PSF and the dirty image of TW Hydra, respectively. This data set is collected with circular symmetry defined by a frontier, and beyond this frontier there is no measurement.
All CLEAN-based deconvolutions use the same design: (i) CLEAN parameters, loop gain g = 0.1 (default), component threshold ̂ = 15 mJy/beam, and number of iterations n = 5000; (ii) additional parameters, weight measurement (w), for which we use robust weighting with robust parameter 0.5 (default), the set of scales q = [0, 1.5, 3, 7], and scale bias factor b = 0.6 (default). The classic MS-CLEAN and RC-CLEAN were tested with two masks, a square mask adjusted to the data and a circular mask adjusted to the data. WS-CLEAN uses its default mask, and the CASA software uses an adaptive mask [29]. On the other hand, we prioritize the default choices inside the convex optimization algorithms, except where a parameter corresponds to one used in the CLEAN-based algorithms.
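For reference, the shared CLEAN parameter set above can be collected in one place. This is a plain summary in Python; the key names are our own naming and are not tied to the API of any of the packages mentioned.

```python
# CLEAN parameter set used for all CLEAN-based runs on the TW Hydra data
# (key names are illustrative only, not an API of CASA, WS-CLEAN, or RC-CLEAN)
clean_params = {
    "gain": 0.1,                      # loop gain g (default)
    "threshold_jy_per_beam": 0.015,   # component threshold, 15 mJy/beam
    "n_iter": 5000,                   # maximum number of iterations n
    "weighting": "robust",            # robust weighting
    "robust": 0.5,                    # robust parameter (default)
    "scales": [0, 1.5, 3, 7],         # set of scales q
    "scale_bias": 0.6,                # scale bias factor b (default)
}
```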
As a measure of quality among the different methods, we use the SSIM and PSNR metrics, taking as reference the model generated by the state-of-the-art CASA software 4 [26,29]. For real astronomical data there is no way to access the ground truth, so we define the CASA model as the ground truth in this experiment. The results are in Table 3.
In addition to the objective metric results, we use a visual inspection between RC-CLEAN and the classic MS-CLEAN to illustrate the improvement achieved. Figures 13A, 13B, and 13C show the sets of collected components smoothed with the clean beam for the classic MS-CLEAN, RC-CLEAN, and CASA, respectively. RC-CLEAN shows the same behaviour for the circular mask and the square mask, whereas the classic MS-CLEAN does not converge for the square mask. As can be seen in Figure 13A, MS-CLEAN converges when using the circular mask; however, some spurious bright points are observed, indicating that the algorithm collects spurious components, including many components over the frontier. The CASA software behaves well, and RC-CLEAN shows a result close to CASA's.

Discussions
There are contexts in which negative brightness is unwanted, and we show three forms of treatment for it. The first treatment uses the occurrence of negative brightness as a stopping criterion for the component search. The vertical red line in Figures 2 and 3 shows that the entropy is still on an upward curve on all scales analyzed, so relevant information remains in the residues under this treatment. Therefore, on its own, this treatment does not constitute an acceptable solution.
The second treatment excludes the negative components after the search algorithm, which is the default procedure for MS-CLEAN. Figures 11 and 12 show that MS-CLEAN overestimates the brightest region for the set of parameters with the best results in the objective metrics. We observe that, in its typical behaviour, MS-CLEAN captures more noise while capturing the structures of the sky objects; Table 1 shows this behaviour. For M31, the best configuration for PSNR reaches 40.3 dB while its SSIM, 0.943 units, is the worst result, and the best configuration for SSIM reaches 0.980 units while its PSNR equals 34.2 dB. For 30Dor, the best configuration for PSNR reaches 36.4 dB with an SSIM of 0.748 units, and the best configuration for SSIM reaches 0.829 units with a PSNR of 32.4 dB. (In Table 2, to aid comparison with the ground truth, we divided the restored image (5) by the area of the clean beam.)
We present a third treatment that deconvolves the positive brightness first and then the negative brightness. Figures 11 and 12 show that this processing underestimates the brightest region when we use automatic entropy monitoring. In Table 2, the quality metrics are positively correlated in each configuration. Our proposal also converges to a satisfactory solution, and the fastest configuration performs the deconvolution in only 1/3 of the runtime of the best configuration: row 1 of Table 2 shows measurements of 40.7 dB, 0.973 units, and 110 s for PSNR, SSIM, and runtime, respectively, while row 3 of Table 2, the best configuration, shows the higher quality-metric values of 41.1 dB, 0.981 units, and 308 s.
For the real data, inspection shows that the classic MS-CLEAN yields a poor result compared to RC-CLEAN. It is important to highlight the 91% increase in PSNR presented by RC-CLEAN compared to MS-CLEAN. As expected, the additional controls implemented in the CASA software produce a clean image in which the star is the only significant object. The CASA software has many parameters [29], while RC-CLEAN achieves a similar result with only one parameter more than MS-CLEAN.
Another significant point is that the simulated data use the VLA telescope design while the real data were collected by ALMA, two very different telescopes. This aspect of the real-data experiment confirms that entropy is a competitive control parameter and that the separation between negative and positive brightness has advantages.
The numerical comparison with the other algorithms presented in Table 3 shows the proximity between the RC-CLEAN and CASA models. This result is expected, as RC-CLEAN and the classic MS-CLEAN use pre-deconvolution routines similar to CASA's; the other software does not have the same synergy with CASA.
For example, in our experiment, MORESANE uses dirty images and PSF provided by CASA, but its deconvolution method is significantly different. These characteristics separate the MORESANE and CASA models. On the other hand, WS-CLEAN uses a deconvolution very similar to that of CASA/MS-CLEAN/RC-CLEAN, but it is entirely independent software from CASA, with the proper pre-deconvolution procedures and the choice of masks. These characteristics separate the WS-CLEAN and CASA models.
However, the results indicate that RC-CLEAN is as competitive as the high-end implementations CASA, WS-CLEAN, and MORESANE.

CONCLUSIONS
This article had two objectives: (i) to investigate the behaviour of Shannon entropy in the scales of the Multi-Scale CLEAN (MS-CLEAN) algorithm and to use entropy to automate the reduction of the MS-CLEAN search space, seeking to reduce the computational cost; (ii) to investigate the consequences of deconvolving the positive and negative brightness separately in astronomical images. The algorithm that gives concreteness to these objectives is called Relevant Component Multi-Scale CLEAN (RC-CLEAN).
There are contexts in which negative brightness is unwanted; we show two traditional forms of treatment for it and a new one. The new proposal is an intermediate solution located between the traditional ones.
MS-CLEAN showed a good separation between the high-intensity components and the noise-component sector. We observe that, when deconvolving only the positive brightness points first, it is possible to identify up to three sectors: a first with rapid entropy growth, a second characterized by a transition stage, and a third showing a softly decreasing entropy value. The sector with low entropy variation indicates a predominance of noise.
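The three-sector behaviour can be monitored numerically. Below is an illustrative sketch, our own construction and not the paper's implementation: it tracks the Shannon entropy of the positive residual brightness and flags the low-variation third sector with a simple slope test; the histogram binning and the `slope_tol`/`window` values are assumptions.

```python
import numpy as np

def shannon_entropy(image, bins=256):
    """Shannon entropy (in bits) of the positive-brightness distribution."""
    pos = image[image > 0]
    if pos.size == 0:
        return 0.0
    hist, _ = np.histogram(pos, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))

def is_noise_sector(entropy_track, window=5, slope_tol=1e-3):
    """Flag the third, low-variation sector: the entropy change per
    iteration stays below slope_tol over the last `window` steps."""
    if len(entropy_track) < window + 1:
        return False
    recent = np.diff(entropy_track[-(window + 1):])
    return bool(np.all(np.abs(recent) < slope_tol))
```

In a multi-scale loop, one entropy track would be kept per scale, and a scale whose track enters the flat sector could be retired from the search.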
The problem with using entropy is its computational cost, so the strategies must be carefully designed. For the scales with three sectors, we develop an automatic criterion that finishes the processing of low-contribution scales, improving the execution time. This result corroborates what was obtained by [20] and extends it to multi-scale CLEAN. Entropy monitoring can thus join the set of stopping criteria in the context of CLEAN-based algorithms.

RC-CLEAN joins the intermediary treatment of negative brightness and entropy monitoring in the MS-CLEAN framework. It performs the deconvolution in two stages, separating the positive and negative brightness. The first stage ignores the negative brightness and seeks to finish its processing before reaching the noise; we call this the foreground deconvolution. The second stage uses the negative brightness; we call this the background deconvolution. The experiments show that this formulation is competitive compared with the traditional treatments.
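To make the two-stage scheme concrete, here is a deliberately simplified 1-D sketch. It is our own illustration, not the RC-CLEAN implementation: it is scale-free, has no entropy monitoring, and uses a toy PSF; only the foreground/background ordering of the two passes follows the text.

```python
import numpy as np

def clean_pass(residual, psf, gain, n_iter, select):
    """One CLEAN pass over a 1-D dirty signal, restricted by `select`,
    a boolean predicate on the residual (e.g. positivity)."""
    model = np.zeros_like(residual)
    residual = residual.copy()
    c = len(psf) // 2                               # PSF centre index
    for _ in range(n_iter):
        allowed = np.where(select(residual), np.abs(residual), -np.inf)
        k = int(np.argmax(allowed))
        if not np.isfinite(allowed[k]):             # no allowed peak left
            break
        comp = gain * residual[k]
        model[k] += comp
        # subtract the shifted, scaled PSF from the residual
        for j in range(len(residual)):
            p = j - k + c
            if 0 <= p < len(psf):
                residual[j] -= comp * psf[p]
    return model, residual

def rc_clean_sketch(dirty, psf, gain=0.1, n_iter=200):
    # foreground deconvolution: positive brightness only
    m_pos, res = clean_pass(dirty, psf, gain, n_iter, lambda r: r > 0)
    # background deconvolution: the remaining negative brightness
    m_neg, res = clean_pass(res, psf, gain, n_iter, lambda r: r < 0)
    return m_pos + m_neg, res

# toy example: a single point source blurred by a small PSF
psf = np.array([0.1, 1.0, 0.1])
sky = np.zeros(11)
sky[5] = 1.0
dirty = np.convolve(sky, psf, mode="same")
model, res = rc_clean_sketch(dirty, psf)
```

The second pass only starts once the first has exhausted the positive peaks, which is the ordering that separates foreground from background components.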
RC-CLEAN is modular, and it could join the set of treatments available for approaching negative brightness.