Postcorrelation radio frequency interference excision at low frequencies



[1] We present examples of radio frequency interference from our experience with editing data from the Very Large Array and Very Long Baseline Array at frequencies of 74 and 330 MHz and discuss postcorrelation excision schemes commonly used at low radio frequencies (<1 GHz), including those employed for targeted observations, as well as broader brush automated schemes appropriate for surveys and other large data sets. We elaborate on the strengths and weaknesses of currently employed procedures, with an eye to providing a summary of existing methods for those developing future, more sophisticated postdetection data editing algorithms as well as new low-frequency instruments such as the Long Wavelength Array and Low Frequency Array.

1. Introduction

[2] Radio frequency interference (RFI) is a significant problem for low-frequency radio observations (frequencies < 1 GHz) at every major observatory in current operation. Although both the exact nature of the problem and its causes vary greatly, the common result is that radio astronomers spend a significant amount of time and computer power editing their data before they can use them for science. Although some efforts have been made to minimize and mitigate RFI at an instrumental level, they do not yet seem to be in widespread use.

[3] There are two basic solutions to the problem of removing RFI. One approach is to estimate and subtract the contribution from any RFI which is not strong enough to create nonlinearities in the electronics. This attempts to correct for RFI rather than simply removing affected data samples completely. The other approach is to edit or remove data affected by RFI, and is more commonly used by radio astronomers, who refer to it as “flagging.” A variety of software has been developed with the goal of making RFI excision in postcorrelation data less demanding of an astronomer's time. Used with some caution, these can be very effective, and are usually the best choice for large data sets, such as surveys. In fact they are often designed specifically for a given project, and work admirably in that setting. However, for a small data set with a targeted source, flagging by hand remains a better choice.

[4] In this article we present an overview of RFI seen at 74 MHz and 330 MHz in data from the Very Large Array (VLA), and at 330 MHz in data from the Very Long Baseline Array (VLBA). Both of these bands are considered fairly clean regions of the low radio frequency spectrum at the instrument sites. The RFI seen is both self-generated by the instruments and external in origin. Its severity varies from observation to observation, and sometimes from hour to hour. All of the data presented were taken for scientific use and all of them have been successfully reduced for that purpose; the RFI in these bands is rarely completely disastrous, although time and patience are required to remove it properly. Our goal is to show the type of challenges being presented by low-frequency data from existing instruments taken by the average astronomical user, along with the types of solutions currently available to that user. Because we feel that the current techniques are not readily extensible to the larger low-frequency arrays being planned, it is our hope that this article will be a useful summary of both the types of RFI challenges commonly seen and the existing software solutions for those designing the new instruments.

2. RFI at 74 MHz

[5] In Figure 1 we present some examples of RFI seen in data from the VLA at 74 MHz. All of these observations are made in a fairly narrow band (73.0–74.6 MHz) which is relatively clean compared to nearby frequencies. Efforts to expand the receivers beyond this band encountered signals which overwhelmed the system and rendered the data unusable (N. E. Kassim et al., The 74 MHz system on the Very Large Array, submitted to Astrophysical Journal Supplement Series, 2004).

Figure 1.

Simultaneous snapshots of four baselines from a VLA 74 MHz data set, taken on the morning of 20 October 2003 in the B configuration and showing a variety of “typical” RFI. In each snapshot, channel number or frequency increases to the right, covering a total of 1.6 MHz with 64 channels, and time increases up along the vertical axis covering roughly 3 hours and 45 min with an integration length of 20 s. The gray scale indicates amplitude, with white corresponding to the highest amplitudes; in general, the brightest RFI is about 10 times the flux of the unaffected data. From left to right we show a moderate-length baseline (E8-W24) with very little RFI and no evidence for the self-generated 100 kHz comb; a similar length baseline (E8-N28), on which the self-generated 100 kHz comb is very pronounced; a similar baseline (E16-N24), with some signs of the comb, a period in which the entire 1.6 MHz band was dominated by an interfering signal, and several moderately broad signals; and a short baseline (E8-N8) showing some signs of the narrowband comb, low-level rumble-type RFI, and a strong, narrowband signal that wanders slightly in frequency and is probably externally generated.

[6] By far the most common interference features are a series of constant narrowband signals which are separated by 100 kHz. This interference “comb” is generated by hundreds of oscillators in the VLA's monitor and control system. When the 74 MHz system was originally implemented all of the oscillators in the array were driven by the same clock, creating millions of Jy of coherent correlation on all baselines. The clocks were adjusted to be independent and incoherent, and some shielding was implemented around those electronics which needed to remain coherent; however it was not possible to completely remove the signal from the current VLA system.

[7] The comb is still seen at a diminished level in every data set taken at this frequency as signals from different oscillators become coherent and correlate. It is known that certain pairs of antennas have oscillators that appear to maintain coherence regardless of where they are located in the array, while others go in and out of coherence with each other over time. In other words, the comb may still be seen on baselines of any length, and its apparent strength does not correlate with antenna separation.

[8] Although the visibilities from the comb are now reduced to thousands of Jy, or roughly ten times the typical single-channel noise level in an average field, this is more than enough to cause problems in both calibration and mapping the data if ignored. The affected frequency channels must be excised on observations of even the brightest sources in order to obtain a reasonably accurate primary calibration (gain, phase, and particularly bandpass shape), as well as to improve image quality.

[9] Along with the comb, a variety of other, largely externally generated RFI signals can be found in small quantities. Usually these are less frequent, and may include a low-level rumble, narrowband signals which wander in frequency over time, broader signals that vary with time and cover many channels, and short periods where the entire band is disrupted by interference. In general, these effects are most common on shorter baselines. As a result, data taken with the VLA in its largest configuration (A configuration; roughly 35 km maximum baseline) are the least affected by them. There is currently little effort made to identify or remove these signals unless they are thought to be generated from new NRAO activity at the VLA site; the VLA 74 MHz system user must simply be prepared to identify and remove them.

3. RFI at 330 MHz

[10] In Figure 2 we present some examples of RFI in data sets taken at 330 MHz at the VLA. In general the RFI at this frequency appears to be externally generated, and not the result of the VLA electronics themselves. It may be either broadband or narrowband. There are a few relatively weak signals known to be always present in spectra at these frequencies; some effort has been made to identify the cleanest observing band near 330 MHz in order to avoid the strongest of the interfering signals, but it is impossible to find a 6 MHz or even 3 MHz band that is completely clean. Although most of the RFI seen is strongly correlated with baseline length such that the shortest baselines are most affected, some signals affect longer baselines as well.

Figure 2.

Simultaneous snapshots of three different baselines from a VLA 330 MHz data set, taken in the BnA configuration on the afternoon of 25 May 2002. The vertical axis covers about 4 hours with an integration length of 30 s. The horizontal axis covers 6 MHz, with 16 frequency channels. The gray scale indicates amplitude, with white corresponding to the highest RFI amplitudes, which are roughly 10 times brighter than the normal data. Note that there is a short burst of narrowband RFI near 2 hours in each spectrum; this was present on every baseline in this data set. From left to right we show a moderately long baseline (E4-N72) with only a hint of narrowband RFI on the high-frequency end of the spectrum; a much shorter baseline (E4-N8) showing two different narrowband RFI signals in the first half of the data; and another short baseline (E4-W8) showing both narrowband RFI signals, some short-lived broadband RFI just after 2 hours, and considerable broadband interference in the last hour.

[11] In Figure 3 we present examples of RFI from 330 MHz VLBA data sets. It has been traditionally assumed that interfering signals at widely separated antennae will not be similar enough to correlate. However, experience has shown otherwise. Narrowband RFI at a given station may be strong enough to raise the system temperature of the instrument, creating amplitude “pseudo fringes” with random phase. The result is that RFI appears on all baselines to that antenna, and the passband may also be destabilized. Additionally, there are some emissions, such as those from orbiting satellites, which are received by two or more widely separated stations creating correlated RFI on individual baselines (for a more thorough discussion of these issues, see Romney [2004]).

Figure 3.

Snapshots of three different baselines from a VLBA 330 MHz data set, taken on 17 February 2001. In each snapshot, the horizontal frequency axis covers a 12 MHz band with 48 channels, and time increases along the vertical axis for about 10 min; the data shown on the three baselines are not simultaneous. The gray scale indicates amplitude, with white corresponding to the highest amplitudes. From left to right we show the FD-NL baseline, roughly 1650 km in length, with a single narrowband interfering signal along with a time-variable second signal at higher frequency; the BP-KP baseline, about 1900 km in length, with a single narrowband interfering signal and an uncorrected passband shape; and the FD-HN baseline, 3100 km in length, with a strong but relatively short burst of interference in multiple channels, similar to the “comb” seen in 74 MHz VLA data.

4. Solutions

4.1. Planning the Observations

[12] Handling RFI at low radio frequencies starts with planning the observations. For both the VLA and the VLBA, as well as most other major observatories, surveys of the RFI environment, or at least lists of the strongest interferers, are available for planning purposes. Sometimes the default setups for the instrument are not, in fact, optimum for a given experiment. Planning the observations to avoid known major interferers can save considerable work in the long run. We note, however, that it is impossible to avoid all RFI, and for any future instruments which plan to provide broad frequency coverages this problem will only be made worse.

[13] Observations should always be made in a multichannel “pseudo-continuum” observational mode, rather than a traditional single-channel continuum mode. Because so many of the low-frequency interfering signals are narrowband, more channels allow data to be excised with less data loss. Also, by observing with a spectral resolution comparable to the interfering signal bandwidth, the signals themselves appear more prominently and are easier to identify and remove. In addition, using the pseudo-continuum mode improves the wide-field images necessary at these wavelengths by reducing the effects of bandwidth smearing on sources far from the pointing center. The data can always be reduced to a smaller, more manageable number of channels after calibration and RFI excision if desired.

[14] Even maximizing the resolution, with the current correlator capability of the VLA, the procedures we describe below can remove a significant fraction (typically 10% to 25%) of the data. While this is a considerable improvement over observing in a traditional continuum mode, it is still not ideal. A key design goal of any future correlator (including a planned upgrade to the VLA) will be to provide considerably better frequency resolution, allowing much more precise removal of these undesired signals and minimizing data loss even further.

4.2. Automated Techniques

[15] A variety of automated techniques have been developed for RFI excision. In general they are significantly faster and require less time from the user than flagging by hand. This becomes a relevant concern for reduction of many similar data sets, such as those generated by large surveys like the VLA Low-Frequency Sky Survey (VLSS), a survey of the northern sky using the 74 MHz VLA. It will also be important for the large-N arrays being planned for future low-frequency instruments. While the current automated schemes are effective at removing most of the strong RFI, they tend to miss more subtle interference, and they may also flag good data. As a result, flagging by hand is currently a better choice for high dynamic range imaging.

[16] The most basic level of automated RFI excision is to identify and excise any data with very discrepant (usually large) amplitudes; this is usually referred to as an amplitude clip. Used carefully this can remove a fair fraction of the worst RFI, but even a very light clip (e.g., excising everything with an amplitude greater than 5 times the expected total flux on the shortest baselines) will remove outliers and improve results for subsequent more complicated editing schemes. It will also usually remove intervals of very bad, broadband RFI, although these latter are rare.
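
As an illustration of the amplitude clip described above, the following is a minimal sketch in Python with NumPy, not code taken from any reduction package. The array layout (time by channel for one baseline), the function name `amplitude_clip`, and the toy flux values are assumptions made for the example; only the factor of 5 comes from the text.

```python
import numpy as np

def amplitude_clip(vis, flags, expected_flux_jy, factor=5.0):
    """Flag every visibility whose amplitude exceeds factor * expected_flux_jy.

    vis   : complex array, shape (n_time, n_chan), one baseline (assumed layout)
    flags : boolean array of the same shape, True = already excised
    """
    too_bright = np.abs(vis) > factor * expected_flux_jy
    # Keep existing flags and add the newly clipped samples.
    return flags | too_bright

# Toy data (hypothetical): 100 Jy of source flux plus one strong interfering sample.
vis = np.full((4, 8), 100.0 + 0.0j)
vis[2, 5] = 1200.0 + 0.0j
flags = amplitude_clip(vis, np.zeros(vis.shape, dtype=bool), expected_flux_jy=100.0)
```

Because the threshold is tied to the expected flux on the shortest baselines, a clip of this kind should leave real source structure untouched while removing only gross outliers.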

[17] The next step is to remove any bad times (often there are known issues with faulty first or last samples in the data) and then to excise any frequency channels known to be routinely bad, or seen to be bad throughout the data set. For most observations, a quick glance through a subset of the baselines will quickly identify bad channels. In the case of the 74 MHz VLA, the internally generated comb always falls in the same frequency channels for the same observational setup. The comb is not actually present on all baselines so flagging it throughout the data set will inevitably remove some useful data, but this is usually acceptable for moderate dynamic range experiments. Finally, if the loss of possible large angular size information is tolerable for the experiment, it may be simplest to remove a few of the shortest baselines in the array completely, as it is often difficult to distinguish real source structure from interference on them.
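
Flagging the comb channels throughout a data set can be sketched as follows. This is a hedged illustration only: it assumes that the comb lines fall at exact multiples of 100 kHz in sky frequency, that the 1.6 MHz band starts at 73.0 MHz and is divided into 64 channels of 25 kHz, and that channel k spans the half-open interval starting at its lower edge. The function name and its offset parameter are hypothetical. In practice the affected channels would be read off the data for the observational setup in use.

```python
import numpy as np

def comb_channels(band_start_khz, n_chan, chan_width_khz,
                  comb_spacing_khz=100.0, comb_offset_khz=0.0):
    """Return indices of channels that contain a comb line (assumed geometry)."""
    # Lower frequency edge of each channel.
    chan_lo = band_start_khz + chan_width_khz * np.arange(n_chan)
    # First comb line at or above each channel's lower edge.
    k = np.ceil((chan_lo - comb_offset_khz) / comb_spacing_khz)
    next_line = comb_offset_khz + k * comb_spacing_khz
    # The channel is affected if that line falls below its upper edge.
    return np.flatnonzero(next_line < chan_lo + chan_width_khz)

comb = comb_channels(band_start_khz=73000.0, n_chan=64, chan_width_khz=25.0)

# Flag those channels at all times on one baseline (10 time samples, toy size).
flags = np.zeros((10, 64), dtype=bool)
flags[:, comb] = True
```

Under these assumed settings every fourth channel is flagged, which illustrates why blanket flagging of the comb costs useful data on baselines where it is not actually present.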

[18] More complicated excision schemes exist in most major reduction software packages. In general they involve making informed decisions about criteria for excision, so some initial time to become familiar with the data is required. The most commonly implemented scheme is to check for deviations from an “average” data value for a given baseline. Either discrepant time samples in a given channel, or discrepant channels at a given time, or both are flagged. Figure 4 shows an example of the visibility spectrum on a baseline after RFI editing using the Astronomical Image Processing System (AIPS) task FLGIT, which is an implementation of this type of procedure.

Figure 4.

Snapshots showing a single baseline of 74 MHz VLA data both before and after automated RFI excision. In each snapshot, frequency increases to the right, and time increases up along the vertical axis. The data show the full 1.6 MHz band for several hours and are simply meant to be illustrative. The gray scale indicates amplitude, with white corresponding to the highest amplitudes. The baseline chosen has the strong self-generated 100 kHz comb, as well as some weaker features. In the “after” image, the editing, performed using the AIPS task “FLGIT,” has effectively removed most of the data in the comb channels and has also removed all of the strong RFI while leaving some of the weaker signals. For the data set as a whole, roughly 20% of the data were removed by the algorithm.

[19] This technique works very well on narrowband (single-channel) strong interference. It is less successful on weaker interference and on broader band signals, where the average is usually affected and only the tops of the interfering signal are removed. It also may leave a few low-amplitude points in a region of otherwise properly flagged interference, although a simple test which flags frequency channels completely when the deviation test has already flagged a high percentage of the data can help minimize this problem.
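
The deviation test and the follow-up test that flags nearly empty channels completely can be sketched as below. This is not the FLGIT algorithm, whose internals are not described here. It is a simplified stand-in that uses the median as the “average” and the median absolute deviation as the scatter, both of which are choices made for this sketch, as are the function names, the 5 sigma threshold, and the 50% channel fraction.

```python
import numpy as np

def _robust_outliers(amp, axis, n_sigma):
    """Mark samples that deviate from the median along `axis` by > n_sigma."""
    med = np.median(amp, axis=axis, keepdims=True)
    mad = np.median(np.abs(amp - med), axis=axis, keepdims=True)
    sigma = 1.4826 * mad                       # MAD scaled to a Gaussian sigma
    sigma = np.where(sigma > 0, sigma, np.inf)  # no scatter: flag nothing
    return np.abs(amp - med) > n_sigma * sigma

def flag_deviations(amp, n_sigma=5.0, full_channel_fraction=0.5):
    """amp: real array (n_time, n_chan) of amplitudes on one baseline."""
    # Discrepant time samples within each channel.
    flags = _robust_outliers(amp, axis=0, n_sigma=n_sigma)
    # Discrepant channels at each time.
    flags |= _robust_outliers(amp, axis=1, n_sigma=n_sigma)
    # Channels that are already mostly flagged are flagged completely.
    bad_chan = flags.mean(axis=0) > full_channel_fraction
    flags[:, bad_chan] = True
    return flags

# Toy baseline (hypothetical numbers): noise, one comb channel, one isolated spike.
rng = np.random.default_rng(0)
amp = 10.0 + rng.normal(0.0, 1.0, size=(200, 64))
amp[:, 20] += 50.0    # constant narrowband signal in channel 20
amp[37, 5] += 40.0    # short-lived interference in channel 5
flags = flag_deviations(amp)
```

A robust average is used here because, as noted above, a plain mean is pulled upward by broader signals, so that only the tops of the interference are removed.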

4.3. Excision by Hand

[20] Editing data by hand requires patience but can bring great rewards in terms of image quality. Some time can be saved by removing any bad time samples and excessively high-amplitude data before starting. It is also possible to apply several of the automated techniques and then flag by hand to clean up the results, in a sort of hybrid technique.

[21] One of the most powerful interactive approaches is to use a spectral display (i.e., time on one axis, frequency on the other) to identify and edit the discrepant data on each baseline immediately after primary calibration has been completed. Generally this level of excision will be sufficient to identify and remove most of the RFI, allowing initial maps of reasonable quality to be made. With two polarizations and an array size comparable to or smaller than the VLA (27 antennas), this is a lengthy but manageable task for individual projects. The prospect of extending this technique to the planned large-N arrays of the future is daunting if not impossible, and improved automated schemes that can eliminate the need for this sort of time-intensive hand editing must be developed if these instruments are to reach their full potential.

[22] After the initial spectral excision, averaging to reduce the number of frequency channels, and some self-calibration corrections to ensure good phase stability at the source position (at low frequencies the ionosphere phase contribution makes this a necessity), a second round of data editing is often useful to identify any remaining RFI, and particularly to identify any baselines which should be entirely removed. There are many ways to view the data at this stage, although it is almost always best to order the data by baseline length (or UV distance) to minimize the chance of accidentally flagging real source structure.

[23] One of the best ways to identify discrepant data is to look at the RMS of the amplitude over a short length of time (several integration intervals) on a given baseline at a given frequency. Another powerful tool is to examine the amplitude of the vector difference between the fringe visibility at a given time and the vector average of the visibilities in a surrounding short time interval on a given baseline at a given frequency. Both of these displays will highlight discrepant data, while minimizing the chance of excising real source structure to which they are relatively insensitive (see Figure 5). Finally, it is also very useful to look at the difference between the two circular polarizations if both have been recorded, because while RFI is often polarized, astronomical sources are expected to be largely unpolarized at low frequencies. As with the spectral excision method these techniques are time consuming at current array sizes and data rates, but not impossible to complete; it is not clear that they will be practical with planned future arrays.
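
The three diagnostics just described can be sketched as follows for a single baseline and frequency channel. This is an illustrative sketch under stated assumptions: the function names, the five-sample window, and the toy visibilities are hypothetical, the window is assumed to be odd, and the vector average includes the sample being tested.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sliding_rms(amp, window=5):
    """RMS scatter of the amplitude about its mean in each time window."""
    return sliding_window_view(amp, window).std(axis=1)

def vector_difference(vis, window=5):
    """|V(t) - <V>|, with <V> the complex average over a window centred on t."""
    windows = sliding_window_view(vis, window)
    half = window // 2
    centre = vis[half:half + windows.shape[0]]
    return np.abs(centre - windows.mean(axis=1))

def polarization_difference(rr, ll):
    """Amplitude difference between the two circular polarizations."""
    return np.abs(np.abs(rr) - np.abs(ll))

# Toy time series (hypothetical): a steady 10 Jy visibility with two bad samples.
vis = np.full(50, 10.0 + 0.0j)
vis[25] = 10.0j          # same amplitude, discrepant phase
vis[40] = 30.0 + 0.0j    # discrepant amplitude

rms = sliding_rms(np.abs(vis))   # rms[i] covers samples i .. i+4
vd = vector_difference(vis)      # vd[i] is centred on sample i+2
```

In this toy case the amplitude RMS responds only to the sample at index 40, while the vector difference also picks out the phase-discrepant sample at index 25, which is invisible in amplitude alone.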

Figure 5.

Snapshots showing the amplitude and RMS noise values in a single-frequency channel, with baseline length, or UV distance, increasing to the right and time increasing up along the vertical axis for a couple of hours. Data were taken with the VLA in the A configuration on 15 August 1991 and were borrowed from the VLA archive. The gray scale indicates (left) amplitude and (right) RMS, with white corresponding to the highest amplitudes or RMS. Notice the two prominent baselines close together in the middle of the image on the right (RMS); they can be seen to oscillate rapidly in amplitude by a very close inspection of the amplitude image but are very difficult to identify. Thus, by displaying the RMS noise of the data, we are able to more easily isolate bad data.

[24] After the moderate level RFI has been identified and removed, further steps are often desirable to remove low-level RFI from the final image. A good way to identify the source of low-level RFI is simply to average the UV data set in frequency and time. RFI tends to average coherently, while white noise does not, making it easier to isolate RFI in a highly averaged data set, even if it is no stronger than the noise in the unaveraged data. The drawback is that a larger amount of data is excised for each discrepant data point identified.
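
The effect of averaging can be illustrated with a small simulation. The sketch below assumes an interfering signal of constant amplitude and phase at half the single-sample noise level, and Gaussian noise that is independent between samples; all numerical values and names are hypothetical choices for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 400      # number of visibilities averaged together
noise_sigma = 1.0    # noise per real/imaginary part of one sample
rfi_amp = 0.5        # weak RFI, below the single-sample noise

def complex_noise(n):
    return rng.normal(0.0, noise_sigma, n) + 1j * rng.normal(0.0, noise_sigma, n)

clean = complex_noise(n_samples)
contaminated = complex_noise(n_samples) + rfi_amp  # constant-phase interference

# Averaging leaves the coherent signal intact while the noise drops as 1/sqrt(N).
avg_clean = np.abs(clean.mean())
avg_contaminated = np.abs(contaminated.mean())
noise_after_avg = noise_sigma / np.sqrt(n_samples)
```

With 400 samples the noise in the average falls by a factor of 20, so interference that was undetectable in any single sample stands many times above the noise in the averaged value. The cost, as noted above, is that flagging one averaged point discards all 400 samples that went into it.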

[25] Comparison to a good model of the field is another way to isolate subtle, discrepant, contaminated data. One must have a very good model of the observed field for this to work, and the data must also be very well calibrated, so it should be a last step (for an example, see Perley [1999]). In some implementations of this method, model visibilities, taken either from a CLEAN'ed image of the source or from external information about the source, are compared directly to the observed visibilities, and nonmatching data are identified and removed. Another option is to either subtract the model from the data, or divide the data by the model, and then excise high-amplitude data points in the resulting data set to eliminate the discrepant visibilities. The model is then added or multiplied back into the data before final imaging. Subtracting a model will tend to isolate RFI on short baselines, while dividing is more sensitive to RFI on longer baselines, but both are useful. Software exists to implement comparisons to the model; however, it is usually prudent to examine the results and identify “discrepant” data for excision by hand.
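
The subtract and divide variants can be sketched as follows. This is a minimal illustration, not the implementation in any existing package: the function name, the thresholds, and the two-visibility toy data are assumptions, and the model is assumed to be nonzero wherever division is used.

```python
import numpy as np

def flag_against_model(vis, model, threshold, mode="subtract"):
    """Flag visibilities that are discrepant with respect to a model.

    mode="subtract": flag where |vis - model| > threshold (threshold in Jy)
    mode="divide"  : flag where |vis / model| > threshold (dimensionless)
    """
    if mode == "subtract":
        test = np.abs(vis - model)
    elif mode == "divide":
        test = np.abs(vis / model)   # assumes the model is nonzero
    else:
        raise ValueError("mode must be 'subtract' or 'divide'")
    return test > threshold

# Toy data (hypothetical): a bright model visibility on a short baseline and a
# faint one on a long baseline, each with some interference added.
model = np.array([100.0 + 0.0j, 2.0 + 0.0j])
vis = model + np.array([50.0, 5.0])

flag_sub = flag_against_model(vis, model, threshold=20.0, mode="subtract")
flag_div = flag_against_model(vis, model, threshold=2.0, mode="divide")
```

In this toy case the model is much fainter on the long baseline, so the subtraction flags only the short-baseline sample and the division flags only the long-baseline sample. Because the original visibilities are never modified, no step is needed here to add or multiply the model back in.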

[26] Often, despite careful flagging there may still be subtle RFI effects in the data. Figure 6 shows one of the most common problems, the so-called “polar rings.” Many RFI sources, especially at low frequency, are stationary and have a phase behavior similar to that of a source at the north pole, which is also stationary in the frame of the array. Straight or slightly curved horizontal lines in images, although caused by RFI, are thus equivalent to sidelobes from an un-CLEANed source at the pole, hence the name polar rings. When present they can be extremely difficult to isolate and remove, but since they frequently have amplitudes at or below the noise level in the image, they may be more of a cosmetic issue than a scientific problem. One simple-minded way to handle them is to use an imaging routine which can handle the three-dimensional effects caused by noncoplanarity of an array over large fields of view, and to CLEAN whatever interfering flux appears at the north pole position rather than actually excising the signals, although this seems to have a mixed success rate. Even better, when the RFI is not highly time variable, one could self-calibrate the data to the CLEAN components created by imaging the north pole, and then reimage and subtract the RFI “source” from the data. Once the RFI is subtracted the calibration is inverted, leaving a much improved data set. This would remove the RFI directly without causing a loss of data, and could be implemented automatically, although currently, at least in AIPS, it is necessary for the user to run a sequence of software tasks by hand.

Figure 6.

Image of the astronomical sources 3C129 (J2000 RA = 04:49:09 Dec = +45:00:39) and 3C129.1 at 74 MHz, taken with the VLA. The image is roughly 2.5° on each side, and the gray scale has been chosen to emphasize the nearly horizontal lines spanning this image. Usually referred to as “polar rings” because they are similar to the sidelobes one would expect from a source at the North Pole, these low-level imaging artifacts are a common indication of subtle RFI that remains in the data.

5. Conclusions

[27] We present some examples of RFI seen at 74 MHz and 330 MHz on the VLA and VLBA, and also discuss common methods for dealing with it in postcorrelation data. This overview is drawn from, and intended to summarize, the astronomical user experience at these frequencies, and to indicate some of the areas for future improvement. As seen in these data, simply relying on a “clean” portion of the spectrum, or even on very long baselines, will not suffice to completely avoid RFI. Care should be taken in designing new arrays that the instrument itself does not create in-band RFI, similar to that seen with the 74 MHz VLA, because this can be very hard to correct later. As the number of antennas in new radio frequency arrays increases, and the RFI environment continues to worsen, better techniques to mitigate the signals before correlation and remove them automatically after correlation will be needed in order to take full advantage of the power of the planned instruments.


[28] The National Radio Astronomy Observatory is a facility of the National Science Foundation operated under cooperative agreement by Associated Universities, Inc. Basic research in radio astronomy at the NRL is supported by the Office of Naval Research.