The Allen Telescope Array (ATA) at the Hat Creek Radio Observatory (HCRO) is a wide-field panchromatic radio telescope currently consisting of 42 offset-Gregorian antennas each with a 6 m aperture, with plans to expand the array to 350 antennas. Through unique back-end hardware, the ATA performs real-time wideband beamforming with independent subarray capabilities and customizable beam shaping. The beamformers enable science observations requiring the full gain of the array, time domain (nonintegrated) output, and interference excision or orthogonal beamsets. In this paper we report on the design of this beamformer, including architecture and experimental results. Furthermore, we address some practical considerations in large-N wideband beamformers implemented on field programmable gate array platforms, including device utilization, methods of calibration and control, and interchip synchronization.
Beamforming in a phased-array receiver is the process of coherently combining voltage outputs from many individual, or elemental, antennas (i.e., the elements of the array). This addition is usually meant to increase the effective gain of the array compared to a single elemental antenna, thereby increasing the signal-to-noise ratio of the received signal. The benefits of beamforming make it fundamental to the science and engineering goals of multisensor arrays [Van Veen and Buckley, 1988; also Hansen, 1998].
Several well-known advantages of using phased arrays include the benefits gained from electronic steering, null-forming, windowing, and multibeaming. Beamforming using a large number of small diameter antennas (LNSD) is also cost-effective; for large total collecting areas, an array of small dishes can be made less expensively than a single large dish of the equivalent collecting area [Schultz, 2004], when factors such as the cost of the receivers and processing electronics are appropriately considered. Additionally, the combination of multibeaming and wide-field-of-view elements greatly increases survey speed, a fundamental requirement of survey instruments like the Allen Telescope Array and the Square Kilometer Array (SKA). There is also a long-standing interest in beamforming for radio astronomy, e.g., as relates to the SKA, discussed by Wright et al., large telescope arrays such as LOFAR [Rottgering, 2003], ASKAP [DeBoer et al., 2009], and MeerKAT [Jonas, 2009], as well as smaller projects, as discussed by Ellingson et al. and R. Armstrong et al. (A wideband, four-element, all-digital beamforming system for dense aperture arrays in radio astronomy, http://arxiv.org/abs/0910.2865, 2009). While the VLA implemented beamforming in the analog sum [Napier et al., 1983], only recently has the cost of digital electronics enabled the construction of real-time, large-N, wideband digital beamformers of the type presented here. The ATA's three 96-input (48 dual polarization), 104 MHz, 16 bit per sample time domain beamformers are among the first of their kind deployed for operational use in radio astronomy.
This paper proceeds as follows. Section 2 presents the relevant design and systems analysis of the beamformer. Section 3 describes the hardware architecture and capabilities. Section 4 presents some software control architecture choices. Finally, section 5 contains experimental data obtained using the beamformer, as well as some operational demonstrations of its capabilities.
2. The Allen Telescope Array and Beamforming Approach
2.1. The Allen Telescope Array
Three dual-polarization beamformers following the design in this paper are currently in operation at the Allen Telescope Array (ATA), a facility operated by the University of California, Berkeley, and the SETI (search for extraterrestrial intelligence) Institute. The characteristics of this array, including its design and antenna configuration, are well published, most recently by Welch et al. Therefore, this discussion is limited to the characteristics most relevant to the beamformer implementation.
 The ATA currently consists of 42 offset-Gregorian dish antennas, each having a mostly unobstructed 6 m aperture. These antennas are distributed across the observatory grounds as indicated in Figure 1, covering an array aperture of about 300 m (the distance between the furthest separated antennas in the array). The planned configuration consists of 350 antennas distributed over a 900 m aperture. The unique wide-bandwidth log-periodic feeds and front-end electronics enable each antenna to deliver more than a decade of instantaneous sensitivity (500 MHz to 10 GHz), all of which is conveyed to the back-end electronics using wideband analog single-mode fiber. Each antenna has unique instrumental delay and phase characteristics, imparted by the front-ends and fiber optics.
2.2. The ATA Back End
The ATA beamformer back-end, including down-conversion electronics, selects a portion of the 10 GHz tuning range, and provides time domain outputs of the formed beams to user instruments. These include spectrometers, described by Welch et al., for SETI surveys and the Berkeley-ATA Pulsar Processor, described by Van Leeuwen et al. Additionally, band-limited outputs generated on a 10 gigabit ethernet interface can be recorded to disk at rates limited by available disk drives. The architecture of the down-converter feeding the beamformers is shown in Figure 2, and includes both analog and digital stages.
 The analog down-converter consists of eight identically constructed down-converters for each antenna, providing four independently tuned dual-polarization intermediate-frequency (IF) outputs for back-end instrumentation. The analog bandwidth is 600 MHz, defined by the first IF filter. The second local oscillator (LO) mixes the first IF down to the second IF of 629.1456 MHz, and also applies one of eight orthogonal square-wave Walsh functions to mitigate the effect of analog cross-talk in cross-correlation results. The second IF output is presently filtered to a bandwidth of 200 MHz, to accommodate the current digital back ends.
The digital down-converters (DDCs) sample the second IF at a rate, fs, of 838.8608 MHz. This aliases the second IF to 209.7152 MHz (one fourth the sample rate). The Walsh functions are removed at this stage, and the digital LO is implemented as a simple sequencing of the values [1, j, −1, −j]. The result is down-sampled to a complex time domain sample stream at 104.8576 MCS/s (mega-complex-samples per second), representing a Nyquist band of 104.8576 MHz. Each sample is represented by one byte for each of the real and imaginary components, for a total rate of 1.677 Gbps (billion bits per second). The digital dynamic range of 48 dB, taken as 6 dB per bit, and the accompanying processing load are high compared to those of other beamforming applications (e.g., the 1 bit beamformer described by Tomov and Jensen), but are necessary to retain fidelity in the presence of radio frequency interference (RFI) at Hat Creek, as described by Bower.
The clocks represented in the digital subsystem were chosen so that 2^20 samples, or 1024 frames of 1024 samples, occur in exactly 10 ms, the period of the slowest Walsh function. This design benefits the ATA correlator, but also constrains the design of the beamformer.
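The clock relationships in this subsection can be checked with a few lines of arithmetic. The sketch below uses only the values quoted in the text; the variable names are illustrative, and the decimation factor of 8 is inferred from the ratio of the two quoted sample rates rather than stated by the source.

```python
# Clock and data-rate relationships quoted in section 2.2.
FS_ADC_HZ = 838.8608e6   # ADC sample rate, fs
IF2_HZ = 629.1456e6      # second IF center frequency

# The second IF lies between fs/2 and fs, so it folds down to fs - f_IF.
alias_hz = FS_ADC_HZ - IF2_HZ

# Inferred decimation by 8 gives the complex sample rate of the beamformer.
complex_rate_hz = FS_ADC_HZ / 8

# One byte real plus one byte imaginary per complex sample.
link_rate_bps = complex_rate_hz * (8 + 8)

# 2^20 complex samples span one period of the slowest Walsh function.
frame_period_s = 2**20 / complex_rate_hz

# Rule of thumb used in the text: 6 dB of dynamic range per bit.
dynamic_range_db = 6 * 8

print(f"alias frequency : {alias_hz / 1e6:.4f} MHz (fs/4 = {FS_ADC_HZ / 4e6:.4f} MHz)")
print(f"complex rate    : {complex_rate_hz / 1e6:.4f} MCS/s")
print(f"per-input rate  : {link_rate_bps / 1e9:.4f} Gbps")
print(f"2^20 samples    : {frame_period_s * 1e3:.3f} ms")
print(f"dynamic range   : {dynamic_range_db} dB")
```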
2.3. Wideband Time Domain Beamforming
 While the topic of beamforming is well explored, most treatments are limited to the case of narrowband phase-only beamforming. Phase-only beamforming is simple to derive, and is useful as a starting point to the discussion of wideband time domain beamforming.
 For the case of a set of antennas separated on a line by a distance d, as shown in Figure 3, a plane wave incident on the array at an angle θ will arrive at the ith antenna at a time (τi) later than it arrives at antenna 1. For a narrowband signal, the phase of the ith antenna is delayed by 2πf0τi radians relative to antenna 1. This phase offset is preserved through down-conversion, and is predicted and corrected in each voltage stream prior to summing by the narrowband beamformer. Mathematically, for any set of samples in time, the narrowband beamformer performs the calculation
where y[n] is the beamformer output at time sample n, and xi[n] is the sample of the ith antenna at time n. A simplified narrowband beamformer implementing this process is shown in Figure 4a. The vector aH is the steering vector for the array. For the general narrowband array distributed in three-dimensional space, the steering vector is found by
where the steering phase is the expected response for a given pointing angle, and is given by
The values are defined above such that f0 is the frequency of the received radio signal (“sky” frequency), c is the speed of light in free space, ϕ and θ are the azimuthal and polar steering angles, and Xi, Yi, and Zi represent the position of the ith elemental antenna.
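For reference, one conventional way to write the three relations described in this subsection is sketched below. This is a reconstruction under stated assumptions rather than a reproduction of the original typesetting: the symbol \psi_i for the steering phase, the number of elements N, and the sign convention of the exponent are all assumed, and only the quantities f_0, c, \phi, \theta, X_i, Y_i, and Z_i come from the text.

```latex
% (1) Narrowband beamformer output: weighted sum of the element samples
y[n] = \mathbf{a}^{H}\,\mathbf{x}[n] = \sum_{i=1}^{N} a_i^{*}\, x_i[n]

% (2) Steering vector built from the per-element steering phases
\mathbf{a} = \begin{bmatrix} e^{j\psi_1} & e^{j\psi_2} & \cdots & e^{j\psi_N} \end{bmatrix}^{T}

% (3) Steering phase for azimuth \phi and polar angle \theta
\psi_i = \frac{2\pi f_0}{c}\left( X_i \sin\theta\cos\phi + Y_i \sin\theta\sin\phi + Z_i \cos\theta \right)
```

With this convention, the geometric delay of element i is \tau_i = \psi_i / (2\pi f_0), which is the quantity the wideband beamformer of section 2.3.3 corrects directly.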
Many variations are possible in the selection of steering vectors. For example, windowing functions can be applied to reduce the sidelobes exhibited by the synthesized beam, and nulling algorithms such as projection nulls can be applied to remove unwanted signals incident from directions other than the steering direction. Subbaram and Abend present the orthogonal projection method, which was expanded on by Ellingson and Cazemier.
The narrowband beamformer exhibits pointing errors when the signal bandwidth is not small compared to the reciprocal of the delay across the array aperture, so that the result of (3) changes meaningfully across the frequencies of interest. The phase error for a given bandwidth and time delay is easily calculated in radians by
A small narrowband system with a 1 MHz bandwidth and 5 m aperture exhibits a worst-case error of about ±3 deg referenced to the array center. The expected sensitivity loss due to this error is about 0.03%, as calculated using the method described in section 5.5. For this and similar cases, the narrowband beamformer is acceptable when sensitivity is the controlling metric. In the case of the ATA-42, with a bandwidth of 100 MHz and an array aperture of 300 m, the narrowband solution gives an error up to 100 complete cycles (36,000 degrees of phase error) between the largest and smallest frequencies. The narrowband beamformer is not adequate and the beamformer must apply a unique phase correction to each frequency.
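The two cases quoted above can be reproduced numerically. The sketch below assumes the phase error takes the form Δφ = 2πΔfτ, with τ taken as the light-travel time across the full aperture and Δf as the full bandwidth; under that assumption the total phase spread for the small system is about 6 deg, i.e., ±3 deg about the array center. The function name is illustrative.

```python
C_M_PER_S = 299_792_458.0  # speed of light in free space


def phase_spread_deg(bandwidth_hz: float, aperture_m: float) -> float:
    """Total phase spread (degrees) between the band edges for a signal
    arriving along the aperture, assuming delta_phi = 2*pi*delta_f*tau."""
    tau_s = aperture_m / C_M_PER_S
    return 360.0 * bandwidth_hz * tau_s


small = phase_spread_deg(1e6, 5.0)        # 1 MHz bandwidth, 5 m aperture
ata42 = phase_spread_deg(100e6, 300.0)    # 100 MHz bandwidth, 300 m aperture

print(f"small array: {small:.2f} deg total, +/-{small / 2:.2f} deg about center")
print(f"ATA-42     : {ata42:.0f} deg total, {ata42 / 360:.1f} cycles")
```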
2.3.2. Time Versus Frequency Domain: Computational Complexity
It is known that wideband beamforming can be applied in either the frequency domain (by applying unique phase corrections to each channel), or in the time domain (by combining a time delay with a phase shift) [e.g., Rennie, 1981]. The exact computational resources required by each approach are nuanced, but a comparative analysis of the expected complexity is useful in deciding which architecture to use. Generally, hardware utilization in a time domain beamformer scales as O(NMB), where N is the number of antennas, M is the number of beams, and B is a measure of the complexity required per-antenna-per-beam; we adopt the symbol B in reference to the fractional-delay FIR filter, which can dominate this term. The trade-offs involved in implementing this filter are discussed further in section 3.3.1.
 Hardware utilization in a frequency domain beamformer using pipelined FFTs depends on the FFT length, CF, and radix, R, and is of the order O(NRlogR CF + NM + MRlogR CF) [Bergland, 1969; also Gold and Bially, 1973], though this considers only mathematical operations, and not memory required to implement the FFTs. The first term is required to convert the data to frequency domain via the FFT, and the last to reconstitute a time domain series of each beam. The middle term represents the simplicity of frequency domain beamforming, which might typically require only one multiply per-antenna-per-beam-per-clock-cycle apart from the FFTs.
 The approximations provided above do not give an exact number of operations required for each beamformer, but it is possible to compare the two approaches to determine cases where one is more appropriate. The approximate complexities are equal when neither beamformer is clearly more efficient, as in
A time domain beamformer might require an FX-style correlator for calibration, but not necessarily as many channels as required for frequency domain beamforming. This is because fractional delay correction is handled by external circuitry, so phase variations between channels are not important. Including the resources required by the calibration FFTs and simplifying yields
where CT represents the FFT length (number of frequency channels) in the time domain beamformer's correlator. A simple analysis, assuming radix-4 FFTs, CT = 128, CF = 1024, B = 7, and N = 42, favors the time domain approach for M = 1 and the frequency domain approach for larger beamformers. However, it is noted that this analysis neglects the increased nonmath (e.g., memory) resources required for the longer FFTs and longer cross-correlators. As a result, the true threshold might be at a larger number of beams.
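The comparison can be made concrete with a short calculation. The sketch below assumes the time domain cost is NMB plus a calibration FFT term N·R·log_R(CT), and the frequency domain cost is the three-term expression given earlier; these cost models are a reading of the text, not a reproduction of the original equations, and the function names are illustrative.

```python
import math


def time_domain_cost(n: int, m: int, b: int, radix: int, c_t: int) -> float:
    """Assumed model: per-antenna-per-beam filtering plus calibration FFTs."""
    return n * m * b + n * radix * math.log(c_t, radix)


def freq_domain_cost(n: int, m: int, radix: int, c_f: int) -> float:
    """Assumed model: forward FFTs, per-channel weights, and inverse FFTs."""
    fft = radix * math.log(c_f, radix)
    return n * fft + n * m + m * fft


# Values quoted in the text: radix-4, CT = 128, CF = 1024, B = 7, N = 42.
for beams in (1, 2, 4):
    t = time_domain_cost(42, beams, 7, 4, 128)
    f = freq_domain_cost(42, beams, 4, 1024)
    better = "time" if t < f else "frequency"
    print(f"M = {beams}: time = {t:.0f}, frequency = {f:.0f}, favors {better} domain")
```

Under these assumptions the crossover falls between one and two beams, which agrees with the statement above; as the text notes, memory costs would likely move the true threshold higher.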
 The ATA beamformer implements time domain beamforming because of its relative simplicity and computational efficiency for small numbers of beams, although there is interest in frequency domain beamforming for future architectures (R. Armstrong et al., A wideband, four-element, all-digital beamforming system for dense aperture arrays in radio astronomy, http://arxiv.org/abs/0910.2865, 2009).
2.3.3. Time Domain Beamforming
 The time domain process for wideband beamforming, implemented at base band, consists of a time delay and a phase adjustment. The former corrects the geometric time delay in Figure 3, while the latter corrects for the center-frequency phase of (3). The wideband beamformer is illustrated in Figure 4b. The need for both the time delay and the phase correction is seen by the following analysis. First, the time domain outputs from two antennas are defined as
The signal x2 is simply a time delayed copy of x1. The down-conversion process multiplies the signal by an oscillator at frequency −f0 and gives base band signals y, such that
The beamformer applies a time correction to y2, to time advance it by the time t0. This gives us y2′ as
The condition that t0 = τ simplifies the above to
which differs from y1 by the fixed, frequency-independent phase factor exp(−j2πf0τ). This phase correction is computed identically to that of the narrowband beamformer using the sky frequency f0 (the center frequency of a double-sideband wideband beamformer). This phase correction is required because the geometric delay is exhibited in RF, but removed in the baseband processing.
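The steps of this argument can be written out explicitly. The following is a sketch consistent with the description above, assuming a common source signal s(t) (a symbol introduced here for illustration) and a down-conversion oscillator of the form e^{-j 2\pi f_0 t}; the original equation numbering is not reproduced.

```latex
% Antenna outputs: x_2 is a delayed copy of x_1
x_1(t) = s(t), \qquad x_2(t) = s(t - \tau)

% Down-conversion by an oscillator at frequency -f_0
y_1(t) = s(t)\, e^{-j 2\pi f_0 t}, \qquad
y_2(t) = s(t - \tau)\, e^{-j 2\pi f_0 t}

% Time advance of y_2 by t_0
y_2'(t) = y_2(t + t_0) = s(t + t_0 - \tau)\, e^{-j 2\pi f_0 (t + t_0)}

% With t_0 = \tau
y_2'(t) = s(t)\, e^{-j 2\pi f_0 t}\, e^{-j 2\pi f_0 \tau} = y_1(t)\, e^{-j 2\pi f_0 \tau}
```

Multiplying y_2'(t) by e^{+j 2\pi f_0 \tau} then aligns it with y_1(t) at every frequency in the band, which is why the wideband beamformer needs both the delay and the single phase rotation.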
 The implementation of time domain beamforming is computationally straightforward, requiring one complex multiply (for the phase correction), as well as buffers and filters to implement the time delay; the exact complexity varies based on the design. The ATA beamformer utilizes full- and fractional-sample delays, phase correction, and calibration circuitry. The implementation of the fractional delay as a FIR filter with arbitrary coefficients enables amplitude correction and limited band-pass shaping.
3. Hardware Implementation
The ATA real-time beamformer is implemented using many parallel high-speed field programmable gate arrays (FPGAs). The FPGAs selected for the beamformer are Xilinx Virtex II Pros as implemented in the BEE2 (Berkeley Emulation Engine 2), a dynamic reconfigurable computing platform designed by the Berkeley Wireless Research Center (BWRC), described by Chang et al. and pictured in Figure 5. Each BEE2 provides five FPGAs, four for beamforming and a fifth for control. The ADCs and DDCs are implemented using BWRC iBobs (interconnect breakout board), and use a design common to the ATA Correlator [Urry et al., 2007]. The BEE2 and its associated design tools enable rapid implementation of FPGA designs without requiring expertise in HDL. The capabilities and limitations of the BEE2 platform lead to some of the important design choices in the practical implementation of the ATA beamformer, as discussed in the remainder of this section. Different design choices might be made using the latest generation of FPGAs, which offer different levels of I/O, logic gates, and achievable clock rates.
3.2. Data Flow
 The three completed beamformers are shown in Figure 6. As illustrated in Figure 7, they are configured in a corporate architecture, so-named because of the resemblance of the many-tiered structure to organization charts [Hansen, 1998]. A single dual-polarization beamformer for the ATA-42 requires 19 FPGAs from 5 BEE2s, leaving the twentieth available for packetization of the output data to user instruments over 10 Gb ethernet. The FPGAs are synchronously clocked at the sample rate of 104.8576 MHz. The higher tiers shown in Figure 7 are closer to the unprocessed signals from the antennas, while the lowest tier provides the final dual-polarization time domain outputs. Combined input data rates are 140 Gbps for 84 inputs, and the dual-polarization outputs have a total data rate of 3.35 Gbps. Replicating current front ends and data rates for each antenna, the ATA-350 could supply 1.2 Tbps to each beamformer.
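The aggregate rates and the FPGA count quoted above follow from the per-input rate and the node fan-in. The sketch below is a rough check: the rates use only numbers from the text, while the node-count function assumes eight inputs per leaf node, three inputs per intermediate branch node, and one circular branch node, which is a reading of the architecture described in sections 3.3 and following rather than a statement from the source. All names are illustrative.

```python
import math

PER_INPUT_BPS = 104.8576e6 * 16  # 16 bits per complex sample


def aggregate_gbps(n_inputs: int) -> float:
    """Total data rate in Gbps for n_inputs single-polarization streams."""
    return n_inputs * PER_INPUT_BPS / 1e9


def fpga_count(n_antennas: int, leaf_inputs: int = 8, branch_inputs: int = 3) -> int:
    """Assumed corporate tree: leaves and branches per polarization,
    plus one circular branch node combining the two polarizations."""
    leaves = math.ceil(n_antennas / leaf_inputs)
    branches = 0
    streams = leaves
    while streams > 1:
        streams = math.ceil(streams / branch_inputs)
        branches += streams
    return 2 * (leaves + branches) + 1


print(f"ATA-42 input  : {aggregate_gbps(84):.1f} Gbps")
print(f"dual-pol out  : {aggregate_gbps(2):.2f} Gbps")
print(f"ATA-350 input : {aggregate_gbps(700) / 1e3:.2f} Tbps")
print(f"FPGAs (ATA-42): {fpga_count(42)}")
```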
 In the corporate architecture of Figure 7, the topmost nodes are called leaf nodes. The output of a leaf node y[n] is given by the inputs xi[n] as
where the inputs are the digitized antenna streams. Beamforming corrections include full-sample delay di, fractional delay as implemented in the filter b, amplitude adjustment Ai, and phase adjustment given by the initial angle ϕi and rate ffr,i. Successive stages are called branch nodes, and have outputs y[n] described by their respective inputs xi[n] as
In this case, the inputs are subbeams from earlier nodes. The bottommost branch node is a modified version of the standard branch node, and is called a circular branch node. It accepts only two inputs, x1 and x2, as the x and y polarization synthesized beams, and has two combining modes to generate two outputs, y1 and y2. When linear polarization is requested, it acts as a pass-through, i.e., yi[n] = xi[n]. When circular polarization is requested, it acts as a combiner, as in
Block diagrams of the three types of nodes are illustrated in Figure 8, the functions of which are discussed further in following sections. The choice of a corporate architecture is simple, and maximizes module reuse; however, it eliminates most intertier communications and places some practical limits on calibration, as discussed further in section 4.
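The three node types can be illustrated with a small behavioral model. The sketch below is not the firmware: it is a minimal floating-point model assuming the leaf node applies, per input, a whole-sample delay, an FIR filter, an amplitude, and a phase rotation with a linear rate before summing, and that the branch node is a plain sum. The circular combiner uses an assumed (x ± jy)/√2 convention whose sign and scaling are not given in the text. All function and parameter names are hypothetical.

```python
import cmath
import math


def leaf_node(streams, delays, taps, amps, phases, rates_hz, fs_hz):
    """Sum of delayed, filtered, scaled, and phase-rotated input streams."""
    n_out = min(len(x) for x in streams)
    out = []
    for n in range(n_out):
        acc = 0j
        for x, d, b, a, ph, fr in zip(streams, delays, taps, amps, phases, rates_hz):
            filtered = 0j
            for k, bk in enumerate(b):      # FIR filter applied after the coarse delay d
                idx = n - d - k
                if 0 <= idx < len(x):
                    filtered += bk * x[idx]
            angle = ph + 2.0 * math.pi * fr * n / fs_hz
            acc += a * cmath.exp(1j * angle) * filtered
        out.append(acc)
    return out


def branch_node(subbeams):
    """Sample-by-sample sum of the incoming subbeams."""
    return [sum(samples) for samples in zip(*subbeams)]


def circular_node(x_pol, y_pol):
    """Assumed circular combiner: (x + jy)/sqrt(2) and (x - jy)/sqrt(2)."""
    s = 1.0 / math.sqrt(2.0)
    first = [s * (x + 1j * y) for x, y in zip(x_pol, y_pol)]
    second = [s * (x - 1j * y) for x, y in zip(x_pol, y_pol)]
    return first, second
```

As a usage example, if a second input is a copy of the first delayed by two samples and rotated by 0.5 rad, then delaying the first input by two samples and rotating the second by −0.5 rad makes the two add coherently at the leaf node output.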
 All data flow between the FPGAs of the beamformer is managed using the 10 gigabit attachment unit interface (XAUI) standard. This standard is easily implemented on the BEE2 and iBob platforms, and works well with a synchronous system. These links include “data” and “out-of-band” signal paths. The former are multiplexed as four complex time series per link (8 bits real + 8 bits imaginary per sample) and the latter are used to transmit 1 s and 10 ms pulses, used as a synchronization reference for the digital system. The synchronization pulses are used in concert with forced-air FPGA cooling and a physical layout minimizing cable attenuation to optimize the quality and stability of the high-speed XAUI links [Armstrong, 2009].
3.3. Leaf Node Beamformer
 With an understanding of the entire system now in mind, we continue to a detailed description of the beamformer processing architecture, beginning with the leaf node. Following the block diagram in Figure 8a, the leaf nodes implement all of the beamforming corrections, and are complete beamformers. The later nodes provide subbeam summing and optional additional processing. As a result, a cascaded beamformer may terminate in any type of node, providing a calibrated beam for all antennas included prior to the termination.
 The leaf nodes are the first type of beamforming node encountered after the DDC. Each leaf node is effectively a self-contained eight-element beamformer, applying all of the required corrections for instrumental offsets, geometric pointing corrections, and custom beamforming coefficients. In addition to these corrections, the leaf node must provide diagnostic and calibration data to the control software. Two XAUI input links provide eight antenna inputs; a single XAUI output link transmits the subbeam and reference antenna used in calibration.
3.3.1. Delay Corrections
The first leaf node correction is the programmable delay, which is split between a coarse, or whole-sample, delay module and a fine, or fractional, delay module. The coarse delay is implemented with a 1024-sample variable-delay memory buffer, and calculated delays (or advances) are applied relative to a default value of 512 samples. This provides a delay range of about ±4.88 μs, which allows both geometric and fiber-optic delay corrections sufficient for the ATA-350 design goal of a 900 m array aperture.
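The quoted delay range can be checked against the geometric requirement. The sketch below uses the buffer size, default offset, and sample rate from the text; treating the light-travel time across 900 m as the maximum geometric delay is a simplifying assumption, and the variable names are illustrative.

```python
C_M_PER_S = 299_792_458.0
FS_HZ = 104.8576e6        # complex sample rate
DEFAULT_OFFSET = 512      # default position in the 1024-sample buffer

coarse_range_s = DEFAULT_OFFSET / FS_HZ     # available delay or advance
geometric_max_s = 900.0 / C_M_PER_S         # light-travel time across 900 m
margin_s = coarse_range_s - geometric_max_s # left for fiber and instrumental delay

print(f"coarse delay range : +/-{coarse_range_s * 1e6:.2f} us")
print(f"900 m light travel : {geometric_max_s * 1e6:.2f} us")
print(f"remaining margin   : {margin_s * 1e6:.2f} us")
```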
 The fine delay is implemented as a six-tap real-coefficient FIR filter, with 12 bit coefficients generated by the general least squares algorithm [Laakso et al., 1996]. The higher precision of the coefficients (relative to the data) allows accurate filtering and amplitude control (by scaling all coefficients). The filter length of six taps balances filter accuracy (relative to an ideal fractional delay) against device utilization. Longer filters did not fit in the leaf node firmware, but might be implemented in future FPGA platforms having more resources. Figures 9 and 10 illustrate the worst-case amplitude error and worst-case phase error for fine delay filters with varying numbers of taps. The ATA design uses a default bandwidth of 72 MHz (0.7 normalized) and 6 filter taps. This provides a worst-case amplitude error of 8%, and a worst-case phase error of 4 deg; the errors are frequency and specific-delay dependent, with the worst errors occurring near the band edge when a half-sample delay is implemented. The effect of these errors is discussed in section 5.5. The design bandwidth of this filter constrains the usable bandwidth of the beamformer, up to the DDC bandwidth of 84 MHz.
 Aside from increasing the filter length, which is not feasible at the current device utilization levels without further optimization, the fine delay filter accuracy can be improved by reducing the design bandwidth (software-selectable and already seen in the figures) or by alternative methods of calculating coefficients. For example, given that the worst errors are near the band edge, it was found that the 6-tap filter with a design bandwidth of 0.75 and an operating bandwidth of 0.7 resulted in a reduction of errors to 5.5% in amplitude and 2.4 deg in phase. Other techniques, such as genetic optimization of the filter coefficients [after Ahmad and Antoniou, 2006], might yield improved performance, but are outside the scope of this paper.
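To illustrate how the amplitude and phase errors of a short fractional-delay filter can be evaluated, the sketch below builds a six-tap real-coefficient filter and compares its response with an ideal delay. Note that it uses Lagrange interpolation, a simpler closed-form design also surveyed by Laakso et al., and not the general least squares design used in the ATA firmware; the error values it produces are therefore not those of Figures 9 and 10. Function names are illustrative.

```python
import cmath
import math


def lagrange_fd_taps(delay: float, n_taps: int = 6) -> list:
    """Lagrange-interpolation FIR taps approximating a delay of `delay` samples."""
    taps = []
    for k in range(n_taps):
        coeff = 1.0
        for m in range(n_taps):
            if m != k:
                coeff *= (delay - m) / (k - m)
        taps.append(coeff)
    return taps


def response_error(taps, delay: float, omega: float):
    """Amplitude error (fractional) and phase error (degrees) relative to an
    ideal delay, at normalized angular frequency omega (rad/sample)."""
    h = sum(b * cmath.exp(-1j * omega * k) for k, b in enumerate(taps))
    ratio = h / cmath.exp(-1j * omega * delay)
    return abs(ratio) - 1.0, math.degrees(cmath.phase(ratio))


# Half-sample case: total delay of 2.5 samples, centered in the six taps.
taps = lagrange_fd_taps(2.5)
for frac_band in (0.1, 0.35, 0.5):          # fraction of the sample rate
    amp, ph = response_error(taps, 2.5, 2.0 * math.pi * frac_band)
    print(f"f = {frac_band:.2f} fs: amplitude error {amp * 100:+.2f}%, phase error {ph:+.3f} deg")
```

Scaling all taps by a common factor changes the gain without changing the delay, which is the mechanism the text describes for amplitude control.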
 The delay values are software updated at intervals of about 3 s. Delay rates on astronomical sources are small enough for this to be reasonable; worst-case sidereal rates along a 900 m baseline are about 0.023 samples per second for the ATA-350 at a sample rate of 104.8576 MHz, or 0.008 samples per second for the ATA-42. The worst phase error occurs at the band edges, and is about 3 deg for the ATA-42; however, most sources and most baselines do not exhibit this worst-case rate.
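The delay rates quoted above are consistent with a simple upper bound. The sketch below assumes the worst-case sidereal delay rate on a baseline of length b is ω·b/c, where ω is the Earth's rotation rate; this bound and the names used are assumptions for illustration.

```python
OMEGA_EARTH_RAD_S = 7.2921e-5   # Earth's sidereal rotation rate
C_M_PER_S = 299_792_458.0
FS_HZ = 104.8576e6
UPDATE_INTERVAL_S = 3.0         # approximate software update interval


def worst_case_delay_rate(baseline_m: float) -> float:
    """Upper bound on the geometric delay rate, in seconds per second."""
    return OMEGA_EARTH_RAD_S * baseline_m / C_M_PER_S


for name, baseline_m in (("ATA-350", 900.0), ("ATA-42", 300.0)):
    rate = worst_case_delay_rate(baseline_m)
    print(f"{name}: {rate * 1e12:.0f} ps/s, "
          f"{rate * FS_HZ:.4f} samples/s, "
          f"{rate * FS_HZ * UPDATE_INTERVAL_S:.3f} samples per update")
```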
3.3.2. Phase Corrections
Leaf node phase corrections are applied in a complex multiplier fed by a complex oscillator, as indicated in Figure 8a. The complex oscillator is expanded in Figure 11. Angle resolution is 10 bits (0.35 deg) to minimize angle quantization errors, which have a greater effect on beamforming than randomly distributed errors. Quantized errors and approaches to correcting them are described by Smith and Guo and more recently by Jiang et al. The resulting phasor from the look-up table has a precision of 12 real and 12 imaginary bits, to minimize artifacts arising from quantization and modulation of the primary signal. The choice of 12 real and 12 imaginary bits limits these errors to less than 0.03% in amplitude and 0.02 deg in phase, placing resulting artifacts at the level of −70 dBc. The phase rate has a precision of 31 bits (including the sign bit) and a range of approximately 1.5 μHz to 1.6 kHz. Access to the look-up table is multiplexed among the eight antenna signal pathways to conserve FPGA resources. This results in a hardware update rate of fs/8. At the highest phase rate, the error between hardware updates is less than 0.03 deg.
 Phase rates indicate the frequency with which phase coefficients must be updated to maintain synthetic beam pointing on the source, and are meaningful for sidereal sources observed with instruments like the ATA. Found by differentiating the phase term in (3), these rates are known as the fringe rate in radio astronomy, and are given by
Using ATA-350 values of 220 ps/s and 10 GHz gives a phase rate of 2.2 Hz, although the exact values depend on the source, baseline, and sky frequency. The accumulator, shown in Figure 11, increments at a rate proportional to the value for the unique fringe frequency for each antenna relative to the array origin (indicated as a magnitude ffr with a sign dir in Figure 11). Overall, the circuit is the discrete-time implementation of
Software updates of the rate and initial angle occur about every 3 s. Higher-order derivatives of the phase are sufficiently small as to be effectively insignificant here.
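The fringe-rate arithmetic and the accumulator behavior can be sketched as follows. The fringe rate is taken as the sky frequency multiplied by the delay rate, which follows from differentiating the phase 2πf0τ. The accumulator class is a behavioral model only: the 32-bit accumulator width is an assumption chosen for illustration, while the 10-bit look-up table matches the angle resolution stated in the text. All names are hypothetical.

```python
import cmath
import math


def fringe_rate_hz(delay_rate_s_per_s: float, sky_hz: float) -> float:
    """Fringe rate: d(phase)/dt divided by 2*pi, for phase = 2*pi*f0*tau."""
    return sky_hz * delay_rate_s_per_s


class PhaseAccumulator:
    """Behavioral model of a phase accumulator driving a phasor look-up table."""

    def __init__(self, update_rate_hz: float, acc_bits: int = 32, lut_bits: int = 10):
        self.update_rate_hz = update_rate_hz
        self.acc_bits = acc_bits      # assumed width, not from the source
        self.lut_bits = lut_bits      # 10-bit angle resolution (0.35 deg)
        self.modulus = 1 << acc_bits
        self.acc = 0
        self.step = 0
        size = 1 << lut_bits
        self.lut = [cmath.exp(2j * math.pi * k / size) for k in range(size)]

    def program(self, initial_turns: float, rate_hz: float) -> None:
        """Load the initial angle (in turns) and the signed phase rate."""
        self.acc = round(initial_turns * self.modulus) % self.modulus
        self.step = round(rate_hz / self.update_rate_hz * self.modulus)

    def tick(self) -> complex:
        """Return the current phasor, then advance by one hardware update."""
        phasor = self.lut[self.acc >> (self.acc_bits - self.lut_bits)]
        self.acc = (self.acc + self.step) % self.modulus
        return phasor


print(f"fringe rate: {fringe_rate_hz(220e-12, 10e9):.2f} Hz")
```

A negative rate simply produces a negative step, so the accumulator wraps in the opposite direction; this corresponds to the sign input shown alongside the magnitude in Figure 11.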
3.3.3. Calibration Correlators
 In addition to beamforming corrections, the leaf-node also conducts measurements to assist in calibration of the beamformer. A thorough discussion of the calibration approach is presented in section 4. For the purposes of the hardware discussion, it is sufficient to know that a wideband, N baseline cross-correlation strategy is used.
 Each signal path of the leaf-node beamformer includes a 128-bin FX-style correlator. This simple correlator is the discrete-time implementation of
where c[f] is the correlated spectrum, x[n′, f] is the discrete spectrum of the n′th frame of the antenna signal, y[n′, f] is the same for the reference antenna signal, and k represents the duration of the integration. The leaf-nodes utilize a 128-point fast Fourier transform (FFT) in the calibration correlators; this length balances device utilization against RFI excision and delay range capacity. The reference spectrum is software-selectable as either that of the same antenna (producing an autocorrelation) or that of the FPGA-wide reference antenna (producing a cross correlation). Ordinarily, the best performing antenna (i.e., lowest noise temperature) in a given FPGA is selected as the reference antenna, to provide the best quality calibration. The time domain signal for the FPGA-wide reference antenna is multiplexed with the subbeam in the beamformer XAUI output, to facilitate branch-node calibration.
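A minimal model of an FX-style correlator is sketched below: each frame is transformed to the frequency domain (the "F" step), and the per-bin products are accumulated over frames (the "X" step). It assumes the accumulated product is x times the complex conjugate of y, which is one common convention and may differ from the original equation. A direct DFT is used for brevity in place of an FFT, and the names are illustrative.

```python
import cmath
import math


def dft(frame):
    """Direct discrete Fourier transform of one frame (O(N^2), for clarity)."""
    n_pts = len(frame)
    return [
        sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def fx_correlate(x, y, n_bins):
    """Accumulate X[f] * conj(Y[f]) over all complete frames of n_bins samples."""
    acc = [0j] * n_bins
    n_frames = min(len(x), len(y)) // n_bins
    for frame in range(n_frames):
        lo, hi = frame * n_bins, (frame + 1) * n_bins
        x_spec = dft(x[lo:hi])
        y_spec = dft(y[lo:hi])
        for k in range(n_bins):
            acc[k] += x_spec[k] * y_spec[k].conjugate()
    return acc


# Example: a tone in bin 3, with the reference lagging by 0.3 rad.
N_BINS = 16
tone = [cmath.exp(2j * math.pi * 3 * n / N_BINS) for n in range(4 * N_BINS)]
reference = [v * cmath.exp(-0.3j) for v in tone]
spectrum = fx_correlate(tone, reference, N_BINS)
print(f"phase of bin 3: {cmath.phase(spectrum[3]):.3f} rad")
```

Passing the same stream as both arguments yields an autocorrelation, mirroring the software-selectable reference described above. A delay between the two inputs would appear as a phase slope across bins, which is what makes the measurement useful for delay calibration.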
 These correlators occupy valuable space in the FPGA, and minimizing their footprint is important in the trade-off between calibration time, accuracy, and device utilization. Several methods of improving device utilization include reduced duty-cycle correlators, reduced FFT size, and reduced bit resolution. These each have their drawbacks, and are summarized in Table 1. All are subjects of further investigation. Device utilization is discussed further in section 3.5.
Table 1. Methods, Costs, and Benefits of Correlator Reduction (a)
(a) Each method listed above can reduce the footprint of the calibration correlators by reducing the number of required computations. Each method also has a penalty, generally reducing the quality of the calibration.
Calibration time increased by O(X)
Reduced RFI robustness and statistical confidence in calibration
 Beamformer nodes following the leaf nodes use the branch node firmware model, which allows the construction of a large, scalable, system from many leaf nodes. The two branch node architectures are shown in Figures 8b and 8c, and are very similar. We will focus on the intermediate branch node of Figure 8b, highlighting differences to the circular branch node where warranted.
 The branch nodes are I/O limited rather than processing limited. This is due to the absence of beam-steering corrections and the limited availability of only four XAUI ports per FPGA, all of which are used to provide interconnections to other nodes. The intermediate branch node accepts three XAUI inputs containing one stream each and produces one XAUI output containing interleaved subbeam and reference data. The circular branch node accepts two XAUI inputs (nominally orthogonally polarized subbeams) and produces interleaved dual-polarization outputs identically on each of the two XAUI output ports. In contrast, the leaf-node accepts two inputs containing four streams each.
 The extra processing capacity in the branch node models enables the addition of more processing elements. The cross correlators are increased to 256 bins for additional spectral resolution, and can correlate either input reference antennas or subbeams. The former leads to a more direct calibration solution, while the latter offers more sensitivity and more complex calibration strategies. A large 16k-channel spectrometer is included in the intermediate branch node, and allows precise, sensitive spectroscopy within the beamformer. The circular branch nodes cannot accommodate two of these spectrometers, so instead pass their time series upstream to the nearest intermediate branch nodes for analysis. This architecture is shown in Figure 8.
 The final stages of signal conditioning in the branch node models are the upconverter and 32-tap Hilbert transform block. For the sake of modularity, these are included in the intermediate branch node as well as on both polarizations of the circular branch node, although they are only enabled in the final node of the beamformer system. These blocks are used when a user instrument utilizes a 52 MHz analog output from the beamformer; the stream of complex time samples must be converted to purely real time samples prior to the DAC. The discrete Hilbert transform [Kak, 1970] is a well known transform for sideband selection in digital signal processing. It converts a complex signal to a real signal consisting of only the upper sideband (f > 0) or lower sideband (f < 0), while rejecting the nonselected sideband. The unity-gain fs/4 oscillator up-converts the spectra prior to filtering by the Hilbert transform; this process centers DC at 26.2144 MHz, increasing the usable analog bandwidth by 44%. A diagram of the up-conversion, Hilbert transform, and DAC process is illustrated in Figure 12.
3.5. Device Utilization
FPGA resources are a limiting factor in the design of this beamformer, and must be considered in any similar system. FPGAs have several different types of resources that can limit design; a few of these include logic slices, memory blocks, and multipliers. A summary of the resource utilization for the three types of beamformer nodes is contained in Table 2. The leaf nodes are currently slice-limited, and have been optimized to meet timing and compilation requirements in the current design. Because the leaf nodes are not close to exceeding multiplier capacity, further optimization might enable increased capacity. The branch nodes are not yet resource-limited. The 16k-channel spectrometer increases memory use in these designs, and the requirement for two Hilbert transforms in the circular branch node (one for each polarization) leads to increased multiplier use over the intermediate branch node.
Device utilization is reported by the Xilinx compilation tool. The figures above include the relevant limiting resources for the beamformer design. Unrelated slices indicate a very full design that may require optimization to meet timing and compilation requirements.
Slice flip flops
18 × 18 multipliers
4. Software Control Systems Architecture
 The control software ties together the operation of the various nodes of the beamformer, and is distributed between the five BEE2s and a central server. The BEE2s run Borph Linux [So, 2007] on a PowerPC embedded in the fifth “central” FPGA. This allows software access to the control registers of each beamforming FPGA; however, its slow speed renders it unsuitable for high-level operations, so the BEE2-based software is limited to “register server” functionality.
 The central server accesses all of the BEE2s, and is written in Ruby 1.8, a very abstract and modular language first released by Matsumoto in 1995 [Thomas, 2005]. The high level of abstraction enables a two-model scheme for mapping and ownership of resources. In the first model, resources are “owned” by their hardware objects, allowing effective addressing and command. This model is used whenever direct access to the hardware is required, such as during the register updates, which occur every 3 s.
 The second model is hierarchical, and resembles the abstract corporate beamforming architecture. It lacks specific separation of hardware resources, and is better suited to the calculation and propagation of pointing, diagnostic, and cascaded commands within the system. A simplified version of this architecture is shown in Figure 13. At start-up, configuration files provide the relevant mapping between the two models.
 At a deeper level within these models, the beamformer implements a multilayer data abstraction model, providing distinct stages between user intent and hardware implementation. As an example, pointing data are processed through stages of azimuth and elevation (user level), to geometric pointing values (delay and phase), to calibrated pointing values (delay and phase), to abstract register values, and finally to physical register assignments. This provides flexibility to the control system, allowing modification to any level independently. For example, null-forming is accomplished by use of the appropriate code in the geometric pointing stage, and does not require knowledge or modification of the other layers. This permits a large amount of flexibility in increasing the beamformer capabilities at a later date.
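 The layered data model described above can be sketched in Python as a chain of small, independent stages. Everything specific in this sketch is an assumption made for illustration: the antenna fields, the east-north-up geometry and sign convention, the 16-bit phase word, the register names, and the 104.8576 MHz sample rate are hypothetical and are not taken from the ATA control software, which is written in Ruby.

```python
import math
from dataclasses import dataclass

C = 299_792_458.0  # speed of light, m/s


@dataclass
class Antenna:
    east_m: float
    north_m: float
    up_m: float
    cal_delay_s: float = 0.0    # instrumental delay offset (hypothetical field)
    cal_phase_rad: float = 0.0  # instrumental phase offset (hypothetical field)


def geometric_pointing(ant, az_deg, el_deg, sky_freq_hz):
    """User level (az, el) -> geometric delay and phase for one antenna."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    # Unit vector toward the source in east-north-up coordinates.
    s = (math.sin(az) * math.cos(el), math.cos(az) * math.cos(el), math.sin(el))
    delay = (ant.east_m * s[0] + ant.north_m * s[1] + ant.up_m * s[2]) / C
    return delay, 2.0 * math.pi * sky_freq_hz * delay


def calibrated_pointing(ant, delay, phase):
    """Geometric values -> calibrated values (instrumental offsets added)."""
    return delay + ant.cal_delay_s, phase + ant.cal_phase_rad


def abstract_registers(delay, phase, sample_rate_hz, phase_bits=16):
    """Calibrated values -> hardware-independent register values."""
    samples = delay * sample_rate_hz
    coarse = math.floor(samples)  # whole-sample delay
    fine = samples - coarse       # fractional-sample delay in [0, 1)
    turns = (phase % (2.0 * math.pi)) / (2.0 * math.pi)
    word = round(turns * 2**phase_bits) % 2**phase_bits
    return {"coarse_delay": coarse, "fine_delay": fine, "phase_word": word}


def physical_registers(ant_index, regs):
    """Abstract values -> named registers for one input (names are made up)."""
    return {f"ant{ant_index}_{name}": value for name, value in regs.items()}


# Example: an antenna 10 m east of the array reference, source on the
# eastern horizon, observed at 1.42 GHz.
ant = Antenna(east_m=10.0, north_m=0.0, up_m=0.0)
delay, phase = geometric_pointing(ant, 90.0, 0.0, 1.42e9)
regs = abstract_registers(*calibrated_pointing(ant, delay, phase), 104.8576e6)
writes = physical_registers(7, regs)
```

 Because each stage only consumes the output of the previous one, a change such as a different geometric pointing rule can be confined to `geometric_pointing` in this sketch without touching the calibration or register stages.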
 The calibration scheme for the ATA beamformer was introduced in section 3 with the discussion of the N baseline correlators. Calibration of a wideband beamformer is an inherently challenging task. The calibration solution must use readily available sources, characterize the wideband performance of the system in a reasonable amount of time, and balance these requirements against the device utilization devoted to calibration. In this design, calibration is implemented by cross correlation using the minimum number of baselines required to obtain a unique solution. In keeping with the practices of radio astronomy, calibration is performed on wideband, well-characterized astronomical radio sources. Sources are limited to those that are not spatially resolved by the synthetic beam, allowing the use of a point-source model for calibration. Calibration using satellites was considered, but satellites transmit over limited frequency ranges and their positions are generally not characterized to the level required for calibration.
 The role of control software in the calibration process is extensive, as the beamformer software and firmware determine the instrumental amplitude, delay, and phase offsets of each input antenna. First, the reference antenna for each node, against which all other antennas in that node are to be measured, is determined. Typically, this is the antenna known to have the lowest system noise temperature, as determined by other methods, and is selected using a look-up table. Second, the software commands the hardware to perform a cross-correlation cycle, and then reads back the resulting spectra. Third, the software analyzes the spectra and performs automated RFI excision, flagging data more than 3σ from the mean of each baseline. Fourth, the software determines the relevant offsets of each signal from its reference. Finally, these offsets are propagated to the calibration values for the relevant antenna objects within the model.
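 The excision and offset-fitting steps above can be sketched as follows for a single baseline. This is a minimal illustration under stated assumptions, not the ATA code: the 3σ test is applied here to channel amplitudes, the delay is taken from the slope of phase versus frequency and the phase from its value at band center, and the function names and the synthetic spectrum are hypothetical.

```python
import numpy as np


def flag_rfi(spectrum, nsigma=3.0):
    """Boolean mask of channels whose amplitude lies within nsigma of the mean."""
    amp = np.abs(spectrum)
    return np.abs(amp - amp.mean()) <= nsigma * amp.std()


def fit_offsets(spectrum, freq_offsets_mhz, good):
    """Fit amplitude, delay (s), and band-center phase (rad) to unflagged channels."""
    phase = np.unwrap(np.angle(spectrum[good]))
    # Phase is 2*pi*f*delay + phase0, so the slope gives the delay.
    slope, intercept = np.polyfit(freq_offsets_mhz[good], phase, 1)
    delay_s = slope / (2.0 * np.pi * 1e6)
    amplitude = np.abs(spectrum[good]).mean()
    return amplitude, delay_s, intercept


# Synthetic cross-correlation spectrum: 5 ns delay, 0.3 rad phase offset,
# plus one strong narrowband interferer in channel 20.
freqs = np.linspace(-50.0, 50.0, 64)  # MHz offsets from band center
true_delay, true_phase = 5e-9, 0.3
spec = np.exp(1j * (2.0 * np.pi * freqs * 1e6 * true_delay + true_phase))
spec[20] = 100.0

good = flag_rfi(spec)
amp, delay, phase = fit_offsets(spec, freqs, good)
```

 In this noise-free example the interferer is the only flagged channel, and the fit recovers the injected delay and phase from the remaining channels.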
 If calibration values are gathered at several tuning frequencies, the beamformer control software constructs a frequency-dependent calibration model for the system. The instrumental phase is particularly sensitive to changes in tuning frequency. A strong linear dependence of phase versus frequency arises from ambiguity in the location of the instrumental delay when implementing (10). Instrumental delays can occur at both RF and IF stages, and the split between these two determines the linear phase slope when the tuning frequency is changed. Nonlinear variations with frequency arise from differences in the analog electronics. Typical variations of the instrumental phase versus instrument tuning are shown in Figure 14; for various antennas, the instrumental phase varies as much as 60 deg (from the linear trend) over 1 GHz. Knowledge of the instrumental phase model is important to observing strategies that require rapid retuning of the beamformer without recalibration, such as is the case with SETI observing.
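 One simple way to represent such a frequency-dependent model is a linear trend in tuning frequency plus an interpolated nonlinear residual. The Python sketch below is an assumption about how this could be done, not a description of the ATA software; the function name is hypothetical, the input phases are assumed to be already unwrapped, and the sample values are synthetic.

```python
import numpy as np


def build_phase_model(tunings_mhz, phases_deg):
    """Return (slope in deg/MHz, predictor) from calibrations at several tunings."""
    order = np.argsort(tunings_mhz)
    f = np.asarray(tunings_mhz, dtype=float)[order]
    p = np.asarray(phases_deg, dtype=float)[order]
    # Linear part: set by how the instrumental delay splits between RF and IF.
    slope, intercept = np.polyfit(f, p, 1)
    # Nonlinear part: whatever the linear trend does not explain.
    residual = p - (slope * f + intercept)

    def predict(freq_mhz):
        return slope * freq_mhz + intercept + np.interp(freq_mhz, f, residual)

    return slope, predict


# Synthetic instrumental phases (deg) at five tuning frequencies (MHz).
tunings = np.array([1000.0, 1250.0, 1500.0, 1750.0, 2000.0])
phases = 0.2 * tunings + np.array([0.0, 20.0, -15.0, 30.0, 0.0])
slope, predict = build_phase_model(tunings, phases)
```

 The predictor reproduces the measured phase at each calibrated tuning and interpolates between them; outside the calibrated range it falls back to the linear trend with the nearest residual held constant.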
 It is noted that these calibration processes must occur in near real time; this is a departure from traditional radio astronomy, in which calibration data are recorded and applied to observations at a later date (although the paradigm is gradually moving toward real-time imaging and live calibration, e.g., as described by Mitchell et al. at the MWA, and also Keating and Barott at the ATA). In the real-time beamformer, accurate calibration must be applied at the time of observing, and there is no opportunity for postcorrection. The interval of recalibration varies based on the characteristics of the observation; calibration intervals of once per hour may be required for C band observing and above with degree-level accuracy, while L band solutions are good to within several degrees for many hours. Figure 15 indicates phase stability of several antennas on the radio calibrator 0927 + 390, taken every few minutes for almost 2 h at 4500 MHz. Despite the wide error bars in Figure 15, the trends of about 2.3 deg per hour are consistent with other observations.
5. Engineering Validation and Results
 Following successful engineering tests, the ATA beamformer was commissioned for operational use in 2007. The second beamformer and circular polarization synthesis were added in 2008, and the third beamformer in 2009. Some results of the validation and verification tests are included in this section. These tests include beam- and null-forming, pattern measurements, array gain, and evaluations of calibration consistency during observation campaigns.
5.2. Array Gain
 The fundamental purpose of the beamformer is to coherently add signals from multiple antennas to increase the signal-to-noise ratio (SNR) of the output beam; the first two measurements were designed to confirm this functionality. In the first test, a known test pattern was injected into each signal path, while in the second test the beamformer was directed at the bright 6.6 GHz methanol maser in W3OH. In each case, inputs were switched on and off to vary the number of antennas used in the output beam. An analysis of signal strength versus the number of antennas is shown in Figure 16. Spectra for the W3OH measurements are included in Figure 17. These tests indicate that the beamformer functions as expected. Variation from the ideal curve is due to quantization in the lab-source measurement, and unequal antenna sensitivities in the W3OH measurement.
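 The expected scaling of output SNR with the number of antennas can be illustrated with a short simulation: a signal common to all inputs adds in voltage, while independent receiver noise adds in power, so the SNR of the summed beam grows in proportion to the number of inputs. The Python sketch below uses idealized assumptions that are not taken from the measurements: identical antennas, perfect calibration, unit signal and noise power per input, and a hypothetical function name.

```python
import numpy as np


def beam_snr(n_antennas, n_samples=20000, seed=0):
    """Power SNR of a coherent sum of n_antennas identical inputs."""
    rng = np.random.default_rng(seed)
    # Unit-power complex tone seen identically by every antenna.
    signal = np.exp(2j * np.pi * 0.05 * np.arange(n_samples))
    # Independent unit-power complex Gaussian noise per antenna.
    shape = (n_antennas, n_samples)
    noise = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    beam_signal = n_antennas * signal  # voltages add coherently
    beam_noise = noise.sum(axis=0)     # noise powers add incoherently
    return np.mean(np.abs(beam_signal) ** 2) / np.mean(np.abs(beam_noise) ** 2)


snr_1 = beam_snr(1)
snr_16 = beam_snr(16)
improvement = snr_16 / snr_1  # close to 16 for ideal, equal inputs
```

 Unequal antenna sensitivities or residual phase errors would pull measured points below this ideal linear trend.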
 As an additional test, on 12 July 2008, the beamformer was directed toward the weak 8.4 GHz carrier from the Voyager 1 spacecraft, then at a distance of 106 AU from Earth. The beamformer output was directed to the SETI spectrometer, which successfully detected the carrier. This detection would not have been possible without the collecting area of the ATA and accurate high-frequency calibration. Additional data from the Voyager observations were recently published by Welch et al.
5.3. Beam Patterns
 Another useful characteristic in evaluating the beamformer is the radiation pattern of the synthetic beam. The synthetic beam width is several arc minutes, and varies based on observing frequency, subarray configuration, and pointing direction. Beam patterns were generated by sweeping the synthetic beam across the geostationary satellite Galaxy 15, and measuring its 1575 MHz L1 downlink for the GPS Wide Area Augmentation System (WAAS). Because of the uncertainty in the position of Galaxy 15 relative to its predicted (two-line element) position, the beamformer was calibrated on Galaxy 15 itself regularly throughout the sweep. This minimized the effect of arc minute level ephemeris errors, enabling the production of high-fidelity patterns. The received signal is at least 10 dB above the noise floor even in pattern nulls, indicating that these measurements accurately identified the null-depth of the pattern. The primary beam of each antenna was directed at the predicted satellite position throughout the sweep, so the primary beam pattern did not influence these measurements. This series of measurements highlights the ability of the beamformer to create time domain beams on any location within the field of view of the primary beam.
 The first of these patterns is shown in Figure 18 and is a horizontal (azimuth) slice using 14 antennas. Two patterns are plotted. The first is the pattern of the coherently phased beam, with phase coefficients calculated as in (3), and indicates a beam width and sidelobe level appropriate to the number of antennas and array aperture distribution. For the second pattern, three nulls were placed at 10.8, 11.4, and 12.0 arc minutes from the primary pointing direction. The projection nulling method described by Subbaram and Abend was used to calculate the phase coefficients. Projection nulling is a standard feature of the ATA beamformer, and is used to create orthogonal beam sets in some observing modes (so that every beam serves as an “off-point” for every other beam). Sensitivity in the nulled region is reduced to the −30 dB level, an improvement of 20–24 dB from the original beam. The main-lobe sensitivity is reduced by 3.2 dB, as expected given the original sidelobe strength; the net discrimination between the null and main lobe was therefore improved by the difference, or 16.8–20.8 dB.
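 The idea behind projection nulling can be sketched with a generic orthogonal projection: the steering weights for the main direction are projected onto the subspace orthogonal to the steering vectors of the null directions. The Python sketch below is a textbook-style illustration and not necessarily the exact formulation of Subbaram and Abend or of the ATA software. The one-dimensional random antenna layout over 300 m, the function names, and the use of full complex weights (which change amplitude as well as phase) are assumptions; only the 14 antennas, the 1575 MHz frequency, and the three null offsets come from the text.

```python
import numpy as np

ARCMIN = np.pi / (180.0 * 60.0)        # radians per arc minute
WAVELENGTH_M = 299_792_458.0 / 1575e6  # wavelength at 1575 MHz


def steering(x_m, offset_arcmin):
    """Steering vector of a 1-D array for an angular offset from boresight."""
    return np.exp(2j * np.pi * x_m * np.sin(offset_arcmin * ARCMIN) / WAVELENGTH_M)


def projection_null_weights(x_m, main_arcmin, null_arcmin):
    """Project the main-beam weights orthogonally to the null steering vectors."""
    w0 = steering(x_m, main_arcmin)
    nulls = np.column_stack([steering(x_m, t) for t in null_arcmin])
    # Least-squares fit of w0 by the null vectors; removing the fit leaves
    # the component of w0 orthogonal to every null steering vector.
    coeffs, *_ = np.linalg.lstsq(nulls, w0, rcond=None)
    return w0 - nulls @ coeffs


def response(w, x_m, offset_arcmin):
    """Magnitude of the beam response w^H a in the given direction."""
    return np.abs(np.vdot(w, steering(x_m, offset_arcmin)))


rng = np.random.default_rng(1)
x = rng.uniform(0.0, 300.0, 14)  # hypothetical antenna positions, m
null_offsets = [10.8, 11.4, 12.0]
w = projection_null_weights(x, 0.0, null_offsets)
main = response(w, x, 0.0)
null_levels_db = [20.0 * np.log10(response(w, x, t) / main) for t in null_offsets]
```

 In this idealized, error-free sketch the nulls are limited only by floating-point precision; in a real array the achievable depth is set by calibration and quantization errors, as discussed in the error budget.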
 The high sidelobe levels in Figure 18 are attributable to the small number of antennas (14) used in the measurements, and agree well with predictions. The pattern of the complete array will exhibit significantly lower sidelobes as the array aperture is better filled. Harp  provides a more detailed discussion of the expected nulling capabilities of the ATA-350. Additionally, it is observed that the main lobe of the null beam is of similar strength to the sidelobe at −10′, resulting in poor discrimination between sources in the main lobe and this sidelobe. However, it is assumed that a priori knowledge of the target field is used to select the null position, e.g., that the source of interest is in the main lobe, a potential interferer is at the null, and the rest of the field is unpopulated. In practice, if the ATA field of view contains many strong, potentially interfering sources, each might be nulled independently to ensure that the remaining field is unpopulated. While windowing functions can improve the sidelobes exhibited by an array [e.g., Ellingson and Cazemier, 2003], they are not implemented in the ATA beamformer.
Figures 19 and 20 contain the analytically predicted and measured two-dimensional scans of the radiation pattern including projection nulls. While there are some minor differences between the predicted and measured patterns, there is very good agreement in null and sidelobe positions and strengths, as well as the shape of the main lobe of the synthesized beam.
5.4. Stability and Reliability
 One further metric of the beamformer performance is its ability to reliably obtain useful calibrations during automated observing. If the calibration is poor, then the beam will become incoherent and nulls will be shallow. Self-diagnostic estimations of calibration confidence were gathered for 19 weeks from May through October 2009, during regular automated SETI observations using two beamformers. These values indicate the beamformers' statistically estimated confidence in the phase calibration, and tend to be overestimators of the phase error. For each baseline, the confidence value is calculated by
where ϕσ is the one-sigma confidence interval of the phase, σ and μ are the standard deviation and mean of the complex cross-correlation results, and N is the number of frequency channels evaluated in the cross correlation.
 These confidence values are plotted in Figure 21. In most cases, Beamformer number 1 obtained a mean confidence within two degrees of phase. Beamformer number 2 was somewhat worse, at about five to ten degrees of phase. These calibrations used the same antennas and were performed nearly concurrently, so the two beamformers would be expected to obtain calibrations at the same level of accuracy. Differences may arise from slightly different DDC designs used by the two beamformers. The impact of these differing designs has not been fully evaluated, but it is known that they can affect the calculated confidence level without necessarily worsening the actual calibration.
5.5. Beamforming Error Budget
 Several sources of error in the beamforming process have been introduced in this paper. High-accuracy amplitude and delay calibration are relatively easily obtained; after improvement of the fine delay filter to reduce amplitude errors (a process currently underway), the most significant source of beamforming error will be phase error. A summary of the expected phase errors from various systems is shown in Table 3.
 Phase errors during beamforming will reduce the gain of the array in the intended direction (both by redirecting the peak of the synthetic beam and by broadening the synthetic beam), raise the sidelobe level, and reduce the depth of steered nulls. If the phase errors are small and uniformly distributed, gain reduction is estimated from Hansen  as
where σ = max (∣ϕerr∣)/3, with the phase error in radians. If the phase errors in Table 3 are taken to be random and uncorrelated, Hansen predicts a sensitivity loss of less than 1%. However, phase errors such as those from the fine delay filter are deterministic and periodic with time as the array is scanned, and may have a greater impact on the sensitivity of the resulting beam by amplitude modulating the array pattern. The effects of errors in station beams on cross-correlation imaging are particularly relevant to the SKA and are described in detail by Wright and Corder, but are outside the scope of this paper. The worst-case increase in sidelobe level is approximated after Barott as
for the case when a null is formed by a uniform distribution of the phasors from each antenna and
when a null is formed by antenna phasors split in two groups π radians apart. The original sidelobe level is given by SLL0, and the sidelobe levels in dB are found by SLLdB = 20 × log10 (SLL). The phase errors described in Table 3 lead to an expected null-depth between −36 dB and −20 dB, depending on the errorless phasor distributions at the null. Although this range is necessarily large, it is consistent with the null-depth of about −30 dB observed in Figure 18. Harp  concluded that nulls better than −40 dB will be achievable in the ATA-350 for the amount of phase error described in Table 3.
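 The orders of magnitude in this error budget can be checked with a few lines of arithmetic. The Python sketch below uses the definition σ = max(|ϕerr|)/3 and the relation SLLdB = 20 log10(SLL) given above. For the gain loss it assumes the widely used small-error approximation G/G0 ≈ exp(−σ²), which may differ in detail from the expression taken from Hansen, and the 15 deg maximum phase error is a hypothetical input rather than a value from Table 3.

```python
import math


def gain_loss_fraction(max_phase_err_deg):
    """Fractional gain loss for random phase errors, assuming G/G0 = exp(-sigma^2)."""
    sigma = math.radians(max_phase_err_deg) / 3.0  # sigma = max(|phi_err|) / 3
    return 1.0 - math.exp(-sigma**2)


def amplitude_to_db(sll):
    """Sidelobe or null level as a voltage ratio -> dB."""
    return 20.0 * math.log10(sll)


def db_to_amplitude(level_db):
    """Sidelobe or null level in dB -> voltage ratio."""
    return 10.0 ** (level_db / 20.0)


loss = gain_loss_fraction(15.0)         # hypothetical 15 deg maximum phase error
null_voltage = db_to_amplitude(-30.0)   # about 0.032 of the main-lobe voltage
range_voltage = (db_to_amplitude(-36.0), db_to_amplitude(-20.0))
```

 Under these assumptions a maximum phase error of 15 deg costs less than 1% in gain, and the quoted null-depth range of −36 dB to −20 dB corresponds to residual voltages of roughly 1.6% to 10% of the main-lobe response.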
 Three time domain beamformers are deployed and operational at the ATA, and serve a variety of back-end instruments in operational capacity. Most of the challenges encountered in the design stem from the limited device resources in the FPGA, and the distributed nature of the instrument.
 Ongoing work includes optimization of the FPGA design to improve precision and provide additional capabilities. This includes lengthening the fine-delay FIR filter, as well as reducing correlator device requirements without sacrificing calibration quality. Software requirements are aimed at improving calibration by exploring subbeam calibration and other calibration strategies (for example, the SUMPLE algorithm described by Rogstad ), as well as exploring optimal fine-delay coefficients [after Ahmad and Antoniou, 2006]. Calibration might also be improved by extending the calibration processing beyond the point source model, allowing the use of stronger, but spatially resolved, calibrators.
 Despite the computational simplicity of beamforming, the need for calibration correlators highlights the inevitable merger of real-time beamformers and imaging correlators for radio astronomy. The requirement is not altogether obvious when the beamformer is considered as a predominant back-end independent of correlation imaging, such as is the case with beamformer science at the ATA. This approach is planned for the next generation of instruments (e.g., the Correlator-Beamformer-Imager, or CoBI, planned for the ATA). These instruments may also utilize frequency domain beamforming, which is computationally simpler than time domain beamforming when high-resolution spectra are already computed for calibration. In addition, the emerging generation of FPGA processors exceeds the capabilities of those in the ATA beamformer. For example, the ROACH platform (described by A. Parsons et al. (A scalable correlator architecture based on modular FPGA hardware, reusable gateware, and data packetization, http://arxiv.org/abs/0809.2266, 2009)) achieves the processing capability of a BEE2 in a single FPGA, while realizing significant reductions in cost, power, and space requirements.
 Special thanks to the members of the ATA Team and the Center for Astronomy Signal Processing and Engineering Research (CASPER) group and collaborators for their support over this multiyear effort. This research was funded by the National Science Foundation, awards 0321309 and 0540599. The ATA Team members affiliated with the SETI Institute, Mountain View, California, USA, include Robert F. Ackermann, Shannon Atkinson, Tucker Bradford, Mike Davis, Dave DeBoer, John Dreher, Gerald R. Harp, Jane Jordan, Susan Jorgensen, Ken Smolek, Tom Pierson, Karen Randall, John Ross, Seth Shostak, and Jill Tarter. The ATA Team members affiliated with the Radio Astronomy Laboratory, University of California, Berkeley, California, USA, include Don Backer, Amber Bauermeister, Leo Blitz, Douglas Bock, Geoffrey C. Bower, Calvin Cheng, Steve Croft, Greg Engargiola, Ed Fields, Rick Forster, Carl Heiles, Tamara Helfer, Colby Gutierrez-Kraybill, Garrett Keating, Casey Law, Joeri van Leeuwen, John Lugten, Peter McMahon, Andrew Siemion, Douglas Thornton, Lynn Urry, Jack Welch, Dan Werthimer, and Peter Williams.