Whole ecosystem carbon dioxide (CO2) exchange estimated with the eddy covariance (EC) technique has been central to studies on the responses of terrestrial ecosystems to disturbance and intra-annual and interannual variations in climate, but challenges exist in understanding and reducing the uncertainty in estimates of net ecosystem exchange (NEE) of CO2. We review the potential uncertainties associated with the eddy covariance technique, including systematic errors from insensitivity to high-frequency turbulence, random errors from inadequate sample size associated with averaging period, vertical and horizontal advection issues, and selection criteria for removing periods of inadequate mixing from further analyses. We also discuss benefits and caveats of using independent measurements to evaluate EC-derived NEE, such as comparisons of EC-derived annual NEE and allometric net ecosystem production estimates (NEP) and interpretation of nighttime NEE with scaled chamber-based estimates of ecosystem respiration.
 The EC technique was pioneered over grass and croplands with long fetch and short roughness lengths [Kaimal and Wyngaard, 1990; Verma et al., 1989; Lemon, 1960; Monteith and Szeicz, 1960]. Researchers have since applied this approach over structurally complex ecosystems in nonideal terrain, introducing new challenges in the interpretation of results and reduction of uncertainties. The EC technique is a direct, nondestructive micrometeorological approach derived through the simplification of the conservation equation [Baldocchi, 2003; Baldocchi and Meyers, 1998; Baldocchi et al., 2000; Shen and Leclerc, 1997]. EC is used to estimate NEE through the addition of above-canopy turbulent exchange and the change in CO2 storage in the canopy air space (i.e., the temporal change in carbon concentration integrated from ground level to the point of measured turbulent exchange, term I, equation (1a)):
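A form of the conservation equation consistent with the term numbering used in the following paragraph can be sketched as below. This is a reconstruction written in differential form, before vertical integration from the ground to Z. The source/sink symbol S, the ordering within terms III–IV and V–VII, and the typography are assumptions, not necessarily the published form of equations (1a) and (1b).

```latex
% (1a) scalar conservation, terms labelled as in the text (assumed ordering)
\underbrace{\frac{\partial \overline{c}}{\partial t}}_{\mathrm{I}}
+ \underbrace{\frac{\partial \overline{w'c'}}{\partial z}}_{\mathrm{II}}
+ \underbrace{\frac{\partial \overline{u'c'}}{\partial x}}_{\mathrm{III}}
+ \underbrace{\frac{\partial \overline{v'c'}}{\partial y}}_{\mathrm{IV}}
+ \underbrace{\overline{u}\,\frac{\partial \overline{c}}{\partial x}}_{\mathrm{V}}
+ \underbrace{\overline{v}\,\frac{\partial \overline{c}}{\partial y}}_{\mathrm{VI}}
+ \underbrace{\overline{w}\,\frac{\partial \overline{c}}{\partial z}}_{\mathrm{VII}}
= S

% (1b) simplified form once terms III--VII are neglected and the
% remaining terms are integrated from the surface to Z
\mathrm{NEE} \approx \int_{0}^{Z} \frac{\partial \overline{c}}{\partial t}\,dz
+ \left.\overline{w'c'}\right|_{Z}
```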
where c is the scalar quantity such as CO2; u, v, and w are horizontal, cross-wind, and vertical windspeeds (m s−1), respectively; x, y, and z are Cartesian coordinates; t is time; and Z is the measurement height (m). Overbars indicate time averages and primes denote turbulent fluctuations (i.e., deviations from a mean quantity). Term I in equation (1a) represents the time rate-of-change of c in the vertical column (i.e., storage), and is considered to be equal to zero over a 24-hour period, but can be significant over shorter time intervals. Terms II–IV represent the turbulent flux divergence. Terms V–VII represent advection through the layer between the surface and sensor. The partial derivatives usually are estimated from data at points in the domain, and thus are replaced here by finite differences. Assuming that the measurement height is sufficient and the surface characteristics are horizontally homogeneous, terms III and IV are often thought to be 0 and ignored. Terms V–VII are inherently difficult to measure well, but are thought to be small and thus are not often estimated. Hence only terms I and II remain. Term II in equation (1b) is commonly referred to as the eddy covariance (additional assumptions in its estimation are discussed later). NEE is typically estimated over a 30-min period from high-frequency measurements, and integrated further to estimate daily, seasonal, and annual fluxes. The relative importance of the sources of error differs across timescales, spatial extent, and site conditions. For example, for NEE at minute-to-hour timescales (compare with photosynthetic uptake and respiratory efflux), systematic and random errors associated with instrumentation precision, calibration and placement, and turbulent transport become significant. In scaling NEE from day-to-season (months), random errors are reduced, but other errors are added because of filtering and filling gaps in data.
Consequently, the sources of error in applying EC to understand rapid biological response to meteorological variables are quite different from those errors associated with determining the environmental controls on phenology, annual carbon exchange, effects from disturbance, or whether mature forested ecosystems continue to accumulate carbon. Of course the relative contribution of all errors becomes large in either example (i.e., minute-to-hour and day-to-season) when the mean flux is close to 0.
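As a minimal sketch of the simplified budget described above (storage plus eddy covariance), the following Python fragment combines a finite-difference storage term with the covariance of vertical wind and scalar concentration. The function names, the two-level profile, and all numbers are hypothetical and chosen only for illustration; density corrections, coordinate rotation, and quality control are deliberately left out.

```python
from statistics import fmean

def eddy_covariance(w, c):
    """Mean product of fluctuations, mean(w'c'), using simple block averages."""
    w_bar, c_bar = fmean(w), fmean(c)
    return fmean([(wi - w_bar) * (ci - c_bar) for wi, ci in zip(w, c)])

def storage_flux(c_start, c_end, heights, dt):
    """Trapezoidal integral over height of the finite-difference change in c per unit time."""
    rate = [(end - start) / dt for start, end in zip(c_start, c_end)]
    total = 0.0
    for i in range(1, len(heights)):
        total += 0.5 * (rate[i] + rate[i - 1]) * (heights[i] - heights[i - 1])
    return total

def nee(w, c, c_start, c_end, heights, dt):
    """Storage term (term I) plus eddy covariance at the measurement height (term II)."""
    return storage_flux(c_start, c_end, heights, dt) + eddy_covariance(w, c)

# Hypothetical inputs: four fast samples and a two-level profile (0 m and 10 m)
# measured at the start and end of an 1800 s averaging period.
w = [0.1, -0.1, 0.1, -0.1]            # vertical wind, m s-1
c = [401.0, 399.0, 401.0, 399.0]      # scalar concentration, arbitrary units per m3
profile_start = [400.0, 400.0]
profile_end = [418.0, 418.0]
flux = nee(w, c, profile_start, profile_end, heights=[0.0, 10.0], dt=1800.0)
```

With these inputs the storage and turbulent terms each contribute about half of the total, which illustrates why the storage term cannot be ignored over short intervals even though it is taken to sum to zero over 24 hours.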
 Some of these errors or uncertainties stem from flows not fully accounted for within and below the sensor field, which can result in potential violation of the assumptions. Other uncertainties result from the stochastic nature of turbulence. A key stratagem of the long-term flux network, AmeriFlux, is to assure accurate estimates within and among flux sites for synthesis activities and regional analysis. The AmeriFlux science plan has outlined rigorous quality control protocols for within site measurements (see http://public.ornl.gov/ameriflux/About/scif.cfm) to avoid large systematic biases so that subtle spatial and temporal trends may be discerned. Quality assurance across the network is assessed through direct comparison of software routines and instrumentation. Using an independent raw data file developed by the Euroflux and AmeriFlux networks (“gold files” for closed- and open-path infrared gas analyzers can be found at http://public.ornl.gov/ameriflux/standards-gold.shtml), researchers can process flux data sets through their own software routine and check estimates against a standard processed file. Site-to-site differences among instrument configurations are estimated by an independent portable flux measurement system that visits each site. Consistency and rigor in sample design, analysis, diagnostics, and data quality checks help to ensure data are comparable across sites.
2. Challenges in Measurements, Analyses, and Interpretation of EC Fluxes
2.1. Flux Measurement
 Estimates of the total CO2 flux from a vegetated canopy should be made in the surface layer, above the roughness sublayer, the depth of which changes with stability [Raupach, 1991; Raupach and Thom, 1981]. In the surface layer, sometimes referred to as the inertial layer, Monin-Obukhov similarity theory applies in homogeneous stationary atmospheric conditions [e.g., Kaimal and Finnigan, 1994]. The level close to the surface where the winds asymptotically become zero by extrapolating Monin-Obukhov similarity theory downward is d + z0, where d is the zero-plane displacement and z0 is the characteristic roughness length [see Panofsky and Dutton, 1984]. Combined, d and z0 determine the bottom boundary of the surface layer, and d can vary with changes in stability [Loescher et al., 2003]. Consequently, d and z0 are not easily estimated for forest canopies and are not formally defined for below-canopy environments. Estimates of z0 range from 0.01 m for grass [Hansen, 1993] to >2.4 m over tropical forests [Loescher et al., 2003].
 The use of eddy covariance relies on conditions that induce turbulence in the surface layer (i.e., shear stress, surface heating), and sometimes these conditions are weak or absent. Nighttime conditions can be particularly problematic because surface heating is absent and radiational cooling produces stable thermal stratification of the air column that acts to suppress shear-induced turbulence. The measurements above the canopy may become partially decoupled from the surface, and the computed fluxes become sensitive to the method of calculation. Researchers have, by necessity, found that empirical relationships between the friction velocity (u*) and scalar fluxes can identify conditions under which robust turbulent exchanges can be estimated from 30-min averages. The u* is normally defined as [cf. Weber, 1999],
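One commonly used form of this definition, assumed here for illustration, combines both horizontal momentum-flux components: u* = (mean(u'w')^2 + mean(v'w')^2)^(1/4). The sketch below computes it from hypothetical high-frequency wind samples; the function names are illustrative, and other definitions (for example, using only the along-wind stress) are also in use.

```python
from statistics import fmean

def covariance(a, b):
    """Mean product of deviations from simple block means."""
    a_bar, b_bar = fmean(a), fmean(b)
    return fmean([(ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)])

def friction_velocity(u, v, w):
    """u* from both kinematic momentum-flux components (assumed definition)."""
    uw = covariance(u, w)
    vw = covariance(v, w)
    return (uw ** 2 + vw ** 2) ** 0.25

# Hypothetical samples (m s-1): downward momentum transfer, no cross-wind stress.
u = [4.0, 2.0, 4.0, 2.0]
v = [0.0, 0.0, 0.0, 0.0]
w = [-0.5, 0.5, -0.5, 0.5]
u_star = friction_velocity(u, v, w)
```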
 Examining the sensitivity of annual integrals to different u* thresholds was introduced by Goulden et al. and is now a common method to determine “well-mixed” conditions [cf. Aubinet et al., 2000; Gu et al., 2005]. The impact of accepting data on the basis of different u* thresholds is illustrated in annual integrals from Howland Forest, an AmeriFlux site, where the annual sums do not change above a u* threshold of ∼0.25 m s−1 (Figure 1a). At other sites, however, a u* threshold may not be as evident. For example, using 30-min-averaged data, the nighttime EC flux continued to increase with increasing u* values from a structurally simple and topographically flat cocoa plantation (Figure 1b), suggesting inadequate mixing to fully estimate the respiratory losses using 30-min-averaged turbulent data.
 Presently, 80% of flux sites surveyed report the use of a u* threshold criterion for acceptance of nocturnal data, with values ranging from 0.0 to 0.6 m s−1 (Table 1). The number of nocturnal 30-min periods excluded from further analysis by a u* threshold varies from site to site. An extreme example is found in the tropics, where Miller et al. report that as much as ∼80% of nighttime data are removed depending on the value of the u* threshold. The choice of the threshold value in this case resulted in a ∼5 t C ha−1 y−1 range in annual NEE. The flux community has developed no standardized method to determine u* thresholds, although a statistical approach was developed by Gu et al. Use and applicability of a u* threshold to determine carbon flux remains an active area of research, but the general concept of restricting analysis to periods with sufficient atmospheric mixing is sound.
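The sensitivity test described above can be sketched as follows: for each candidate threshold, retain only the nocturnal periods with u* at or above it, then report the fraction of data retained and the mean accepted flux. The data are synthetic and constructed so that the flux saturates at u* = 0.25 m s−1; the function name and values are hypothetical, and this is not the statistical method of Gu et al.

```python
def threshold_sensitivity(ustar, nee, thresholds):
    """Return (threshold, fraction retained, mean accepted flux) for each threshold."""
    rows = []
    for th in thresholds:
        kept = [f for us, f in zip(ustar, nee) if us >= th]
        mean_flux = sum(kept) / len(kept) if kept else float("nan")
        rows.append((th, len(kept) / len(nee), mean_flux))
    return rows

# Synthetic nocturnal data: the flux rises with u* and levels off at 0.25 m s-1.
ustar = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50]
nee = [4.0 * min(us / 0.25, 1.0) for us in ustar]

rows = threshold_sensitivity(ustar, nee, thresholds=[0.0, 0.25, 0.40])
for th, frac, mean_flux in rows:
    print(f"u* >= {th:.2f}: {frac:.0%} of periods kept, mean flux {mean_flux:.2f}")
```

A plateau in the mean accepted flux, as in this synthetic case, is the behavior reported for Howland Forest; where the flux keeps rising with u*, no threshold emerges from this kind of test.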
Table 1. List of Studies That Discuss the Relationship Between u* and Annual NEE^a
^a NEE is reported in units of t C ha−1 y−1, and u* is reported in m s−1.
Included a vertical advection term following Lee et al.  and scaled chamber measurements to estimate Re.
Seasonal fluxes only.
Used data from nights when u* > 0.4 for at least 2.5 hours and had storage flux for 1 month.
Used sigma u criterion to develop a temperature response, then modeled Re.
No correlation found between NEE and u*.
Assigned uncertainty of 25%, some of the integrated flux (DOY 145–265); the data was filled with a functional relationship using u* > 0.4.
 There are different sources of systematic errors in EC measurements. Systematic errors occur as a result of instrument response time, pathlength averaging, and physical separation between sonic anemometers and the measurement system for CO2 or other trace gases [Massman and Lee, 2002]. In addition, selective systematic errors can result from both the exclusion of high-frequency turbulence due to low-pass filtering (used to eliminate noise) and the exclusion of low-frequency flows due to the length or method of averaging time [Mann and Lenschow, 1994; Foken and Wichura, 1996; Vickers and Mahrt, 1997]. For example, the transport of carbon dioxide due to slowly moving boundary layer eddies (i.e., conditions with weak winds and significant surface heating), may be excluded by traditional averaging periods [Sakai et al., 2001]. The overall effect is analogous to inadequate spatial averaging, as is the case with using too short of an averaging distance to define the turbulent fluctuations in aircraft observations [Betts et al., 1990; Desjardins et al., 1997]. Systematic errors occur within each of the averaging periods (i.e., 30 min) and are present in the EC term (equation (1b)) and scale in the same proportion with longer time integrals (i.e., 1 year, Table 2). Systematic errors associated with tilt corrections, flux divergence and advection between the surface and EC measurement level can exist because of inaccurate definition of the flux field, or when particular conditions arise causing one or more of terms III–VII (equation (1a)) to be not equal to zero, and these errors may not become apparent until data are examined across longer time intervals, from day-to-season-to-year.
Table 2. Different Sources of Error in 30 min and Annual NEE Estimates^a
Source of Error | Systematic Error, 30 min | Random Error, 30 min | Time-Integrated Error for Annum Estimates | Notes, Annual NEE (t C ha−1 y−1)
^a The errors attributable to gap filling were not available for estimates of total uncertainty (except as noted).
Constrained by soil respiration estimates.
Error of ∼1/2 due to errors in the respiration estimate.
 The partitioning between turbulent and nonturbulent motions is influenced by the coordinate rotation method used to align the sonic anemometer with respect to the surface [Wilczak et al., 2001], and the choice of method can lead to systematic errors. While investigators have been attempting to correct for possible sonic misalignment for at least 25 years, the methods for such corrections have been detailed in the refereed literature only recently. One method, planar rotation [Wilczak et al., 2001], was developed in the late 1970s by Steven Stage. For nighttime stable conditions, rotating the coordinate system for individual (e.g., 30 min) records can lead to erratic results [Finnigan et al., 2003]. Applying new rotation angles for each 30-min record to eliminate the cross-wind Reynolds stresses cannot always be justified for such nocturnal conditions. Several alternative methods have been developed that compute rotation angles across longer or ecologically significant periods (month-season-annum). Some of these methods are classified by Paw U et al. and discussed in extensive detail by Wilczak et al. Alternatively, Lee rotated the coordinate system for the entire data sample, independently for each wind direction group. This approach tends to better capture the directional influence of slopes on turbulence, and the directionally dependent flow distortions associated with the instruments, mounting brackets, and so forth. This planar fit method, however, is sensitive to remounting of the sonic anemometer, which requires new rotation angles to be calculated.
Mahrt et al. compared different coordinate rotation methods for (carefully aligned) sonic anemometers, and found that differences were insignificant, except for the estimation of momentum flux under weak wind conditions (<1 m s−1) or for scalar fluxes under very weak winds (<∼0.25 m s−1). Such conditions would typically be eliminated by the u* threshold discussed previously. They recommended a slightly modified version of the rotation scheme used by Lee. For misaligned sonic anemometers, more serious errors of ∼14% and 3% per degree tilt were found for momentum and scalar fluxes, respectively [Moncrieff et al., 1996].
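For reference, the per-record rotation discussed above (often called a double rotation; the label and the function below are illustrative, not taken from the cited papers) can be sketched as two successive rotations: a yaw rotation that sets the mean cross-wind component to zero, followed by a pitch rotation that sets the mean vertical velocity to zero. A planar fit differs in that the tilt angles are estimated once from many records rather than from each record separately.

```python
import math
from statistics import fmean

def double_rotation(u, v, w):
    """Rotate one record so that mean(v) = 0 and mean(w) = 0.

    Returns the rotated series and the yaw and pitch angles in radians.
    """
    # First rotation (yaw) about the z axis aligns u with the mean horizontal wind.
    yaw = math.atan2(fmean(v), fmean(u))
    cy, sy = math.cos(yaw), math.sin(yaw)
    u1 = [ui * cy + vi * sy for ui, vi in zip(u, v)]
    v1 = [-ui * sy + vi * cy for ui, vi in zip(u, v)]
    # Second rotation (pitch) about the new y axis removes the mean vertical velocity.
    pitch = math.atan2(fmean(w), fmean(u1))
    cp, sp = math.cos(pitch), math.sin(pitch)
    u2 = [ui * cp + wi * sp for ui, wi in zip(u1, w)]
    w2 = [-ui * sp + wi * cp for ui, wi in zip(u1, w)]
    return u2, v1, w2, yaw, pitch

# Hypothetical two-sample record (m s-1) from a slightly tilted anemometer.
u_rot, v_rot, w_rot, yaw, pitch = double_rotation([2.0, 4.0], [1.0, 3.0], [0.2, 0.4])
```

Because the angles are recomputed for every record, a weak or poorly defined mean wind yields unstable angles, which is one way to see why this approach can become erratic under stable nocturnal conditions.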
 Instrument-induced flow distortions may be larger below canopy because the vertical velocity is not small compared to the total horizontal velocity, i.e., large attack angles. Careful examination of coordinate rotation methodologies is needed on below-canopy fluxes and should also take deviation in flow by trunks and branches into account, particularly for sites located on challenging topography.
2.1.3. Random Errors
 Sources of random error differ between flux estimates made within an averaging period (e.g., 30 min) and those associated with adding averaging periods together to achieve estimates across longer time integrals; that is, random flux errors generally decrease with increasing record length. The random flux error that Moncrieff et al. refer to is the variability among averaging periods, which includes both the random error that is added from averaging period to averaging period and the variability due to synoptic changes in ambient conditions. Diurnal variations were filtered out of their calculation. Sources of random error within an averaging period result primarily from an inadequate sample size [Hollinger and Richardson, 2005; Kruijt et al., 2004; Finkelstein and Sims, 2001; Vickers and Mahrt, 1997; Mann and Lenschow, 1994; Lumley and Panofsky, 1964]. This does not imply, however, that sampling more frequently (e.g., from 10 to 50 Hz) would necessarily reduce the effects of random error. What is needed instead is the ability to sample fast enough to resolve all transport scales, but also long enough to adequately sample the infrequent large turbulent motions that transport flux. The Allan variance can be used to assess the relative contribution of instrument random and systematic error under static conditions and can assist in determining an appropriate averaging time to reduce random error [Allan, 1966; Barnes and Allan, 1990] [cf. Bowling et al., 2003; Loescher et al., 2005] (Figure 2). This is strictly true only if the random error is normally distributed, which is apparently not the case for (dynamic) flux data [Hollinger and Richardson, 2005; Richardson et al., 2006]. In the case of a closed path sensor, longer averaging times used to reduce random error can introduce systematic error (e.g., sensor drift), which makes frequent calibrations and instrument stability (i.e., pressure control and temperature regulation) increasingly important to reduce uncertainty.
With regard to instrumentation noise, the random error generally becomes small with averaging times >15 min while the effects from systematic error scale among different time integrals.
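The variance statistic mentioned above can be sketched in its simplest, non-overlapping form: the series is split into consecutive bins of m samples, each bin is averaged, and half the mean squared difference between successive bin averages is returned. The function name and test series are hypothetical; published analyses often use overlapping estimators and real analyzer output under static conditions.

```python
def allan_variance(y, m):
    """Non-overlapping variance of successive bin means for an averaging factor of m samples."""
    n_bins = len(y) // m
    if n_bins < 2:
        raise ValueError("need at least two complete bins")
    means = [sum(y[k * m:(k + 1) * m]) / m for k in range(n_bins)]
    diffs = [(means[k + 1] - means[k]) ** 2 for k in range(n_bins - 1)]
    return 0.5 * sum(diffs) / len(diffs)

# Two idealized signals: rapidly alternating "noise" and a steady linear drift.
noise = [1.0, -1.0] * 8
drift = [float(i) for i in range(16)]

noise_short, noise_long = allan_variance(noise, 1), allan_variance(noise, 2)
drift_short, drift_long = allan_variance(drift, 1), allan_variance(drift, 2)
```

In this idealized case the statistic for the alternating signal falls as the averaging factor grows, while that for the drifting signal rises. The averaging time at which the two tendencies cross is the kind of information used to choose an averaging period and a calibration interval.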
 Unfortunately, increasing the averaging period to capture a large enough sample size also has the potential to increase the overall contribution of nonstationarity of the ambient environment, which, in turn, can bias the computed fluxes [Lenschow et al., 1994]. Techniques for assessing nonstationarity in geophysical turbulence are given by Gluhovsky and Agee, Foken and Wichura, and Mahrt.
 To reduce random errors, two averaging times are used to compute the turbulent flux, such that,
where s is the measured scalar quantity, and the angle brackets refer to the averaging operator that defines the turbulent fluctuations (noted by the prime). The corresponding perturbations ideally include all of the turbulent motions but exclude influences from mesoscale motions. The vertical motion at lower frequencies cannot be adequately measured by sonic anemometers, although it is often thought that their overall contribution to the total surface flux is small and can hence be ignored. Inadvertent inclusion of mesoscale motions however, can lead to large random flux error [Vickers and Mahrt, 2006a; Sun et al., 1998]. Such contamination most likely occurs with stable conditions in the presence of gravity waves, meandering motions, nonstationary drainage flows and other nameless mesoscale motions of unknown origin [see references in Mahrt et al., 2001b].
 The vertical flux is then computed as 〈w′s′〉 where the angle brackets represent the averaging time on a longer timescale to reduce the random flux error (i.e., averaging the products of perturbation). The longer averaging time (30 min to 1-year scale) is sometimes referred to as the “flux averaging time” and leads to the NEE estimates. Both the overbar and angle brackets must be chosen as simple unweighted averages, particularly to satisfy some averaging schemes [e.g., Reynolds, 1901]. Sometimes the averaging period in equation (3) and that associated with the overbar (compare with equation (1b)) are the same, in which case, random flux errors may be large. For example, the largest turbulent motion included in equation (3) would be of the same timescale as the flux averaging time, and therefore only one sample of this motion would be captured. Such errors can contribute to large variability in process-oriented studies that use 30-min averages, but are less important when averaged fluxes are summed together for longer time periods, as in the calculation of seasonal and annual NEE estimates, or when inferring relationships such as light response to daytime NEE with a large number of 30-min average data sets.
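The role of the two averaging times can be illustrated with a small sketch. Fluctuations are defined as deviations from a short-window mean (assumed here to be of the form s′ = s − ⟨s⟩ over each short block), and the products w′s′ are then averaged over the whole record, which plays the role of the flux averaging time. The function names, block lengths, and series are hypothetical.

```python
def block_perturbations(x, n_short):
    """Deviations from the mean of each consecutive block of n_short samples."""
    out = []
    for start in range(0, len(x), n_short):
        block = x[start:start + n_short]
        mean = sum(block) / len(block)
        out.extend(value - mean for value in block)
    return out

def flux_two_timescales(w, s, n_short):
    """Average of w's' over the full record, with primes defined on the short window."""
    wp = block_perturbations(w, n_short)
    sp = block_perturbations(s, n_short)
    return sum(a * b for a, b in zip(wp, sp)) / len(wp)

# Fast fluctuations of +/-1 superimposed on a slow step change (a stand-in
# for a mesoscale motion) halfway through the record.
w = [1.0, -1.0, 1.0, -1.0, 3.0, 1.0, 3.0, 1.0]
s = list(w)

flux_short_window = flux_two_timescales(w, s, n_short=4)  # slow step excluded
flux_single_window = flux_two_timescales(w, s, n_short=8)  # slow step included
```

With the short window the slow step is removed from the fluctuations and only the fast motions contribute. When both averaging lengths equal the record length, the single realization of the slow motion is counted as flux, which is the situation in which random flux errors may be large.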
 At times, the distinction between turbulent and nonturbulent motions is difficult to detect and may overlap in scale. The inspection of integrated cospectra (ogives), however, can assist in determining the temporal scale for transport of carbon [Friehe et al., 1991; Oncley et al., 1990]. On the basis of examination of a large number of multiresolution cospectra, under a variety of conditions, Vickers and Mahrt [2006a] demonstrate that the ideal averaging time for defining the turbulent fluctuations (equation (3)) decreases with increasing stability. However, they also show that averaging times increase with height above the canopy because of increases in eddy size with height. Hence the use of variable averaging times seeks to minimize the influence of random errors in very stable conditions due to inadvertent capture of mesoscale motions and to minimize systematic flux loss in unstable weak wind conditions due to inadvertent omission of slowly moving large eddies. Use of variable averaging times, however, may be impractical for routine calculations, and the applicability of this approach to below-canopy flux calculations is not known. In many cases however, it is necessary to characterize the absolute or relative magnitude of random flux measurement errors. Maximum likelihood and Kalman filter approaches, for example, require such estimates.
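An ogive can be sketched as the running sum of the cospectrum of w and c, accumulated from the highest frequency toward the lowest, so that its value at the lowest frequency equals the total covariance for the record. The example below uses a naive discrete Fourier transform to stay self-contained, so it is suitable only for short illustrative series; the function names and data are hypothetical, and no detrending, tapering, or frequency binning is applied.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def ogive(w, c):
    """Cumulative cospectrum from high to low frequency; index 0 is the lowest frequency."""
    n = len(w)
    w_mean, c_mean = sum(w) / n, sum(c) / n
    W = dft([x - w_mean for x in w])
    C = dft([x - c_mean for x in c])
    half = n // 2
    cospectrum = []
    for k in range(1, half + 1):
        value = (W[k] * C[k].conjugate()).real / n ** 2
        if not (n % 2 == 0 and k == half):
            value *= 2.0  # fold in the matching negative frequency
        cospectrum.append(value)
    running, result = 0.0, []
    for value in reversed(cospectrum):
        running += value
        result.append(running)
    result.reverse()
    return result

w = [0.3, -0.2, 0.5, -0.4, 0.1, 0.0, -0.3, 0.2]
c = [1.0, 0.5, 1.5, 0.2, 0.9, 0.8, 0.4, 1.1]
og = ogive(w, c)
```

If the ogive levels off before the lowest resolved frequency, the averaging time captures the transporting scales; an ogive that is still rising or falling there suggests that longer-period motions are contributing.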
2.2. Flux Divergence Below the Measurement Level
 Environmental conditions that contribute toward flux divergence (i.e., terms II–IV ≠ 0, equation (1a)) often also affect storage flux (i.e., term I ≠ 0, equation (1a)) and advective flows (i.e., terms V–VII ≠ 0, equation (1a)), which can be confounded and difficult to resolve. Under very stable conditions with semiopen canopies, nocturnal radiative cooling at the ground surface can cause strong atmospheric stratification below the canopy, suppressing turbulent mixing [e.g., Mahrt et al., 2000, 2001a], despite the fact that the openness of some canopies would presumably encourage greater mixing between the below-canopy environment and the atmosphere. With strong below-canopy stratification, below-canopy flows may become decoupled from the above-canopy flow, particularly with formation of below-canopy drainage flows (discussed further below). Under these conditions the vertical CO2 flux decreases rapidly with height above the source area, sometimes in the lowest few meters above the surface [Soler et al., 2002; Wilson and Meyers, 2001; Lee et al., 1995]. As a result, the magnitude of the turbulent flux may be much smaller at the above-canopy EC measurement level than the total flux [Staebler and Fitzjarrald, 2004]. When the below-canopy environment is (partially) decoupled, CO2 concentrations increase and can be advected horizontally and vented at heterogeneous locations downwind [Sun et al., 1998] or during the early morning transition periods [Grace et al., 1995; Lee and Black, 1993; Yang et al., 1999].
 With very stable conditions (smaller eddy size and weak mixing), the influence from local heterogeneities and microscale turbulent structure becomes increasingly important in quantifying the flux. Deploying sonic anemometers closer to the surface can reduce the influence of vertical flux divergence, but doing so increases the flux loss due to pathlength averaging and instrument separation, and reduces the representativeness of the flux measurement (i.e., smaller footprint [Schmid, 1997; Foken et al., 2004]). Therefore the effects of inadequate representation of the source area can become large when the understory, litter, and soil fluxes are significantly heterogeneous in space. Increasing the EC measurement height increases the representativeness of the measurements, but may also increase uncertainties due to vertical flux divergence.
2.3. Estimation of Advection
2.3.1. Vertical Advection
 There have been several recent studies examining the effects of localized advective flows on tower-based estimates [e.g., Staebler and Fitzjarrald, 2004]. Vertical advection at night may be significant even though the mean vertical windspeed near the surface is very small (i.e., ∼0 m s−1). Vertical advection has sometimes accounted for the largest fraction of the variation in nocturnal 30-min NEE estimates [Lee, 1998], and can greatly alter annual NEE estimates [Baldocchi et al., 2000; Lee, 1998]. Finnigan, Paw U et al., and Baldocchi et al. further argue that if a vertical advection term is included in the conservation equation, then the horizontal advection terms should also be included. Further complications arise in some forests when advection terms during the night differ between above- and below-canopy environments. Vickers and Mahrt [2006b] have concluded that the mean vertical motion cannot be estimated from sonic anemometers and that the mean vertical motion must be estimated from mass continuity.
2.3.2. Horizontal Advection
 Respired CO2 may be advected horizontally below the EC measurement level, leading to underestimation of ecosystem respiration (Re). Horizontal advection can be partitioned into separate types of flows: (1) synoptic scale occurring over a deep atmospheric layer, often deeper than the boundary layer, (2) propagating transient mesoscale flows, and (3) localized stationary flows associated with surface heterogeneity, sometimes occupying only the lower part of the boundary layer [Lee, 1998, and references therein]. The direct effect of synoptic advection is generally small within the canopy environment, except for the passage of frontal systems. A wide variety of physical processes contribute to mesoscale motions.
 Winds over complex topography can modify the mean wind shear and generate turbulence and waves. The latter can lead to local advection and breakdown of the nocturnal boundary layer. Recently, such investigations have examined the effects of small amplitude waves at intermediate periods of 1–5 min (D. Anderson, personal communication, 2002) and stronger rotor motions at longer periods of about 20-min [Turnipseed et al., 2003] on CO2 budgets.
 The magnitude and sometimes even the sign of the advection can be questioned for all but the simplest topography. For example, air temperature along a slope does not vary monotonically and the temperature gradient can reverse in sign because of variations in soil heat flux, enhanced mixing related to changes in roughness, and increased shear-induced mixing associated with drainage flows. Increases in the vertical mixing of heat imply increased mixing of CO2. A reversal of horizontal gradients can occur at midslope where a thermal belt occurs because of the downward mixing of warmer air at midslope, in concert with colder air trapped in the valley [Yoshino, 1975; Geiger, 1961]. These spatial variations in flow regimes not only limit the utility of flux measurements from all but the most ideal tower sites but also prevent precise estimates of horizontal CO2 gradients without a detailed measurement system. Staebler and Fitzjarrald present an approach based on spatial coherence calculations to determine the spatial resolution needed to adequately estimate horizontal gradients. They investigated horizontal advection associated with surface horizontal heterogeneities at the Harvard Forest and Camp Borden AmeriFlux sites. They found that obstructions (tree stems) can explain below-canopy directional flow patterns, that nighttime thermal stratification could produce drainage flows ∼92% of the time, and that buoyancy exerted more control on these drainage flows than stress divergence and pressure gradients ∼58% of the time. While these findings are not universally transferable to all sites, Staebler and Fitzjarrald outline a methodology to determine the magnitude of controls on below-canopy advection.
 Exact contributions of horizontal advection to NEE are difficult to estimate and cannot be determined by the current EC technique (i.e., single-point, tower measurement). Instead, the current micrometeorological methods used to estimate vertical and horizontal components of advection [cf. Lee, 1998; Paw U et al., 2000; Finnigan, 1999] may be more appropriate in assessing when advection becomes significant [Lee and Hu, 2002; Gudiksen et al., 1992], determining bounds on NEE [Baldocchi et al., 2000], and estimating appropriate u* thresholds that may remove much of the data for conditions when advection or flux divergence is prevalent [Gu et al., 2005]. Nascent, site-specific estimates of advection range from 5–40% across seasonal to annual intervals (Table 2).
 Applying a classical fluid dynamic approach of estimating budgets in a control volume may be an alternative solution to the CO2 advection problem [Sun et al., 1998; Finnigan et al., 2003]. Developing this approach is difficult however, because it requires detailed estimates of storage, horizontal and vertical advection by the mean flow, and the horizontal flux divergence. Aircraft observations [Betts et al., 1990, 1992] can estimate vertical transport through the “top of the box”, provided that spatial heterogeneity is not too great. Since aircraft EC measurements are relatively expensive, they are limited to a small fraction of the desired observational period and are most meaningful when used in concert with tower measurements [Vickers and Mahrt, 2003].
 Other approaches include the use of a tracer in conjunction with independent scalar flux measurements [Martens et al., 2003], or Lagrangian and analytical model efforts [Leclerc et al., 2003]. Martens et al. used a tracer (i.e., Radon–222, which occurs naturally and is biologically inactive) to provide an independent estimate of advection and to examine the effect of a u* threshold on Re, and found that u*-corrected EC estimates agree well with radon-derived Re. Their result supports the use of a u* threshold when estimating Re from EC, and suggests that drainage flows export carbon when the canopy is not well mixed. However, comparisons between advection rates derived from a tracer or traditional micrometeorological methods can be confounded by use of inappropriate parameters, such as autocorrelated turbulent diffusion coefficients or mismatched characterization of above- and below-canopy environments.
2.4. Gap Filling
 Other errors in annual estimates of NEE may arise through the filling of the unavoidable gaps in the EC data sets so that integrated fluxes may be calculated daily to annually, and in estimating daytime respiration or gross ecosystem exchange. Falge et al. found that errors did not differ much among three methods for gap filling: mean diurnal variation, nonlinear regressions, and look-up tables based on meteorological and seasonal conditions. Each of these three methods provided a good approximation of the original integrated values when artificial gaps were created, even when large percentages of data were missing. They also reported that (1) the mean diurnal variations were best simulated when estimated for 7-d windows for nighttime data and 14-d windows for daytime data and (2) look-up tables and nonlinear regressions provided higher levels of accuracy across seasons-to-year intervals and among large temperature ranges. The authors did not, however, recommend a particular method because much of the variation among methodologies was confounded by differences in data preparation and in identifying nocturnal periods with insufficient turbulence for valid EC measurements. In the simulations of Falge et al., errors attributable to gap filling were <0.50 t C ha−1 y−1, even with ∼65% of the annual data missing.
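Of the three methods named above, mean diurnal variation is the simplest to sketch: a missing value is replaced by the mean of valid values observed at the same time of day on adjacent days within a window. The function name, the use of None to mark gaps, and the symmetric window of whole days on either side are assumptions made for illustration, not the exact procedure of Falge et al.

```python
def fill_mean_diurnal(flux, per_day, window_days):
    """Fill None entries with the mean of same-time-of-day values within +/- window_days."""
    filled = list(flux)
    n_days = len(flux) // per_day
    for i, value in enumerate(flux):
        if value is not None:
            continue
        day, slot = divmod(i, per_day)
        first = max(0, day - window_days)
        last = min(n_days - 1, day + window_days)
        neighbours = [flux[d * per_day + slot] for d in range(first, last + 1)
                      if flux[d * per_day + slot] is not None]
        if neighbours:  # leave the gap open if no valid neighbour exists
            filled[i] = sum(neighbours) / len(neighbours)
    return filled

# Three hypothetical days with four periods per day; one gap on the second day.
series = [1.0, 2.0, -3.0, 0.5,
          1.2, None, -3.5, 0.4,
          0.8, 4.0, -2.5, 0.6]
filled = fill_mean_diurnal(series, per_day=4, window_days=1)
```

Only original observations are used as neighbours, so filled values never propagate into other gaps.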
 The construction and use of physiologically based models are central to the analysis of EC data [Hollinger and Richardson, 2005]. Models of varying complexity are also used to test our understanding of ecosystem processes. The recent availability of EC data has led to the increased use of data-based modeling (also called inverse modeling) where model parameters are estimated directly from the data. In order to do this correctly, and to estimate the uncertainty in the model parameters, we need to know the uncertainty (random error) in the underlying EC measurements, and have a methodology that can explicitly express the nonlinearity between EC derived fluxes and abiotic parameters.
Hollinger and Richardson examined the characteristics of random error in EC measurements with a unique two-tower system, and found that flux data do not conform to the least squares assumptions of homogeneous and normally distributed error variance. Using a data binning approach, they confirmed that errors in flux data are heteroscedastic and not normally distributed at a variety of flux sites ranging from crop fields to large towers [Richardson et al., 2006]. Flux data errors appear to be better represented by a double exponential probability density function (PDF) than a Gaussian PDF.
 There are potentially serious consequences from using ordinary least squares (OLS) fitting techniques with flux data, including incorrect determination of model parameter values and bias in annual NEE sums [Hollinger and Richardson, 2005; Richardson and Hollinger, 2005]. Instead of OLS methods, approaches such as maximum likelihood provide unbiased estimates of model parameters, and when coupled with, for example, Monte Carlo methods, also provide confidence intervals for parameter estimates. Several authors have applied these or related approaches (e.g., neural networks) to EC data [Papale and Valentini, 2004, 2003; Schulz et al., 2001; van Wijk and Bouten, 2002]. Richardson and Hollinger  showed that OLS fitting methods resulted in models that overestimated nocturnal respiration (and hence underestimated NEE) relative to the maximum likelihood approach.
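To make the contrast with OLS concrete: if errors follow a double exponential distribution with constant scale, the maximum likelihood estimate minimizes the sum of absolute residuals rather than squared residuals. The sketch below fits a straight line both ways. The linear model, the constant error scale, the synthetic data, and the function name `fit_lad` are all simplifying assumptions for illustration; the studies cited above work with nonlinear flux models and heteroscedastic errors.

```python
import numpy as np

def fit_lad(X, y, n_iter=50, eps=1e-6):
    """Least absolute deviations fit by iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from the OLS solution
    for _ in range(n_iter):
        resid = np.abs(y - X @ beta)
        w = 1.0 / np.maximum(resid, eps)         # weight each point by 1/|residual|
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

def l1_loss(beta, X, y):
    """Sum of absolute residuals (negative log-likelihood up to constants)."""
    return float(np.sum(np.abs(y - X @ beta)))

# Synthetic "respiration vs. temperature" data with double exponential noise.
rng = np.random.default_rng(0)
temp = rng.uniform(0.0, 25.0, 500)
X = np.column_stack([np.ones_like(temp), temp])
true_beta = np.array([1.0, 0.2])
y = X @ true_beta + rng.laplace(scale=1.0, size=temp.size)

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_lad = fit_lad(X, y)
```

Confidence intervals for `beta_lad` could then be approximated by refitting on resampled data, in the spirit of the Monte Carlo methods mentioned above.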
 Kalman filter techniques are another approach for estimating model parameter values and filling gaps in a data record, and have recently been applied to EC data [Jarvis et al., 2004; Williams et al., 2005; Gove and Hollinger, 2006]. The Kalman filter is a recursive algorithm for estimating the state of a process in a way that minimizes the mean squared error [Kalman, 1960]. Its structure is well suited to autocorrelated time series, which makes it an attractive candidate for estimating missing flux data. The equations for a Kalman filter consist of two parts: time update or “predictor” equations, and measurement update or “corrector” equations [Welch and Bishop, 2001]. The time update projects the current state (surface flux) and covariance estimates forward in time, and the measurement update adjusts the projected estimate on the basis of an actual measurement made at that time. Because the original Kalman filter was developed to estimate the state of a system on the basis of linear stochastic difference equations, and many interesting processes are nonlinear (including ecosystem-atmosphere exchanges), a variety of approaches have been developed for the nonlinear problem. Williams et al. use the ensemble Kalman filter [Evensen, 2003] to assimilate EC and other ecosystem data into an ecosystem C balance model, while Gove and Hollinger employ the unscented Kalman filter [Julier and Uhlmann, 1997] to estimate NEE model parameters and fill gaps in an EC record.
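The two-part structure of the filter can be seen in a minimal scalar sketch. This is the basic linear filter with a random-walk state, not the ensemble or unscented variants used in the studies above. The function name `kalman_fill` and the noise variances `q` and `r` are hypothetical choices made only for illustration. Missing observations (NaN) skip the measurement update, so the projected state carries across the gap.

```python
import numpy as np

def kalman_fill(obs, q=0.05, r=1.0, x0=0.0, p0=10.0):
    """Scalar random-walk Kalman filter; NaN observations skip the update."""
    x, p = x0, p0
    est = np.empty(len(obs))
    for k, z in enumerate(obs):
        # Time update: for a random walk the state estimate carries over
        # unchanged and its variance grows by the process noise q.
        p = p + q
        # Measurement update: applied only when an observation exists.
        if not np.isnan(z):
            gain = p / (p + r)       # Kalman gain
            x = x + gain * (z - x)   # adjust the state toward the observation
            p = (1.0 - gain) * p     # shrink the state variance
        est[k] = x
    return est

# Constant "flux" of 3.0 with a ten-step gap in the record.
obs = np.full(200, 3.0)
obs[50:60] = np.nan
est = kalman_fill(obs)
```

A random walk is the simplest possible state model; a filter intended for real flux data would replace it with a process model driven by meteorological variables.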
3. Measurements of Respiration Components for Diagnosing EC Data and Understanding Sources of Respired CO2
 Ecosystem respiration (Re) can be estimated with EC during the night, but may vary in temporal resolution, from 30 min to 12 hours, because longer averaging periods may be needed to assess the flux divergence, advection and storage terms (equation (1a)). Independent measurements of respiration from foliage, wood and soil have also been made with chambers, scaled to a site, and used to examine contributions of different processes to ecosystem productivity [Harmon et al., 2004; Bolstad et al., 2004; Granier et al., 2000; Law et al., 1999a; Lavigne et al., 1997; Goulden et al., 1996]. Attention is being focused on the spatial and temporal representativeness of soil chamber measurements because soil CO2 efflux accounts for ∼50 to 70% of Re [Davidson et al., 2006; Law et al., 1999a; Goulden et al., 1996] and because of large spatial variability in soil properties, litterfall, coarse woody debris, and microbial activity. In this section we discuss the uncertainty in chamber measurements over soil, much of which also applies to other chamber-based measurements.
 In an attempt to estimate both the temporal and spatial variability in soil gas exchange, sampling designs often include the combination of automated chambers (closed dynamic [e.g., Lavigne et al., 1997; Norman et al., 1997; Goulden and Crill, 1997]) to assess temporal changes in soil fluxes (e.g., hourly timescale), and manual chambers (closed static, e.g., Li-6400-09, Li-Cor Inc.) measured at many locations weekly to seasonally to assess spatial variation for scaling the automated chamber measurements to the site or approximate footprint of the tower.
 Capturing the spatial heterogeneity and achieving high accuracy in soil flux measurements are not necessarily the same thing. A biased chamber design can measure the spatial heterogeneity while still being inaccurate. Accuracy in chamber estimates can only be established by measuring known fluxes under defined conditions. This has been done very infrequently [cf. Davidson et al., 2002; Butnor and Johnsen, 2004; Fang and Moncrieff, 1998]. For example, Butnor et al. found that dynamic closed chambers underestimated fluxes through porous media because the air-filled porosity of the soil alters the effective chamber volume. If systematic uncertainties are uniform among chambers, the coefficient of variation (CV = σ/μ, where σ is the standard deviation and μ is the mean) can be used to determine the number of chambers needed to adequately represent the spatial variability. The CV can also be used to estimate the precision of the measured (population) mean (see Figures 3a–3b). Reported CVs range from 0.15 to 0.55 (Table 3). These CVs are relatively high, and they place limits on the precision we can expect from a reasonable number of chambers. Furthermore, CVs may change with seasonal climatic conditions. For example, CVs can increase to 0.8 during spring snowmelt with increased and spatially variable microbial activity (for Metolius, J. Irvine, personal communication, 2005), and can also increase with the onset of wet season rains. As a general rule, increasing the number of chambers will enhance the precision in the measured mean.
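The link between CV and the number of chambers can be sketched with the standard sample-size approximation n ≈ (z · CV / E)², where E is the allowable relative error of the mean and z = 1.96 for 95% confidence. This normal approximation is an assumption made here for illustration; it may differ from the calculation behind Figures 3a–3b, and a t-based calculation would call for somewhat more chambers when n is small. The function name `chambers_needed` is hypothetical.

```python
import math

def chambers_needed(cv, rel_error=0.10, z=1.96):
    """Approximate number of chambers for the sample mean to fall within
    rel_error of the true mean at the confidence level implied by z."""
    return math.ceil((z * cv / rel_error) ** 2)

# CVs spanning the reported range, plus the snowmelt value quoted above.
for cv in (0.15, 0.55, 0.80):
    print(f"CV = {cv:.2f}: about {chambers_needed(cv)} chambers")
```

Under these assumptions, being within 10% of the mean requires roughly 9 chambers at CV = 0.15 but more than 100 at CV = 0.55, which illustrates why high CVs limit the achievable precision.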
Table 3. Reported Sources of Error and Coefficient of Variation (As an Indication of How Accurately Spatial Heterogeneity Is Estimated) Among Different Chamber Designs and Environments
Column headings: Source of Uncertainty; Percent Error of Estimate.
Table notes:
CS = closed static, i.e., syringe samples; CD = closed dynamic, e.g., automated chambers; OD = open dynamic, i.e., scrubs headspace below ambient; SL = soda lime.
Includes the potential of an altered diffusion gradient and the spatial heterogeneity on the natural environment.
Systematic leakage for CO2.
Measurements made over a uniform synthetic medium.
Chamber overpressurization by 0.5 Pa.
SL tended to underestimate high fluxes (∼4.9 μmol m−2 s−1) and overestimate low fluxes (<1.5 μmol m−2 s−1).
Remedied the large effects due to pressure differentials within open systems.
Chamber pressure < atmospheric pressure by ∼0.6 kPa.
 Estimates from both chamber types, custom built and manufactured alike, are subject to systematic and sampling errors that can be large (Table 3). Estimating the systematic uncertainties can determine how close the measured mean is to the true mean from the soil surface. Sources of this type of error include the size and dynamic volume of the chamber and differences between the external and internal chamber environments, such as wind, light [Dore et al., 2003], pressure [Lund et al., 1999; Massman et al., 1997], and temperature. These differences can affect measurements but, once identified, can be remedied [Dore et al., 2003; Davidson et al., 2002; Fang and Moncrieff, 1998]. There are efforts underway to calculate soil fluxes on the basis of a dynamic soil volume to address changes in effective soil volume with moisture, particularly in porous soils (L. Xu, personal communication, 2005).
 Most importantly, automated chambers assess the temporal variability in soil respiration, and a large number of spatially distributed, manually sampled dynamic closed chambers can assess the spatial variability. If all the chambers are in the representative tower source area, these two methodologies can be used in conjunction to estimate soil respiration at scales comparable to those measured by EC. Protocols for measurement, sampling strategies and how soil respiration estimates can be used in conjunction with EC are discussed by Ryan and Law.
 Deploying EC systems below forest canopies along with chamber measurements has increased our understanding of functional and physical relationships governing NEE. Below-canopy flux measurements have provided useful temporal information for understanding seasonal differences in diel patterns and turbulent structure, while chamber-based measurements can characterize the partitioning of respiration from the different sources, including soil, foliage, and bole [Law et al., 1999b]. Combining the two approaches, Law et al. [2001b] determined the vertical distribution of respiration components and provided an estimate of the below-canopy CO2 source area. The information can be used to evaluate component processes in models, particularly belowground processes that are poorly understood. Data from chamber measurements have also been used to fill gaps in EC data sets, particularly when a (large) representative portion of the ecosystem is included in the measurements [Dore et al., 2003; Bubier et al., 2002; Oechel et al., 1998; Drake et al., 1996]. However, this is extremely labor intensive when foliage and wood respiration need to be sampled frequently in addition to automated measurements of soil respiration, so it is not a feasible approach for regularly replacing nighttime respiration estimates from EC measurements.
 In tall forests, the structural complexity often does not allow well-mixed conditions to prevail through the canopy profile even under high u* values [Saleska et al., 2003; Loescher et al., 2003; Miller et al., 2004], which increases the uncertainty of Re estimates. When nocturnal mixing of the canopy airspace was minimal (u* < 0.25 m s−1), some studies found that chamber-based estimates of total ecosystem respiration were correlated with the change in storage in the canopy air space, which can dominate NEE estimates during calm wind conditions [Law et al., 1999a; Lavigne et al., 1997]. However, when u* ∼ 0.25–0.3 m s−1, NEE was significantly less than chamber estimates and subcanopy flux measurements [Law et al., 1999b], suggesting that under weak wind conditions, flows with low CO2 concentration developed below the above-canopy EC system and compromised its measurements.
4. Comparisons Between EC and Allometric Estimates of Net Ecosystem Production
 Determining net ecosystem production (NEP) by assessing the change in ecosystem carbon mass over time can provide an independent check of integrated EC measurements, and can partition NEE into the component fluxes.
Where NPPfoliage is net primary production of tree, shrub, and herb foliage and litterfall is assumed to approximate the heterotrophic decomposition of dead leaves, NPPwood is the net primary production of the bole, branches, and bark of trees and shrubs, RHwoody debris is the heterotrophic respiration from fine and coarse woody debris, and Δcoarse root and Δfine root are the changes in each of the root carbon pools over time, respectively. Total net ecosystem production (NEPtotal) can then be calculated by the addition of equations (4a)–(4c) with Δsoil C accounting for any accumulation or depletion of carbon in the mineral soil (e.g., through leaching or erosion). The NEP of foliage, wood, and root is typically determined by measuring the annual growth increment of these tissues, while the change in soil C is typically assessed by the difference between carbon content at two points in time (e.g., five years apart), or by chronosequences and the use of paired plots, trading space for time.
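Read from the definitions above, the component budgets referred to as equations (4a)–(4d) would take a form like the following. This is a reconstruction under that assumption rather than a quotation, so the exact notation and the assignment of equation labels are not confirmed by the text:

```latex
\begin{align}
\mathrm{NEP}_{\mathrm{foliage}} &= \mathrm{NPP}_{\mathrm{foliage}} - \mathrm{litterfall} \tag{4a}\\
\mathrm{NEP}_{\mathrm{wood}} &= \mathrm{NPP}_{\mathrm{wood}} - \mathrm{RH}_{\mathrm{woody\ debris}} \tag{4b}\\
\mathrm{NEP}_{\mathrm{root}} &= \Delta_{\mathrm{coarse\ root}} + \Delta_{\mathrm{fine\ root}} \tag{4c}\\
\mathrm{NEP}_{\mathrm{total}} &= \mathrm{NEP}_{\mathrm{foliage}} + \mathrm{NEP}_{\mathrm{wood}} + \mathrm{NEP}_{\mathrm{root}} + \Delta_{\mathrm{soil\ C}} \tag{4d}
\end{align}
```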
 In an alternative mass balance approach, NEP can be approximated in ecosystems as:
where NPPtotal is the total net primary production, HRdetritus is heterotrophic respiration from all detritus, and HRsoil is heterotrophic respiration from the soil. Estimating NEP in this manner requires an estimate of annual soil respiration (typically modeled from periodic measurements) and of the fraction of it that is heterotrophic. Separating autotrophic and heterotrophic sources of soil respiration can, however, be difficult. In some cases, where vegetation has transitioned between C3 and C4 cover, stable isotopic analysis of respired CO2 can be used to partition rhizosphere respiration and decomposition [Rochette et al., 1999]. Alternatively, isotopic tracers can be added [Pataki et al., 2003]. In other cases, autotrophic and heterotrophic respiration can be separated using trenched plots [cf. Epron et al., 1999] and girdling [cf. Hogberg et al., 2001]. More commonly, the heterotrophic contribution to total soil respiration is estimated by difference after measuring the CO2 evolved from freshly excised roots [Law et al., 2001b]. There are often other “missing” fluxes such as dissolved organic carbon [Neff and Asner, 2001], herbivory [Clark et al., 2001], and harvest [Scott et al., 2004] that need to be estimated to determine if these components are significant at the measured timescales.
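From the term definitions given for this alternative approach, the mass balance referred to elsewhere as equation (5) would plausibly read as follows. It is offered as a reconstruction from those definitions, with the notation assumed:

```latex
\begin{equation}
\mathrm{NEP} = \mathrm{NPP}_{\mathrm{total}} - \mathrm{HR}_{\mathrm{detritus}} - \mathrm{HR}_{\mathrm{soil}} \tag{5}
\end{equation}
```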
 While NEP derived from mass balance approaches has the potential to serve as a cross check of NEE, its utility as a validation tool is limited by both the mismatch in temporal scale between the two metrics and the various inherent measurement errors. For the most part, NEP is integrated over a one-to-five year interval, rendering it insensitive to the seasonal changes in carbon flux resolved by EC. While chronosequences (typically spanning decades) are adequate to detect long-term changes in soil C, they are too insensitive to assess the short-term changes (e.g., <1 year) that may be contained in NEE measurements. Alternatively, some studies assume that soil C is in steady state at interdecadal timescales, implying that allocation patterns are also at steady state at shorter timescales (e.g., interannually). The extent to which these assumptions are met, however, is not clear. Ironically, it is the incongruence between NEE and NEP that may best address the assumptions of short-term steady state in soil C.
 To properly assess EC estimates of NEE with mass balance estimates of NEP, it is important to recognize the various sources of error in NEP. We recognize three categories:
 1. Measurement error arising from instrument or human error. This error can be random or systematic but is typically small. Examples include the error in measuring tree height or shrub cover.
 2. Sampling error arising from the use of limited sample points to represent a contiguous area or time interval. This error can be large but is typically random. Examples include scaling soil C from a subsample of cores to a hectare plot or using monthly soil respiration measurements to compute an annual flux.
 3. Parameterization error arising from the use of scaling parameters derived from other studies. These systematic errors can be large or small. Examples include the use of non-site-specific allometric biomass equations or decomposition constants.
 Generally speaking, the measurement error of values used to compute biomass, such as tree diameter and tree height, is low, with an estimated CV of <0.02 from a structurally complex tropical forest (D.B. Clark, personal communication, 2003), implying that the CV (precision) is likely closer to 0 in structurally less complex ecosystems. This precision can also be enhanced when care is taken to standardize sampling protocols used from plot-to-plot and year-to-year (e.g., reducing human measurement error, measuring the circumference at the same location on the bole).
 Sampling error, on the other hand, can be quite large. Measuring radial bole increment on only one fifth of the trees in a hectare plot and inferring the growth of the remaining boles from diameter measurements imposed a CV of 0.20 on the stand-wide bole increment (i.e., error 2 above [Campbell et al., 2004]). Alternatively, Keller et al. used a resampling approach (i.e., Monte Carlo) to simulate the uncertainty associated with scaling based on the measured spatial variability in DBH measurements, and found that fifteen 0.5-ha plots were needed for estimates to be within 20% of the measured mean with 95% confidence (i.e., error 2 above).
 Parameterization error arises largely because of the inability to estimate all necessary parameters for a complete budget. The most common examples are the use of non-site-specific wood decomposition constants and non-site-specific estimates of the heterotrophic contribution to soil respiration [Hanson et al., 2000]. Perhaps more problematic is the use of non-site-specific allometric equations that predict biomass from variables such as height and stem diameter. The log-log nature of most allometric equations means that even subtle differences in these equations can lead to large systematic errors in biomass estimates [Gower et al., 1996]. Differences among published allometric equations led to a 200% range in biomass estimates for a tropical moist forest in Thailand [Clark et al., 2001] and an 83% range in biomass estimates for conifer forests in the Pacific NW, USA [Law et al., 2006]. This approach, however, assumes that each parameter estimate is normally distributed, an assumption that is often not met, and may exaggerate uncertainty in NEP estimates. For example, Keller et al. report that the uncertainty in the allometric equation used to estimate the biomass of tropical trees with DBH > 35 cm resulted in estimates within ∼20% of the mean (i.e., error 3 above).
 Despite a general awareness of the uncertainties associated with mass balance estimates of NEP, quantifying these uncertainties in a consistent manner among sites remains a challenge for most researchers. Most studies rely upon their own expert knowledge of a particular ecosystem and/or technique to assign liberal estimates of uncertainty to each component of NEP [Harmon et al., 2004; Campbell et al., 2004]. Aggregating the component errors can be done as the root sum square of the component errors or through more sophisticated stochastic resampling procedures (e.g., Monte Carlo), which can account for probability distributions for each specific component variable as well as the covariance in error among component variables [Harmon et al., 2004; Campbell et al., 2004]. Furthermore, second-order uncertainties can be estimated by comparing alternative approaches to estimating NEP such as the method defined in equations (4a)–(4d) to those defined in equation (5) [Law et al., 2001a].
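The two aggregation routes mentioned above can be compared in a short sketch. The component means, standard deviations, and correlation used here are hypothetical numbers chosen only for illustration, and the Monte Carlo draw assumes normally distributed component errors, an assumption that real component estimates may not satisfy. The function names are likewise illustrative.

```python
import numpy as np

def root_sum_square(sigmas):
    """Combined standard deviation of a sum of independent components."""
    return float(np.sqrt(np.sum(np.square(sigmas))))

def monte_carlo_total(means, sigmas, corr=None, n_draws=100_000, seed=0):
    """Mean and standard deviation of the summed components, drawn from a
    multivariate normal with an optional error correlation matrix."""
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    corr = np.eye(means.size) if corr is None else np.asarray(corr, dtype=float)
    cov = corr * np.outer(sigmas, sigmas)  # correlation -> covariance
    rng = np.random.default_rng(seed)
    total = rng.multivariate_normal(means, cov, size=n_draws).sum(axis=1)
    return float(total.mean()), float(total.std(ddof=1))

# Hypothetical NEP components (t C ha-1 y-1): foliage, wood, root, soil C.
means = [1.2, 2.0, 0.5, 0.3]
sigmas = [0.3, 0.4, 0.2, 0.5]

rss = root_sum_square(sigmas)
mc_mean, mc_sd = monte_carlo_total(means, sigmas)

# Same components with an assumed error correlation of 0.5 between all pairs.
corr = np.full((4, 4), 0.5)
np.fill_diagonal(corr, 1.0)
_, mc_sd_corr = monte_carlo_total(means, sigmas, corr=corr)
```

With independent errors the Monte Carlo spread agrees with the root sum square. Positive covariance among summed components widens it, which the root sum square alone cannot capture.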
 For annual integrals of NEE, uncertainties can be estimated for gap filling techniques [Falge et al., 2001, 2002], u* filtering [e.g., Miller et al., 2004], and bounds established by advection estimates. No matter what the approach, the magnitude of random uncertainty scales with the flux (compare with multisite comparison [Richardson et al., 2006]).
 Reducing known systematic errors (for example, by minimizing pressure differentials between ambient and chamber environments for chambers, and by ogive analyses and rigorous testing of u* filtering for EC) and maintaining a high level of quality assurance in data acquisition (e.g., using traceable calibrations, appropriate and frequent maintenance of sensors, established and tested protocols for both allometric and EC methods) are essential in estimating NEP and NEE. If the magnitude of the summed errors is small (e.g., <50% of the total estimate), then absolute comparisons between EC and allometric estimates may be appropriate, but if they are large (e.g., >1.5× the mean estimate), comparisons are likely to be more qualitative. As a result, single-year comparisons of NEE and NEP may not be reliable; comparisons over multiple years may instead help to place bounds on expected values and increase our confidence that apparent trends in partitioned fluxes are real.
 Comparisons of NEE and NEP have been used (1) to directly assess long- and short-term climatic effects on changes in C stores, (2) to test functional relationships used to scale carbon flux from plot-to-region [e.g., Vourlitis et al., 2000; Oechel et al., 1998], and (3) to determine appropriate conditions for applying u* thresholds to EC measurements. For example, Curtis et al. reported annual NEE and NEP from five temperate deciduous forests to be within the range of expected estimates, but found that it took at least five years for the cumulative estimates of NEE and NEP to converge. Biometric estimates of decomposition of coarse woody debris may have to be averaged over several years for robust comparisons of annual estimates of net uptake. Hence these comparisons are not expected to be good in systems with slow turnover rates.
Saleska et al. and Miller et al. estimated uncertainty in both NEE and allometric measures of C flux on the basis of a rigorous empirical determination of u* thresholds and confidence intervals, respectively (both from an old-growth tropical humid forest). Saleska et al. constructed a C budget that approximates NEP but lacked litterfall and respiration estimates (except for a rough estimate of coarse woody detritus respiration). Miller et al. used only live aboveground biomass estimates, which included trees with a diameter >55 cm, lianas, vines, and hemi-epiphytes. Both of these studies estimated annual NEE values that were within error bounds established by the allometric estimates, but the allometric-based errors were large, ±1.6 and 2.0 t C ha−1 y−1 compared to ±0.5 and 0.4 t C ha−1 y−1 from NEE estimates (Saleska et al. and Miller et al., respectively). Including measured and scaled respiration budgets in these estimates would likely add many additional sources of error. This highlights problems associated with the interpretation and utility of incomplete NEP budgets for comparison purposes. For robust comparisons, all NEP components should be estimated (either measured or modeled), particularly belowground processes, and uncertainty estimates should be provided for both the NEP components and the NEE estimates.
 A goal of all flux sites should be to identify and if possible reduce systematic and random errors, thereby increasing the accuracy and precision in NEE estimates. Some errors associated with low-frequency flux loss can be addressed through a more rigorous use of ogive analyses (integrated cospectra). At a minimum, ogives can be assessed over time spans that capture the effects of local and mesoscale motions to better determine the appropriate averaging times. However, additional flux loss may occur under specific climate conditions that would not be seen by ogive analyses. Exploration of some as yet undefined similarity functions may be able to determine specific conditions under which low-frequency transport of carbon occurs.
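An ogive can be computed as the cospectrum of vertical wind speed and the scalar, accumulated from the highest frequency toward the lowest. The sketch below assumes evenly sampled series with no gaps and uses synthetic data; the function name `ogive`, the 10 Hz sampling rate, and the 30-min record length are illustrative assumptions. If the curve levels off before reaching the lowest frequencies, the averaging period captures the flux-carrying motions; a curve still rising at the low-frequency end points to low-frequency flux loss.

```python
import numpy as np

def ogive(w, c, fs):
    """Return frequencies (Hz) and the ogive of the w-c cospectrum,
    accumulated from the Nyquist frequency down to each frequency."""
    w = np.asarray(w, dtype=float) - np.mean(w)
    c = np.asarray(c, dtype=float) - np.mean(c)
    n = w.size
    # Cospectrum scaled so that its sum equals the covariance of w and c.
    co = np.real(np.conj(np.fft.rfft(w)) * np.fft.rfft(c)) / n**2
    co[1:] *= 2.0          # fold negative frequencies into positive ones
    if n % 2 == 0:
        co[-1] /= 2.0      # the Nyquist bin occurs only once for even n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    og = np.cumsum(co[::-1])[::-1]   # accumulate from high to low frequency
    return freqs[1:], og[1:]         # drop the zero-frequency (mean) bin

# Synthetic 30-min record at 10 Hz with correlated w and c fluctuations.
rng = np.random.default_rng(1)
w = rng.normal(size=18_000)
c = 0.5 * w + rng.normal(size=18_000)
freqs, og = ogive(w, c, fs=10.0)
covariance = np.mean((w - w.mean()) * (c - c.mean()))
```

By construction, the value of the ogive at the lowest resolved frequency equals the covariance over the whole record, so comparing ogives from records of different lengths shows how much flux longer averaging periods add.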
 Comparing NEP to NEE helps validate both estimates and enhances our understanding of carbon sequestration. Such comparisons are also useful for determining the underlying controls on productivity within a particular ecosystem. NEP estimates allow the partitioning of carbon stores and fluxes at coarse timescales (annual to multi-annual). In contrast, NEE estimates integrate over larger spatial areas and can determine environmental controls on fluxes across fine-to-coarse timescales (half-hourly to multi-annual). Even though chamber measurements have additional utility in partitioning EC NEE, the spatial and temporal variability in these measurements is a large source of uncertainty. The number of chambers needed to reduce this uncertainty (i.e., to be 95% confident of being within 10% of a measured mean soil flux (Figure 3a, top)) can be impractical and unrealistic for many ecosystems. New approaches are needed to reduce the measured variability in respiratory losses.
 All sites are subject to potential losses of CO2 via advection. Many ecosystems that are important to our understanding of terrestrial carbon budgets are in challenging topography. For this reason, we need to be aware of conditions that lead to advection on flat and hilly terrain alike. Improvements to current approaches are necessary to increase the precision of within-site measurements and across-site comparisons, and to increase our ability to predict regional carbon flux. Possible approaches may include some combination of designs that can better define and quantify the types of advective flows that exert the greatest influence on carbon flux [e.g., Staebler, 2003] coupled with NEP estimates to constrain the fraction of advection not accounted for meteorologically, or constraining productivity by other surrogate measures that can utilize multiple constraints derived from functional relationships of NEP or NEE. In all cases, more effort is needed to assess errors in the multiple approaches to estimating whole ecosystem fluxes and contributing processes.
 While it would be nice to suggest uniform criteria for measurement accuracy and precision (e.g., to provide comparisons with NEE, components A, B and C of NEP should be measured with accuracy of D while components X, Y and Z may be safely approximated), this is not possible because such guidance is inherently site specific. The sources and characteristics of measurement uncertainty are contingent on the measurement technique used and the physical and biological system under investigation. We support current calls [cf. Randerson et al., 2002; Chapin et al., 2006] for consistency in NEP estimation among sites.
 The largest challenge in reducing the uncertainties in EC measurements today is addressing our inability to properly quantify the advection terms. We support exploring new techniques and technologies that can resolve the nighttime CO2 gradients at the relevant temporal and spatial scales to account for these otherwise missed flows.
 This research was supported by the Office of Science (BER), U.S. Department of Energy, grants DE FG02-03ER63624 and DE-FG03-01ER63278. The authors wish to thank E. Veldkamp, P. Crill, M. Ryan, M. Goulden, L. Schwendenmann, S. Miller, U. Falk, and J. Irvine for data access, and P. Curtis, T. Ocheltree, W. Massman, and the anonymous reviewers for their thoughtful comments.