The relationships between terrestrial carbon dioxide flux and its primary environmental drivers are uncertain because the processes controlling CO2 cycling, especially at ecosystem scales, are not well understood. This uncertainty is compounded by the fact that the importance of controlling processes, and therefore environmental drivers, may differ across temporal scales. This paper presents and applies a geostatistical regression (GR) approach that can be used with eddy-covariance data to investigate the relationships between carbon flux and environmental variables at multiple time scales, ranging from monthly to daily. The approach uses an adaptation of the Bayes Information Criterion to identify an optimal set of environmental variables that are able to explain the observed variability in carbon flux. In addition, GR quantifies the temporal correlation in the portion of the flux signal that cannot be explained by the selected variables and directly accounts for this correlation in the analysis. This GR approach was applied to data from the University of Michigan Biological Station (UMBS) AmeriFlux site to (i) identify the dominant explanatory variables for Net Ecosystem Exchange (NEE), Gross Ecosystem Exchange (GEE), and heterotrophic and autotrophic respiration (Rh+a) at different temporal scales, (ii) evaluate whether environmental variables can be used to isolate the GEE and Rh+a signals from the NEE measurements, and (iii) determine the impact of temporal scale on the inferred relationships between environmental variables and CO2 flux. The results confirm the strong correlation between respiration and temperature and the influence of solar radiation on carbon uptake during the growing season. In addition, results highlight the influence of variables such as precipitation, vapor pressure deficit, and the fraction of photosynthetically active radiation (fPAR) in carbon cycling at UMBS. 
Many relationships between flux and auxiliary variables are found to be scale-dependent. Site-specific and remote-sensing leaf area index and fPAR data are not found to be interchangeable at finer temporal scales. Results also show that a linear GR model is able to capture what may initially appear to be nonlinear relationships between flux and environmental variables, because this apparent nonlinearity is found to be explained by the covariability among key auxiliary variables. Finally, results indicate that GR can be used to identify variables that partially isolate GEE and Rh+a from the NEE signal at finer temporal scales at UMBS.
 Understanding of the influence of biogeochemical processes, disturbance, and climate on terrestrial carbon dioxide (CO2) fluxes at relatively small spatial scales (e.g., plot scales) has improved significantly in recent years. Uncertainties persist, however, in large part because the processes that control sources and sinks of atmospheric CO2, particularly at the ecosystem scale, are not well understood because of complex interactions with their environmental drivers.
 Although several approaches are commonly used to approximate ecosystem CO2 flux, the eddy covariance method is unique because it can provide a more direct measurement of the flux between vegetation and the atmosphere. In this approach, the flux of CO2, i.e., the net ecosystem exchange (NEE), is approximated as the covariance of the deviations in atmospheric CO2 concentrations and vertical wind speed from their means, along with corrections for fluctuations in water vapor and temperature [e.g., Baldocchi et al., 2001]. At present, the AmeriFlux network consists of approximately 100 active sites located in various ecosystems [Hargrove et al., 2003]. Although there can be large uncertainties associated with these half-hourly measurements [Richardson et al., 2006], NEE estimates have been used to improve understanding of the temporal variability of CO2 surface flux of particular ecosystems through statistically inferred relationships at daily or longer temporal scales [Law et al., 2002].
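The covariance definition of the flux given above can be illustrated with a minimal sketch. It assumes two equally long series of high-frequency samples (vertical wind speed and CO2 concentration) within one averaging window; the function name and the sample values are hypothetical, and the water vapor and temperature corrections mentioned in the text are deliberately omitted.

```python
def eddy_covariance_flux(w, c):
    """Mean product of the deviations of w and c from their window means.

    w: vertical wind speed samples; c: CO2 concentration samples.
    Density corrections and unit conversions are NOT applied here.
    """
    if len(w) != len(c) or len(w) == 0:
        raise ValueError("w and c must be non-empty and of equal length")
    w_mean = sum(w) / len(w)
    c_mean = sum(c) / len(c)
    products = [(wi - w_mean) * (ci - c_mean) for wi, ci in zip(w, c)]
    return sum(products) / len(products)


# Hypothetical samples: updrafts coincide with lower CO2, so the flux is negative (uptake).
flux = eddy_covariance_flux([0.1, -0.1, 0.2, -0.2], [380.0, 381.0, 379.0, 382.0])
```

A negative value here corresponds to net transport of CO2 toward the surface, which matches the sign convention used later in the paper (negative drift coefficients indicate a sink).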
 This paper presents and applies a new adaptation of geostatistical regression that can be used with eddy-covariance data to investigate the relationships between NEE for a forest ecosystem and environmental variables at multiple time scales.
1.1. Statistical Inference Using NEE Measurements
 One of the benefits of using high-frequency eddy-covariance data to investigate the relationship between fluxes and environmental factors is that both long- and short-term trends can be inferred from the data [e.g., Stoy et al., 2009]. Statistical approaches such as neural networks [e.g., Stoy et al., 2009] and linear regression [e.g., Law et al., 2002; Hui et al., 2003] have been used to understand the climatic controls of both the interannual and seasonal variability of carbon cycling at flux tower sites. Regression methods have the advantage of providing statistical relationships between given variables and flux. However, traditional regression approaches are limited by (1) the approach used to select the variables to include in the regression, (2) the assumption of independent and identically distributed residuals, and (3) assumptions regarding the dependent variable (i.e., how to best decompose NEE into photosynthetic uptake and respiration).
 The first of these limitations centers on the methods used to select the variables to include in the regression model, referred to henceforth as the model of the trend. Frequently, only a subset of available variables is included in the analysis (e.g., photosynthetically active radiation (PAR), soil temperature, air temperature, leaf area index (LAI), etc.) [e.g., Urbanski et al., 2007] while other potentially important data are not used (e.g., friction velocity, Normalized Difference Vegetation Index (NDVI), etc.). From this subset, every variable is typically regressed individually against flux measurements to infer relationships [Law et al., 2002; Hui et al., 2003]. Such an approach could lead to environmental variables obscuring each other's effects [Faraway, 2005]. For example, Gross Ecosystem Exchange (GEE) is a function of both air temperature and light [Blackman, 1905]. If each variable is regressed separately, the effect of air temperature could mask the effect of light, making this second variable appear not to be significant [Faraway, 2005]. This problem can be avoided if joint contributions between auxiliary variables are allowed. Although some studies have included more than one variable in regression analyses [e.g., Hibbard et al., 2005], sequential methods based on F tests for selecting the variables used in the regression do not account for the joint contributions of all possible combinations of variables.
 Second, it is likely that the CO2 flux regression residuals will be temporally correlated, especially at submonthly scales. Ignoring this correlation can lead to a misrepresentation of the relationship between an environmental variable and flux [e.g., Hoeting et al., 2006]. As such, temporal correlation must be assessed and included in both the model selection scheme and the statistical regression. Although noted as a limitation [Law et al., 2002], previous studies have not accounted for correlation in regression residuals.
 The final limitation is related to the eddy-covariance measurements themselves. Conceptually, NEE is the small difference between two large fluxes, namely photosynthetic carbon uptake via GEE and release of CO2 into the atmosphere through a combination of heterotrophic and autotrophic respiration (Rh+a). Each of these fluxes is affected differently by environmental controls. In addition, variables such as light, nutrient availability, and water stress have complex interactions with each other and with each flux component, making it difficult to ascertain the influence of a particular variable on either GEE or Rh+a. In past studies, statistical regression methods (such as simple and multiple linear regression) have been used to infer relationships between flux components (GEE or Rh+a) and either a single environmental variable or some predetermined combination of variables [e.g., Law et al., 2002; Urbanski et al., 2007]. This requires the measured NEE signal to be separated into GEE and Rh+a prior to the analysis. This is generally achieved using one of three methods: by (1) subtracting the nighttime NEE from the daytime NEE signal [Urbanski et al., 2007], (2) deriving Rh+a from a regression using nighttime fluxes at high friction velocity and an exponential transformation of soil temperatures [e.g., Law et al., 1999; Hibbard et al., 2005], or (3) modeling GEE using PAR. Some studies have shown that these methods for dividing NEE into separate parts lead to large uncertainties in the inferred Rh+a [e.g., Janssens et al., 2001], possibly biasing inferred relationships.
1.2. Study Goals
 This paper presents a geostatistical regression (GR) algorithm designed to elucidate processes controlling carbon exchange at various temporal scales at eddy covariance towers by addressing the first two limitations described above. In regard to the third limitation, the ability of the GR method to separate the auxiliary variables associated individually with carbon uptake and release is also investigated. The presented approach is applied at the AmeriFlux tower site at the University of Michigan Biological Station (UMBS) to improve understanding of carbon cycling for this mixed hardwood forest at daily, 8 day, and monthly time scales.
 The presented approach improves on the regression methods used in previous studies in two distinct ways. First, the final model of the trend only includes variables that are selected using a new adaptation of the commonly employed Bayes Information Criterion (BIC) [Schwarz, 1978], which accounts for the joint probabilities of all possible combinations of variables and, unlike hypothesis tests (e.g., F tests), can compare nonnested models (where models are not necessarily subsets of each other). Second, the GR method accounts for temporal correlation in the portion of the flux signal that is not explained by the environmental data sets to more accurately characterize the relationship between a set of environmental variables and flux.
 The objectives of the application to the UMBS site are to (1) identify the dominant explanatory variables for NEE, GEE, and Rh+a at different temporal scales in order to improve the understanding of carbon cycling at UMBS, (2) determine the impact of temporal scale on the inferred relationships between the environmental variables and CO2 flux, and (3) evaluate whether environmental variables can be used to isolate the GEE and Rh+a signals from the NEE measurements using a GR framework.
2. University of Michigan Biological Station Flux Tower
2.1. Site Description
 UMBS is located in the northern portion of Michigan's Lower Peninsula (45°33′35.0″N, 84°42′49.6″W). The station is home to a flux tower [Schmid et al., 2003; Curtis et al., 2002, 2005; Gough et al., 2007, 2008], part of the FLUXNET and AmeriFlux networks [Baldocchi et al., 2001], where NEE is measured at 10 Hz and averaged to reported hourly estimates. Data have been collected since 1999, together with many other environmental data sets.
 The tower is located on lake border plains in the transition zone between mixed hardwood and boreal forests [Curtis et al., 2005]. Within the tower's footprint, big-toothed and trembling aspen (Populus grandidentata, P. tremuloides) are the dominant tree species, and red oak (Quercus rubra), American beech (Fagus grandifolia), red maple (Acer rubrum), white pine (Pinus strobus), and hemlock (Tsuga canadensis) are also present. Bracken fern (Pteridium aquilinum) comprises the majority of the understory vegetation [Schmid et al., 2003; Curtis et al., 2005].
 UMBS is one of the few sites where concurrent biometric and meteorological measurements have been conducted along with annual assessments of carbon storage based on accounting methods [e.g., Curtis et al., 2002; Gough et al., 2008]. These data suggest that temperature and solar radiation exert strong controls on carbon exchange [e.g., Curtis et al., 2005; Gough et al., 2007, 2008] at the site, similarly to other northern deciduous forests. It is assumed that these constraints vary seasonally and depend on leaf phenological period [Gough et al., 2008], although this has not been fully evaluated at subannual time scales. The extensive research that has been conducted at UMBS provides a unique context for interpreting the results of the GR analysis.
2.2. Study Period, Setup, and Data
 The presented analysis explores the linear relationship between NEE, GEE and Rh+a, and environmental variables (a.k.a. auxiliary variables) at daily, 8 day, and monthly time scales. The examined period spans February 2000 to December 2004.
 The study uses auxiliary variables collected at UMBS as well as data from the Moderate Resolution Imaging Spectroradiometer (MODIS) on the Terra satellite [Schmid et al., 2003; Curtis et al., 2005] (http://ladsweb.nascom.nasa.gov/), listed in Table 1. Note that two sets of LAI and fraction of photosynthetically active radiation (fPAR) data are used in this study. Site-specific LAI data were derived from Vegetative Area Index (VAI) measurements using a LI-COR LAI-2000 Plant Canopy Analyzer and leaf litter trap estimates. In addition, the MODIS LAI data set was used [Myneni et al., 2002]. Two fPAR data sets were also collected: one from MODIS and another obtained by transforming site-specific LAI using Beer's Law [e.g., Campbell and Norman, 1998]. Both MODIS LAI and fPAR data were provided at 8 day, 1 km resolution, and the pixels within a 1 km radius of the tower were averaged, weighted by the area of each pixel within this radius. All data in Table 1 were quality controlled and averaged to daily, 8 day, and monthly scales. For variables with coarser than daily resolution (e.g., MODIS data sets), data were downscaled using linear interpolation. Day and night averages of NEE were estimated using PAR values greater than zero as an indicator of daytime measurements. The auxiliary variables were categorized into groups representing different controls on surface CO2 flux as shown in Figure 1. Note that most variables have very similar seasonal cycles (Figure 2).
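Two of the preprocessing steps described above, the Beer's Law transformation of LAI into fPAR and the linear interpolation of 8 day composites to daily values, can be sketched as follows. This is only an illustration: the extinction coefficient `k = 0.5` is an assumed value that the paper does not state, and both function names are hypothetical.

```python
import math


def fpar_from_lai(lai, k=0.5):
    """Beer's Law estimate of the fraction of PAR intercepted by a canopy.

    k is a light extinction coefficient; 0.5 is an ASSUMED illustrative value.
    """
    return 1.0 - math.exp(-k * lai)


def interpolate_to_day(composite_days, composite_values, day):
    """Linearly interpolate a coarse (e.g., 8 day) series to a single day.

    composite_days must be strictly increasing.
    """
    pairs = list(zip(composite_days, composite_values))
    for (d0, v0), (d1, v1) in zip(pairs, pairs[1:]):
        if d0 <= day <= d1:
            return v0 + (v1 - v0) * (day - d0) / (d1 - d0)
    raise ValueError("day lies outside the composite record")


# Hypothetical 8 day LAI composites on days 1, 9, and 17, evaluated on day 5.
lai_day5 = interpolate_to_day([1, 9, 17], [2.0, 4.0, 3.0], 5)
fpar_day5 = fpar_from_lai(lai_day5)
```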
Table 1. Variables Considered for the GR Analysis
Asterisks indicate the timeframe for which measurements are available: *, 1999–2004; **, 2000–2004; ***, 2003–2004.
Principal investigators: Chris Vogel, Peter Curtis and HaPe Schmid (AmeriFlux Tower).
Principal investigator: Kim Mueller (data compilation only).
Principal investigator: Mary Anne Carroll (PROPHET Tower).
Principal investigators: NASA and Oak Ridge National Laboratory.
 The hourly NEE data have many nonrandom gaps due to the lack of vertical air motion (e.g., atmospheric stability), or due to rain obscuring sensors. A gap-filled data product available for UMBS [Schmid et al., 2003] was used as the primary data stream in the presented analysis. Note that the gap filling methods used at UMBS include (1) short-term ensemble averages of hourly fluxes over the course of a day during leaf-out periods and (2) parametric models during the growing season that define the relationship between ecosystem respiration and soil temperature and gross primary ecosystem uptake to PAR [Schmid et al., 2003]. Because of the large proportion of data gaps at this site (>40%), the analysis was also repeated using non-gap-filled data for comparison to ensure that results are not reflecting assumptions used in the gap-filled model.
 The GR analysis was conducted separately on NEE, GEE, and Rh+a. To obtain the GEE and Rh+a signals, the daily averaged nighttime observations of NEE were used to represent daily ecosystem respiration (i.e., average(NEEnight) = daily Rh+a), similar to the approach given by Urbanski et al. [2007]. Daily GEE was then derived by subtracting the averaged nighttime NEE measurements from the daily NEE average (i.e., GEE = average(NEE) − average(NEEnight)). Although this approach may underpredict GEE because daytime temperatures are higher than nighttime temperatures, alternative methods for separating GEE and Rh+a are based on assumed relationships between these flux components and auxiliary variables such as temperature. Such parametric relationships could have biased our results, because the selected variables might simply have mirrored the prescribed relationships used to separate the fluxes. For the non-gap-filled NEE analysis, if more than 50% of the nighttime NEE measurements were missing, then NEE, Rh+a, and GEE were not considered separately.
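The partitioning rule described above (nighttime mean NEE taken as daily Rh+a, and GEE as the daily mean NEE minus that value) can be written as a short sketch. It assumes one day of hourly NEE and PAR values and uses PAR greater than zero as the daytime indicator, as stated in section 2.2; the function name and the sample numbers are hypothetical.

```python
def partition_daily_flux(nee_hourly, par_hourly):
    """Split one day of NEE into (mean NEE, Rh+a, GEE).

    Rh+a = mean of nighttime NEE (PAR <= 0)
    GEE  = mean of all NEE - mean of nighttime NEE
    """
    if len(nee_hourly) != len(par_hourly) or not nee_hourly:
        raise ValueError("inputs must be non-empty and of equal length")
    night = [f for f, p in zip(nee_hourly, par_hourly) if p <= 0]
    if not night:
        raise ValueError("no nighttime observations available for this day")
    nee_mean = sum(nee_hourly) / len(nee_hourly)
    rha = sum(night) / len(night)
    gee = nee_mean - rha
    return nee_mean, rha, gee


# Hypothetical four-sample "day": two night values (release), two day values (uptake).
nee_mean, rha, gee = partition_daily_flux([2.0, 2.0, -6.0, -6.0], [0.0, 0.0, 100.0, 200.0])
```

No temperature dependence enters this calculation, which is the property the text relies on to avoid prescribing relationships between the flux components and the auxiliary variables.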
 Random gaps occurring in the environmental data sets within a day were not filled. When the gaps were large (>50% of the data missing for a given day), all data for that day were excluded from the analysis. The averaged monthly, 8 day, and daily variables were evaluated against coincident gap-filled data sets available from the Carbon Dioxide Information Analysis Center (ftp://cdiac.esd.ornl.gov/pub/AmeriFlux/data/), and no substantial differences were observed.
 The GR approach presented here includes both a model selection step and a geostatistical regression. This approach is particularly useful when temporal correlation is present within the regression residuals. If correlation is present but not modeled, statistical inference techniques can lead to Type I errors (i.e., incorrectly concluding that a significant relationship exists) [Faraway, 2005].
3.1. Geostatistical Model
 As with multiple linear regression (MLR), geostatistical regression expresses the dependent variable (in this case, NEE, GEE, and Rh+a measurements), $\mathbf{z}$, as the sum of a deterministic component ($\boldsymbol{\mu}$) and a stochastic term ($\boldsymbol{\varepsilon}$) representing the residuals between the observations and the deterministic component. However, instead of assuming that these regression residuals are independent (i.e., “white noise”), $\boldsymbol{\varepsilon}$ is modeled as a vector of correlated zero-mean residuals. The deterministic component represents the portion of the observed flux variability that can be explained using a set of covariates (a.k.a. auxiliary variables) [Huang and Chen, 2007], while the stochastic component describes the variability in $\mathbf{z}$ that is not explained by the deterministic component:
$$\mathbf{z} = \boldsymbol{\mu} + \boldsymbol{\varepsilon} \qquad (1)$$
The deterministic component takes the form of a trend, or expected drift (i.e., μ = Xβ). This model can be as simple as a single overall mean or can include any linear combination of variables related to z. The X matrix contains vectors of k covariates that are scaled by the vector of unknown drift coefficients (β). Even though the individual columns in X are linearly related to z, the columns themselves can contain transformations of one or more auxiliary variables, e.g., exp(temperature) or lagged data. GR (section 3.4) is used to obtain the best estimates of the drift coefficients, $\hat{\boldsymbol{\beta}}$, which represent the relationship between CO2 flux and each covariate, and their corresponding uncertainties, $\sigma_{\hat{\beta}}$.
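 The covariance of the regression residuals, $\boldsymbol{\varepsilon}$, is modeled as
$$Q(h_{i,j}) = E\left[\varepsilon(t_i)\,\varepsilon(t_j)\right] \qquad (2)$$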
where hi,j is the time lag between times ti and tj, Q(hi,j) is the covariance for residuals with a lag hi,j, and E[ ] denotes the expectation operator. Equation (2) assumes that the flux residuals are homoscedastic, although a model where the variance changes seasonally could be implemented if needed.
 Many covariance functions can be used to model the behavior of the residuals in equation (2) [e.g., Cressie, 1993], but a combination of a nugget and exponential covariance function was found here to aptly model the temporal covariance of NEE, GEE, and Rh+a observations and residuals:
$$Q(h_{i,j}) = \begin{cases} \sigma_{n,Q}^2 + \sigma_{s,Q}^2, & h_{i,j} = 0 \\ \sigma_{s,Q}^2 \exp\left(-h_{i,j}/\tau_Q\right), & h_{i,j} > 0 \end{cases} \qquad (3)$$
where the practical temporal range of correlation is approximately $3\tau_Q$, beyond which $\sigma_{n,Q}^2 + \sigma_{s,Q}^2$ represents the variance of independent flux residuals. The nugget, $\sigma_{n,Q}^2$, is the sum of (i) the variance of the portion of the flux variability that is not temporally correlated or explained by the model of the trend, and (ii) the measurement error of the observations at the daily, 8 day, or monthly temporal scale. These parameters are estimated using Restricted Maximum Likelihood (section 3.3).
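As an illustration of the nugget plus exponential model described above, the following sketch builds the covariance matrix for a set of observation times. It assumes NumPy is available; the function name and the parameter values in the usage line are hypothetical and are not estimates from the paper.

```python
import numpy as np


def nugget_exponential_cov(times, var_nugget, var_struct, tau):
    """Covariance matrix Q for residuals observed at the given times.

    Structured part: var_struct * exp(-h / tau) for lag h.
    Nugget: var_nugget, added only where the lag is exactly zero.
    """
    t = np.asarray(times, dtype=float)
    h = np.abs(t[:, None] - t[None, :])   # matrix of pairwise time lags
    Q = var_struct * np.exp(-h / tau)
    Q[h == 0] += var_nugget
    return Q


# Hypothetical daily series of 10 days with a correlation length of 3 days.
Q = nugget_exponential_cov(np.arange(10), var_nugget=0.5, var_struct=2.0, tau=3.0)
```

At a lag of three correlation lengths the structured covariance has decayed to exp(−3), roughly 5% of its zero-lag value, which is why 3τ is treated as the practical range.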
3.2. Model Selection
 One of the key components of the presented approach is an objective method for selecting the auxiliary variables to be included in the model of the trend (X). As noted by Burnham and Anderson , identifying the structure of the deterministic component is conceptually more difficult than estimating the drift coefficients and associated uncertainties. Traditionally, variables used in regression analyses have been selected on the basis of mechanistic studies or expert knowledge of carbon cycling within a particular ecosystem, but the larger challenge lies in choosing the appropriate dimensionality of a model (i.e., the number of covariates) given the information content of a finite set of flux measurements [Schwarz, 1978]. On one hand, as more variables are added to the model of the trend, the deterministic component is better able to capture the variability in the observations. However, although the fit of the model to the data will invariably improve with additional parameters, some of these may serve only to reproduce spurious correlations [Faraway, 2005], thereby confounding the analysis. Therefore, the aim is to balance the amount of variability explained by adding variables to the trend with the loss of the degrees of freedom inherent to a more complex model.
 One of the most widely used model selection techniques is BIC [Schwarz, 1978] because, unlike hypothesis test based methods, it is able to evaluate nonnested competing models [Ward, 2008]. This method does not use the traditional hypothesis testing paradigm and, therefore, cannot be used to draw conclusions regarding the statistical significance of the difference between two models. Instead, BIC ranks how well the data support each model, taking into account both the goodness of fit, i.e., sum of the squared residuals, and the number of covariates in each candidate model. BIC is generally favored over other information criteria methods when explanation and inference (not solely prediction) are of principal interest [Ward, 2008].
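 BIC is loosely based on the idea that candidate models should be compared using their posterior probabilities [Schwarz, 1978]. The BIC [Schwarz, 1978] of a particular model, $\mathbf{X}_j$, of $k_j$ covariates and $n$ measurements is given by
$$\mathrm{BIC}_j = -2\ln(L_j) + k_j\ln(n) \qquad (4)$$
Assuming that the regression residuals follow a Gaussian distribution, the likelihood, $L_j$, of a particular model is given by
$$L_j = \frac{1}{(2\pi)^{n/2}\,|\mathbf{Q}|^{1/2}} \exp\left[-\frac{1}{2}\left(\mathbf{z} - \mathbf{X}_j\boldsymbol{\beta}\right)^T \mathbf{Q}^{-1} \left(\mathbf{z} - \mathbf{X}_j\boldsymbol{\beta}\right)\right] \qquad (5)$$
where n is the number of NEE, GEE, or Rh+a measurements, β are the unknown drift coefficients, and Q is given by equation (3).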
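 As seen in the works of Kitanidis and Hoeting et al., the term in the exponent can be modified to remove any bias associated with the unknown drift coefficients, β, by setting $\boldsymbol{\beta} = (\mathbf{X}^T\mathbf{Q}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Q}^{-1}\mathbf{z}$. After taking the natural log, removing the constant term, replacing β, rearranging terms, and combining with equation (5), the newly adapted BIC equation that can account for correlated residuals becomes
$$\mathrm{BIC}_j = \ln|\mathbf{Q}| + \mathbf{z}^T\left[\mathbf{Q}^{-1} - \mathbf{Q}^{-1}\mathbf{X}_j\left(\mathbf{X}_j^T\mathbf{Q}^{-1}\mathbf{X}_j\right)^{-1}\mathbf{X}_j^T\mathbf{Q}^{-1}\right]\mathbf{z} + k_j\ln(n) \qquad (6)$$
 For the special case of independent residuals, $\mathbf{Q} = \sigma^2\mathbf{I}$, where $\mathbf{I}$ is an identity matrix, and equation (6) reduces to the more conventional form, where RSS is the residual sum of squares:
$$\mathrm{BIC}_j = n\ln(\sigma^2) + \frac{\mathrm{RSS}}{\sigma^2} + k_j\ln(n) \qquad (7)$$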
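The adapted criterion can be evaluated numerically as in the sketch below, which assumes NumPy, a known residual covariance matrix `Q`, and a design matrix `X` whose column count is taken as the number of covariates k (whether an intercept column is counted is an assumption of this sketch). The function name `gr_bic` is hypothetical. Linear solves are used instead of forming the inverse of Q explicitly.

```python
import numpy as np


def gr_bic(z, X, Q):
    """BIC for a trend X under correlated residuals with covariance Q.

    Computes ln|Q| + z^T [Q^-1 - Q^-1 X (X^T Q^-1 X)^-1 X^T Q^-1] z + k ln(n).
    """
    z = np.asarray(z, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    Qinv_z = np.linalg.solve(Q, z)
    Qinv_X = np.linalg.solve(Q, X)
    A = X.T @ Qinv_X                      # X^T Q^-1 X
    b = Qinv_X.T @ z                      # X^T Q^-1 z (Q is symmetric)
    quad = z @ Qinv_z - b @ np.linalg.solve(A, b)
    _, logdet = np.linalg.slogdet(Q)      # numerically stable ln|Q|
    return logdet + quad + k * np.log(n)


# Hypothetical data: intercept plus one covariate, independent residuals.
z = np.array([1.0, 2.0, 2.0, 4.0, 5.0])
X = np.column_stack([np.ones(5), np.arange(5.0)])
bic_value = gr_bic(z, X, 0.7 * np.eye(5))
```

With `Q` set to a scaled identity matrix, as in the usage lines, the value should agree with the independent-residual form of the criterion computed from an ordinary least squares fit.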
 In this study, most of the variables are highly correlated (approximately half of the 27 variables considered for the trend were paired with at least one other variable with a correlation coefficient greater than 0.75). This is not surprising given that many of the data sets represent similar quantities, such as temperature, radiation, and vegetation. Therefore, these similar data sets are grouped into categories as presented in Figure 1, complementing the BIC with scientific understanding regarding their relationship to flux. The BIC is then run by restricting the number of variables from each category to at most one, to avoid problems with excessive collinearity among the auxiliary variables, which could lead to large and opposing regression coefficients that do not reflect expected relationships to flux, and that have overly wide associated uncertainty bounds [Faraway, 2005]. Note that fully automated model building procedures are not recommended as a means for identifying the best interpretable model, because such procedures can potentially select models that represent only spurious relationships, and therefore can fail when applied to comparable data sets [Judd and McClelland, 1989]. A condition number is used to diagnose collinearity [Faraway, 2005]. Finally, because correlation coefficients for variables in the trend provide a measure of the relationship among the variables themselves and not the relative independence of the relationship of a variable to flux, the correlation coefficients of the drift coefficients, $\hat{\boldsymbol{\beta}}$, for the selected variables are also estimated and compared.
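The restriction to at most one variable per category defines the set of candidate models over which the criterion is compared. A minimal sketch of that enumeration follows; the category and variable names are hypothetical stand-ins (the actual groupings are in Figure 1), and `score` is any callable returning a criterion value such as a BIC, where lower is better.

```python
from itertools import product


def candidate_models(categories):
    """Yield every variable subset with at most one variable per category."""
    # For each category, the options are "no variable" (None) or one member.
    options = [[None] + list(members) for members in categories.values()]
    for combo in product(*options):
        yield tuple(v for v in combo if v is not None)


def best_model(categories, score):
    """Return the candidate subset with the lowest score."""
    return min(candidate_models(categories), key=score)


# Hypothetical grouping, for illustration only.
categories = {
    "temperature": ["air_temp", "soil_temp", "night_air_temp"],
    "radiation": ["par", "net_radiation"],
    "moisture": ["vpd"],
}
n_candidates = sum(1 for _ in candidate_models(categories))
```

The number of candidates is the product of (category size + 1) over the categories, so exhaustive comparison stays feasible for a few dozen variables in a handful of groups, whereas the unrestricted set of all subsets grows much faster.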
3.3. Restricted Maximum Likelihood
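 Along with the trend (Xβ), the temporal covariance matrix (Q) of the regression residuals plays a critical role in the geostatistical model. The Restricted Maximum Likelihood (RML) approach [e.g., Kitanidis and Shen, 1996] is used to quantify the covariance parameters ($\sigma_{n,Q}^2$, $\sigma_{s,Q}^2$, $\tau_Q$) by minimizing the negative logarithm of the likelihood of the available data with respect to these parameters, yielding the following objective function:
$$L_{\theta} = \frac{1}{2}\ln|\mathbf{Q}| + \frac{1}{2}\ln\left|\mathbf{X}^T\mathbf{Q}^{-1}\mathbf{X}\right| + \frac{1}{2}\mathbf{z}^T\left[\mathbf{Q}^{-1} - \mathbf{Q}^{-1}\mathbf{X}\left(\mathbf{X}^T\mathbf{Q}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{Q}^{-1}\right]\mathbf{z} \qquad (8)$$
which is minimized with respect to the covariance parameters, and where $|\cdot|$ indicates the matrix determinant.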
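A sketch of this objective function is given below, assuming NumPy, distinct observation times, and the nugget plus exponential covariance of section 3.1. The function name and the numbers in the usage lines are hypothetical. In practice the returned value would be passed to a numerical optimizer over the three covariance parameters; the choice of optimizer is not specified in the text and is left open here.

```python
import numpy as np


def rml_objective(params, z, X, times):
    """Negative restricted log-likelihood (up to a constant).

    params = (nugget variance, structured variance, correlation length tau).
    Assumes all observation times are distinct.
    """
    var_nugget, var_struct, tau = params
    t = np.asarray(times, dtype=float)
    h = np.abs(t[:, None] - t[None, :])
    Q = var_struct * np.exp(-h / tau) + var_nugget * np.eye(len(t))
    Qinv_X = np.linalg.solve(Q, X)
    Qinv_z = np.linalg.solve(Q, z)
    A = X.T @ Qinv_X                      # X^T Q^-1 X
    b = X.T @ Qinv_z                      # X^T Q^-1 z
    quad = z @ Qinv_z - b @ np.linalg.solve(A, b)
    return 0.5 * (np.linalg.slogdet(Q)[1] + np.linalg.slogdet(A)[1] + quad)


# Hypothetical example: six daily observations, intercept plus linear covariate.
times = np.arange(6.0)
X = np.column_stack([np.ones(6), times])
z = np.array([0.3, 1.1, 1.8, 3.2, 3.9, 5.1])
value = rml_objective((0.5, 1.0, 2.0), z, X, times)
```

A useful property for checking an implementation is that the objective does not change when any linear combination of the columns of `X` is added to `z`, since the restricted likelihood removes the dependence on the unknown drift coefficients.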
3.4. Geostatistical Regression
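 Estimates of the drift coefficients, $\hat{\boldsymbol{\beta}}$, and their uncertainty covariance ($\mathbf{V}_{\hat{\beta}}$) [e.g., Cressie, 1993] are calculated as
$$\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^T\mathbf{Q}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{Q}^{-1}\mathbf{z} \qquad (9)$$
$$\mathbf{V}_{\hat{\beta}} = \left(\mathbf{X}^T\mathbf{Q}^{-1}\mathbf{X}\right)^{-1} \qquad (10)$$
where all variables are as previously defined, and the diagonal elements of $\mathbf{V}_{\hat{\beta}}$ are the variances representing the uncertainty of the drift coefficients. The coefficient of determination R2 is calculated as
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(z_i - [\mathbf{X}\hat{\boldsymbol{\beta}}]_i\right)^2}{\sum_{i=1}^{n}\left(z_i - \bar{z}\right)^2} \qquad (11)$$
where $\bar{z}$ represents the mean of the observations, to quantify the portion of the flux variability that is explained by the model of the trend.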
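The estimation step can be sketched as a generalized least squares calculation, assuming NumPy and a residual covariance matrix `Q` that has already been estimated. The function name is hypothetical, and the R2 expression used here (one minus the ratio of residual to total sum of squares) is an assumption about the exact form used by the authors.

```python
import numpy as np


def geostatistical_regression(z, X, Q):
    """Return (beta_hat, V_beta, r2) for trend X and residual covariance Q.

    beta_hat = (X^T Q^-1 X)^-1 X^T Q^-1 z
    V_beta   = (X^T Q^-1 X)^-1
    """
    z = np.asarray(z, dtype=float)
    X = np.asarray(X, dtype=float)
    Qinv_X = np.linalg.solve(Q, X)
    V_beta = np.linalg.inv(X.T @ Qinv_X)
    beta_hat = V_beta @ (Qinv_X.T @ z)
    resid = z - X @ beta_hat
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((z - z.mean()) ** 2)
    return beta_hat, V_beta, r2


# Hypothetical example with a nugget plus exponential covariance.
t = np.arange(5.0)
X = np.column_stack([np.ones(5), t])
Q = 2.0 * np.exp(-np.abs(t[:, None] - t[None, :]) / 3.0) + 0.5 * np.eye(5)
z = np.array([2.1, 1.4, 1.2, 0.4, 0.1])
beta_hat, V_beta, r2 = geostatistical_regression(z, X, Q)
std_err = np.sqrt(np.diag(V_beta))   # one-sigma uncertainty of each drift coefficient
```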
 The GR was performed for a 5 year time period, as well as for three distinct seasons (growing season, spring green-up, and nongrowing season). The goal was to identify the dominant variables that explain the variability in NEE, GEE, and Rh+a at different temporal scales. The study also investigated the feasibility of using auxiliary data to statistically separate flux components (i.e., GEE and Rh+a) in NEE measurements. Finally, this work explored the sensitivity of results to the use of remote-sensing versus site-based LAI and fPAR data and to the assumption of linearity between the auxiliary variables and flux observations.
 Several sensitivity tests were performed to ensure that (1) the model selection was not unduly influenced by data from a particular year, (2) using gap-filled data did not affect results, and (3) the regression residuals were symmetric and close to Gaussian. Excluding individual years from the analysis had a negligible impact on the presented results, as did the substitution of non-gap-filled data. More importantly, the results using gap-filled data do not mirror the assumed relationships used in the NEE gap-filling algorithm described in section 2.2 (in particular, soil temperature was never selected as an important variable for respiration). Finally, regression residuals are symmetric. For brevity, the results of these sensitivity tests are not shown. However, the outcome of these tests provides evidence for the statistical validity of the results presented in the following sections.
4.1. Explanatory Variables in the Monthly, 8 Day, and Daily NEE, GEE, and Rh+a Trends
 Auxiliary variables were selected using the BIC algorithm outlined in section 3.2 for regression models for NEE, GEE, and Rh+a. Drift coefficients and associated uncertainties were estimated for the resulting nine models (three dependent variables × three temporal scales) using equations (9) and (10) (Table 2). The correlation coefficients of the drift coefficients ($\hat{\boldsymbol{\beta}}$) for all models are less than 0.7 (unless noted otherwise in the text) with condition numbers less than 30, indicating that the BIC method, complemented with the grouping of variables, is able to avoid problems with excessive collinearity. Note that a positive sign on the estimated drift coefficients indicates a positive correlation with CO2 flux (i.e., a source or a reduction in sink), while a negative sign indicates a negative correlation (i.e., a sink or reduction in source).
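The two collinearity diagnostics mentioned above can be sketched as follows, assuming NumPy. The correlation matrix of the drift coefficients is obtained by normalizing their uncertainty covariance, and the condition number is taken as the ratio of the largest to smallest singular value of the trend matrix. Scaling the columns of X to unit length before computing the condition number is an assumption of this sketch, as are the function names.

```python
import numpy as np


def drift_correlation(V_beta):
    """Correlation matrix of the drift coefficients from their covariance."""
    sd = np.sqrt(np.diag(V_beta))
    return V_beta / np.outer(sd, sd)


def trend_condition_number(X):
    """2-norm condition number of X after scaling each column to unit length."""
    X = np.asarray(X, dtype=float)
    X_scaled = X / np.linalg.norm(X, axis=0)
    return np.linalg.cond(X_scaled)


# Hypothetical covariance of two drift coefficients.
corr = drift_correlation(np.array([[4.0, 1.0], [1.0, 1.0]]))
```

Off-diagonal correlations near ±1 or a large condition number would suggest that two selected variables carry nearly the same information about the flux.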
Table 2. Selected Variables and Associated Drift Coefficients ($\hat{\boldsymbol{\beta}}$) as Estimated From the GR Algorithm at the Monthly, 8 Day, and Daily Temporal Scales
Units are μmol CO2 m−2 s−1. All drift coefficients as estimated by GR are significant at the 95% confidence level unless italicized in the table. Numbers in brackets indicate the reduction in explanatory power when the associated variables are removed from the model [ΔRi2], where a higher number signifies a more dominant variable. The variance explained (R2) and condition number of the trends are provided for each model. Note that other variables in Table 1 were never selected and are therefore not listed here. Dashes indicate categories of variables that were not selected for a particular model.
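[Table 2 body not recoverable from the source. Surviving row labels: Nighttime air temperature; Vapor pressure deficit; Soil heat flux; fPAR × accum. PAR; Condition number (K). Surviving entries (row and column unknown): −0.14 [∼0]; 0.25 [∼0].]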
 All selected models of the trends explain over 75% of the variability in measured fluxes (0.77 ≤ R2 ≤ 0.98, Table 2). Note that the high R2 values are not solely reflecting the predictability of the seasonal cycle, because using a trend derived for a coarser timescale (e.g., monthly) to explain variability at a finer time scale (e.g., 8 day or daily) yielded a substantially lower R2 relative to the case where the time-scale-specific trend was used. For example, the monthly trend explains only 61% of the variability in the daily measurements compared to 77% explained by the daily trend.
 Overall, the variance explained is particularly high for Rh+a, which suggests that respiration can be more easily represented than photosynthetic uptake. Photosynthetic uptake is also captured well, except for periods with exceptionally strong uptake, such as in July 2003 (Figure 3) and for the 8 day and daily cases in July 2001 (not shown), indicating that key variables needed to explain this anomalous uptake may be missing, or that nonlinear effects become important in these cases.
 Vegetation (as represented by the sum of the LAI and fPAR contributions) has the strongest correlation to seasonal carbon cycling at UMBS across all temporal scales. This finding is expected given that the morphological, physical, and chemical properties of vegetation have been shown to substantially affect processes of carbon and nutrient cycling in deciduous forests [e.g., Dorrepaal, 2007]. This result is also expected because the UMBS forest is an overall net sink of CO2, such that variables associated with carbon uptake are expected to be important in explaining the overall signal. At the finer (daily and 8 day) time scales, on the other hand, the influence of the amount of PAR intercepted and/or absorbed by the canopy (APAR = fPAR × Daily Accumulated PAR) also becomes significant in explaining carbon uptake, as represented by both the NEE and GEE measurements (Table 2). As noted by Anderson et al. , many other studies have demonstrated the linear relationship between the increase in canopy biomass and the amount of visible light intercepted or absorbed in the canopy [e.g., Monteith, 1966]. However, as expected, these results indicate that Light Use Efficiency (LUE) plays a more important role at synoptic scales, whereas vegetation better explains seasonal carbon cycling.
 In addition to examining the regression coefficients associated with individual variables and their associated uncertainties, the explanatory role of selected variables was further examined by successively eliminating each variable from the trend and quantifying the resulting reduction in R2 (Table 2), or ΔRi2. Variables that result in a larger reduction in R2 explain more variability in the flux measurements. For example, when LAI was excluded from the model of the trend for NEE at the monthly scale, and regression coefficients were recalculated for the remaining variables, the ΔRi2 associated with LAI was 0.27. Removing fPAR at this scale had much less of an impact (ΔRi2 is 0.09). Thus, LAI is a more important variable at this scale, and fPAR appears to be adjusting LAI to help fit the NEE measurements. The magnitude and sign on the regression coefficients for these variables further confirm this result, because the drift coefficient of LAI is negative, corresponding to a sink of CO2, and explaining the main seasonality of carbon uptake.
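The leave-one-variable-out comparison described above can be sketched as follows, assuming NumPy, a trend matrix whose first column is an intercept, and a fixed residual covariance `Q`. Whether the covariance parameters were re-estimated for each reduced model is not stated in the text, so holding `Q` fixed is an assumption, and the function names are hypothetical.

```python
import numpy as np


def r_squared(z, X, Q):
    """R2 of the trend X, with drift coefficients estimated under covariance Q."""
    Qinv_X = np.linalg.solve(Q, X)
    beta_hat = np.linalg.solve(X.T @ Qinv_X, Qinv_X.T @ z)
    resid = z - X @ beta_hat
    return 1.0 - (resid @ resid) / np.sum((z - z.mean()) ** 2)


def delta_r2(z, X, Q, names):
    """Reduction in R2 when each non-intercept column is removed in turn.

    names[j] labels column j + 1 of X (column 0 is the intercept).
    """
    full = r_squared(z, X, Q)
    result = {}
    for j, name in enumerate(names, start=1):
        reduced = np.delete(X, j, axis=1)   # drop one covariate, refit the rest
        result[name] = full - r_squared(z, reduced, Q)
    return result


# Hypothetical example: the flux depends on x1 only; x2 is unrelated.
x1 = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
x2 = np.array([1.0, -1.0, 0.0, -1.0, 1.0])
X = np.column_stack([np.ones(5), x1, x2])
z = 1.0 + 3.0 * x1
drops = delta_r2(z, X, np.eye(5), ["x1", "x2"])
```

In this constructed case removing `x1` eliminates essentially all of the explained variability while removing `x2` changes nothing, mirroring how a larger ΔRi2 flags the more dominant variable.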
 In contrast to NEE, the variables that best explain respiration, and their significance, are relatively scale independent (Table 2). In terms of carbon sources at UMBS, Curtis et al. noted that losses from soils account for approximately 70% of the carbon respired between 1999 and 2003. These losses include both root respiration and microbial respiration, which are, in turn, influenced by factors including photosynthetic supply to roots, substrate quality and availability, temperature, and moisture [Hibbard et al., 2005]. In addition, Curtis et al. noted only small interannual variation (<6%) in soil respiration at UMBS, suggesting that there is little variation in these primary controls from year to year. This finding at UMBS, coupled with results presented herein, suggests that the respiration signal is more uniform both spatially and temporally than previously understood [e.g., Hanson et al., 2000; Hibbard et al., 2005] for mixed northern hardwood forests.
 The specific variables selected for the Rh+a model of the trend (including nighttime air temperature, vapor pressure deficit (VPD), and site-specific fPAR) are different from those identified as important controls in previous work (including soil temperature and moisture, substrate availability and quality, soil carbon decomposition and microbial growth dynamics, and soil hydraulic properties) [e.g., Davidson et al., 2002; Reichstein et al., 2005]. Although many of these variables were either not available or provided at scales that rendered them unsuitable for this analysis, the exclusion of soil temperature and moisture from the Rh+a model of the trend is unexpected. These results may reflect the fact that the soil moisture data were collected at 1 m depth, where moisture tends to be less temporally variable than closer to the surface. Given that the soils at this site are well-drained spodosols (92% sand, 7% silt, and 1% clay) [Gough et al., 2008] with a shallow O horizon, a shallower soil moisture data set might better reflect moisture dynamics in the root zone. Unfortunately, such data were not available for the study. In addition, nighttime air temperature (or air temperature) may be more representative of the actual temperature influencing heterotrophic respiration than soil temperature (which is measured at a depth of 7.5 cm).
 The significance of VPD in the respiration model may indicate that this variable acts as a proxy for the moisture available in the canopy, where larger values indicate drier conditions that physiologically impede carbon efflux. The effects of water stress on plant respiration often are mediated through loss of tissue turgor and stomatal closure [Aber et al., 1991], which can result in substantial reductions in respiration per plant [Davidson et al., 2006].
 The significance of fPAR in the Rh+a model of the trend (Table 2) is more difficult to interpret: fPAR is likely acting as a proxy for another variable that was not included in the analysis. For example, fPAR might be representing the amount of substrate available for heterotrophic respiration. Other studies have found that using LAI (which is closely related to fPAR) as a surrogate for site productivity across a range of temperate forests could help explain differences in annual respiration, hypothesizing that the larger the site LAI, the more substrate is produced for respiration [Reichstein et al., 2003]. Otherwise, as discussed in section 4.3, site-specific fPAR may simply act as a better proxy for overall seasonality than other available variables, because it is a temporally smoother data set. Note that removal of fPAR from the model resulted in a smaller ΔRi2 relative to the removal of nighttime temperature, indicating that temperature explains more of the respiration variability.
 In addition to reflecting the general findings noted previously, the daily scale analysis yielded some unexpected results for all examined dependent variables. For example, precipitation was associated with a source or a decrease in sink at the daily scale in the NEE and GEE trend models, but was not significant for the Rh+a model where it might be associated with soil moisture. While this result may seem counterintuitive for this ecosystem type, precipitation may in fact be acting as a proxy for periods with significant cloud cover, and therefore for times with reduced sunlight for photosynthesis. This would have a larger impact at synoptic scales, whereas this effect may be averaged out at 8 day or monthly time resolutions. Note that precipitation may have a lagged effect on carbon uptake by affecting water availability on different time scales, which could be investigated using a shallower soil moisture data set or by adding a lagged precipitation variable to the superset of variables considered for model selection. Variables such as friction velocity may also be helping the model of the trend capture some of the small-scale flux variability that cannot be represented by the other variables that were collected at larger time scales, rather than informing mechanistic understanding. In all cases, these variables were associated with a smaller ΔRi2 relative to LAI, fPAR, and APAR, making conclusive attribution of their impacts more difficult.
 Note that at 8 day and daily time scales, accounting for correlation among the residuals using GR yields different models of the trend, regression coefficients, and uncertainties for NEE and GEE relative to a setup where such correlation is ignored (analogous to MLR). When temporal correlation was ignored, at least 2 or 3 additional variables were selected for the trend, because the underlying temporal correlation was misattributed to one or more of the candidate variables. In addition, the significance of the regression coefficients was reduced when MLR was applied. These results further emphasize the need to account for the covariance of residuals in regression analysis of flux data at submonthly resolutions.
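The contrast between accounting for and ignoring residual correlation can be sketched with generalized least squares. The sketch below assumes an exponential temporal covariance, which is an illustrative choice because the covariance model is not restated in this section. The function names, the variance, and the correlation length are hypothetical. Passing an identity matrix for Q reproduces the MLR-like setup in which correlation is ignored.

```python
import numpy as np

def exponential_covariance(times, variance, corr_length):
    """Q[i, j] = variance * exp(-|t_i - t_j| / corr_length) (assumed covariance form)."""
    lags = np.abs(times[:, None] - times[None, :])
    return variance * np.exp(-lags / corr_length)

def gls_fit(X, y, Q):
    """Generalized least squares: beta = (X' Q^-1 X)^-1 X' Q^-1 y and its covariance."""
    Qinv_X = np.linalg.solve(Q, X)
    Qinv_y = np.linalg.solve(Q, y)
    cov_beta = np.linalg.inv(X.T @ Qinv_X)
    beta = cov_beta @ (X.T @ Qinv_y)
    return beta, cov_beta

# Synthetic illustration only.
rng = np.random.default_rng(1)
n = 120
times = np.arange(n, dtype=float)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, -2.0]) + rng.normal(scale=0.5, size=n)

# Correlation ignored (Q is the identity): same coefficients as ordinary least squares.
beta_mlr, cov_mlr = gls_fit(X, y, np.eye(n))

# Correlation included: coefficients and their uncertainties are re-weighted by Q.
Q = exponential_covariance(times, variance=0.25, corr_length=6.0)
beta_gr, cov_gr = gls_fit(X, y, Q)
print(beta_mlr, beta_gr)
```

The square roots of the diagonal of `cov_beta` give the coefficient uncertainties, which is where the two setups typically differ most when the residuals are strongly correlated.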
4.2. Isolating Photosynthesis and Respiration From NEE Measurements
 The auxiliary variables selected for the NEE model of the trend can be used to partially isolate carbon uptake and release at submonthly temporal scales. At the monthly scale, none of the variables identified as being important for Rh+a were selected for the NEE model, indicating that Rh+a cannot be derived from the NEE observations using auxiliary variables at this scale. This is likely due to the fact that the seasonality at UMBS dominates the monthly signal, which is primarily controlled by the seasonal cycle of photosynthetic activity at this site. At the 8 day and daily time scales, however, results are more promising (Table 2), with air temperature (a variable similar to nighttime temperature important for Rh+a) also being selected for NEE. Overall, the covariates that are associated with carbon uptake and release in the NEE model explain 90% of the GEE variability and 94% of the Rh+a variability at the 8 day scale (Figure 3) and 83% and 86% at the daily scale (Figure S1), respectively. This result indicates that NEE measurements at fine time scales can be used to identify variables that are important for photosynthesis and respiration separately. This suggests that selected auxiliary variables can potentially be used to separate NEE observations and/or geostatistical inverse modeling total CO2 flux estimates [e.g., Michalak et al., 2004; Gourdji et al., 2008] into component fluxes.
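One way to picture this partial isolation is to split the fitted NEE trend into the contribution of the covariates associated with uptake and the contribution of those associated with release, and then compare each part with GEE or Rh+a. The sketch below is an assumption about how such a comparison could be set up, not a reproduction of the calculation behind Figure 3: the grouping of variables is supplied by the analyst, the squared Pearson correlation stands in for "variability explained", and all names and values are hypothetical.

```python
import numpy as np

def split_trend(X, beta, names, uptake_names):
    """Split X @ beta (excluding the intercept) into uptake and release contributions."""
    uptake_idx = [i for i, nm in enumerate(names) if nm in uptake_names]
    release_idx = [i for i, nm in enumerate(names)
                   if nm not in uptake_names and nm != "intercept"]
    uptake = X[:, uptake_idx] @ beta[uptake_idx]
    release = X[:, release_idx] @ beta[release_idx]
    return uptake, release

def squared_correlation(a, b):
    """Squared Pearson correlation between two series."""
    return np.corrcoef(a, b)[0, 1] ** 2

# Synthetic illustration only: a fitted NEE trend with three covariates.
rng = np.random.default_rng(2)
n = 50
names = ["intercept", "lai", "apar", "air_temp"]
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
beta = np.array([0.5, -1.5, -0.8, 0.9])  # negative coefficients act as sinks
uptake, release = split_trend(X, beta, names, uptake_names={"lai", "apar"})

# With measured component fluxes in hand (hypothetical arrays gee and rha), one would compare:
#   squared_correlation(uptake, gee) and squared_correlation(release, rha)
```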
4.3. Regression Analysis for Growing Season, Spring Green-Up, and Nongrowing Season
 The forest at UMBS is a net carbon source from early fall (late September) until late spring (mid-May) [Gough et al., 2008], and this strong seasonality may be associated with changes in the significant auxiliary variables and/or their relationship to flux for the current analysis. To investigate this question, the daily GR analysis was repeated for (1) the growing season, approximately day-of-year (DOY) 140–276, a period of increasing leaf density defined by the period for which soil temperature is above 5°C [Schmid et al., 2003]; (2) spring green-up in May, a period of rapid leaf growth coinciding with dramatic shifts in atmospheric humidity, surface energy balance, and the balance between respiration and photosynthesis; and (3) the nongrowing season, approximately DOY 295–117, a period of leaf senescence and limited growth due to lack of sunlight and cold temperatures, with an average air temperature below −1°C.
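Subsetting daily records into these periods requires handling the nongrowing season, whose day-of-year range wraps around the end of the calendar year. A minimal sketch follows. The helper name `doy_mask` is hypothetical, and the May window of DOY 121–151 assumes a non-leap year.

```python
import numpy as np

def doy_mask(doy, start, end):
    """Boolean mask for start <= DOY <= end, allowing ranges that wrap past the end of the year."""
    doy = np.asarray(doy)
    if start <= end:
        return (doy >= start) & (doy <= end)
    return (doy >= start) | (doy <= end)  # e.g. 295..365 together with 1..117

doy = np.arange(1, 366)                 # one non-leap year of daily records
growing = doy_mask(doy, 140, 276)       # growing season
may = doy_mask(doy, 121, 151)           # spring green-up (May, non-leap year)
nongrowing = doy_mask(doy, 295, 117)    # nongrowing season, wraps over the new year
print(growing.sum(), may.sum(), nongrowing.sum())
```

The May window overlaps the start of the growing season by construction, whereas the growing and nongrowing masks are disjoint.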
 The amount of available sunlight is found to drive photosynthesis during the growing season, consistent with current understanding [e.g., Gough et al., 2007] (Table S1). Net radiation, the daily variation of which is similar to that of PAR during this time of the year [Oliphant et al., 2006], explains the majority of the variability in NEE and GEE, with some adjustments provided by vegetative indices (i.e., site-specific fPAR and site-specific LAI in the NEE and GEE trends, respectively). The selection of vegetation indices is reasonable, because seasonal changes in leaf area strongly affect the light environment of forest canopies, especially those dominated by aspen [Roden, 2003]. However, it is unclear whether site-specific LAI or site-specific fPAR is most strongly associated with carbon uptake during this time period, because the fPAR data set was derived from the LAI data, as described in section 2.2. The other variables play a more minor role, but are generally consistent with those from the analysis presented in section 4.1. The only notable exception is the absence of site-specific fPAR in the Rh+a model of the trend, which suggests that temperature controls are more dominant on respiration during this time period.
 In May, on the other hand, the rapid change brought about by leaf-out in the spring results in the largest changes in both selected variables and estimated variables from the overall seasonal relationships presented in section 4.1 (Table S1). However, the amount of PAR absorbed or lost within the canopy remains the dominant explanatory variable of carbon uptake during this time period (i.e., ΔRi2 is largest when fPAR × Accumulated PAR was removed from the NEE and GEE May trends, among variables with an associated negative regression coefficient). As with the growing season, air temperature (or nighttime temperature) captures the majority of the respiration signal in both the NEE and Rh+a models of the trend.
 Only NEE and Rh+a were evaluated for the nongrowing season, because there is little growth during this period. Nighttime air temperature remains the dominant variable in the Rh+a model, and also becomes an important variable for NEE, providing further evidence that temperature controls carbon efflux for this forest ecosystem. The other dominant variable, fPAR, appears to help the model of the trend better fit the seasonality of the respiration signal and is therefore likely not directly acting as a proxy of some mechanism controlling respiration (Figure S2).
 Note that the regression residuals from the seasonal analyses are homoscedastic, whereas those from the full-year analysis show some differences in variance among seasons. Given that the results of the seasonal analyses are generally consistent with those presented in section 4.1 for the full year, heteroscedasticity of these residuals does not appear to play an important role in the year-round analysis.
4.4. Sensitivity Analysis of LAI and fPAR
 A sensitivity analysis was performed to assess the impact of using site-specific versus remote-sensing-derived LAI and fPAR on the results presented in section 4.1. This analysis is particularly important given the significant roles that LAI and fPAR play in the models of the trend at all temporal resolutions. In addition, the fact that site-specific LAI and fPAR are selected over the remote-sensing data products at smaller temporal scales (and at all scales for Rh+a) raises questions about the use of satellite data products for eddy-covariance studies. Figure 4 shows that the MODIS LAI appears to overestimate site-specific LAI during the growing seasons, while the fPAR measurements are relatively consistent, although the onset and subsidence of the growing season differ. In addition, the MODIS LAI and fPAR data sets are inherently noisy, especially in the nongrowing season when there is little vegetation activity at UMBS. In the sensitivity analysis, the “preferred” LAI and fPAR data sets (defined as the LAI and fPAR selected for those models in Table 2) were removed from the set of candidate variables, the BIC selection was rerun, and the impact on the selected variables and their relationship to NEE, GEE, and Rh+a was reevaluated (Table S1).
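The logic of removing a preferred data set and rerunning the selection can be sketched as follows. This is a simplified illustration: it uses the classic BIC for an ordinary least-squares fit with an exhaustive search over subsets, not the adapted BIC of the GR approach that accounts for residual covariance. The candidate names and the synthetic data are hypothetical.

```python
import itertools
import numpy as np

def bic_ols(X, y):
    """Classic BIC for a Gaussian linear model: n*ln(RSS/n) + k*ln(n)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + k * np.log(n)

def select_by_bic(candidates, y, exclude=()):
    """Exhaustive search over subsets of candidate variables, minimizing BIC."""
    names = [nm for nm in candidates if nm not in exclude]
    n_obs = len(y)
    best_bic, best_subset = np.inf, ()
    for size in range(len(names) + 1):
        for subset in itertools.combinations(names, size):
            cols = [np.ones(n_obs)] + [candidates[nm] for nm in subset]
            score = bic_ols(np.column_stack(cols), y)
            if score < best_bic:
                best_bic, best_subset = score, subset
    return best_subset, best_bic

# Synthetic illustration only: a noisier substitute for the "preferred" variable.
rng = np.random.default_rng(3)
n = 300
site_lai = rng.normal(size=n)
candidates = {
    "site_lai": site_lai,
    "modis_lai": site_lai + 0.3 * rng.normal(size=n),
    "unrelated": rng.normal(size=n),
}
flux = -2.0 * site_lai + 0.5 * rng.normal(size=n)

best_full, bic_full = select_by_bic(candidates, flux)
best_reduced, bic_reduced = select_by_bic(candidates, flux, exclude=("site_lai",))
print(best_full, best_reduced)
```

Because the reduced search covers a subset of the models in the full search, its best BIC can never be lower than that of the full search. The size of the increase gives a rough measure of how much explanatory power is lost when the preferred data set is unavailable.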
 Although all models of the trends have slightly less explanatory power without the “preferred” LAI and fPAR data sets, MODIS and site-specific LAI and fPAR explain similar seasonality in the monthly NEE and GEE measurements. As the temporal scale decreases, however, MODIS EVI becomes a better substitute for site-specific LAI (“preferred” data set) and site-specific fPAR (also a “preferred” data set) than MODIS LAI and fPAR. Since EVI is also inherently noisy, this substitution may be due to the difference in the resolution of these MODIS products: LAI and fPAR were provided at the 1 km scale, whereas the EVI data were available at a 250 m resolution. These results suggest that the representativeness of the MODIS 1 km products of a flux tower site may be adequate at monthly scales, but less so at finer temporal resolutions. In addition, these results indicate that site-based estimates of LAI and fPAR based on relatively few measurements appear to be able to adequately represent properties of large areas (∼1 km2) at flux tower sites for the purposes of studying carbon cycling, which would be contrary to suggestions cited in previous work [e.g., Beerling and Quick, 1995].
 For Rh+a, when site-specific fPAR was removed from the analysis, site-specific LAI was always selected as a substitute, with minor changes to the overall fit of the model, indicating that the process characterized by fPAR is best represented by site-specific vegetation variables at all temporal scales.
 For all models (NEE, GEE, and Rh+a), most other variables remain consistent with those selected in section 4.1, suggesting that the relationship between these parameters and flux is relatively independent of the representativeness of the LAI and fPAR data sets. The fact that their associated inferred drift coefficients are similar both in signs and magnitudes further supports this finding, supporting the robustness of the results presented in section 4.1.
4.5. Testing the Linearity Assumption
 The GR models built in this work assume a linear relationship between NEE, GEE, or Rh+a, and the selected auxiliary variables. This assumption was tested by examining scatterplots of flux as a function of individual selected variables for the monthly, 8 day, and daily analyses. An example of 8 day averaged GEE and Rh+a plotted against a subset of auxiliary variables is presented in Figure 5 (top). These scatterplots reveal possible nonlinear relationships. However, these nonlinear relationships either vanish or are significantly reduced in the residuals from the GR models (Figure 5, bottom). This result indicates that relationships that appear to be nonlinear when fluxes are regressed against individual variables can in fact be explained using linear relationships when multiple auxiliary variables are considered, because of the covariability among the key auxiliary variables. The analysis presented in Figure 5 also supports the use of a linear model for the analyses presented in this work and further cautions against using regressions against single variables to infer their relationships with NEE, GEE, or Rh+a, because such single-variable regressions may lead to incorrect conclusions about the nonlinearity of the relationship between environmental variables and flux.
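The effect described above can be reproduced with a small synthetic experiment. In the sketch below, flux depends linearly on two variables, but the second variable covaries nonlinearly with the first, so a single-variable scatterplot looks curved. The curvature of a quadratic fit is used here as a numerical stand-in for visual inspection of the scatterplots, which is an assumption of this illustration. The variable names and coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
temp = rng.uniform(-2.0, 2.0, size=n)
light = temp ** 2 + 0.5 * rng.normal(size=n)   # covaries nonlinearly with temp
flux = 1.0 * temp + 2.0 * light + 0.5 * rng.normal(size=n)  # truly linear in both

def curvature(x, y):
    """Leading coefficient of a quadratic fit of y against x."""
    return np.polyfit(x, y, 2)[0]

# Multiple linear regression on both variables, then inspect the residuals.
X = np.column_stack([np.ones(n), temp, light])
beta, *_ = np.linalg.lstsq(X, flux, rcond=None)
residuals = flux - X @ beta

curv_raw = curvature(temp, flux)         # flux against temp alone looks curved
curv_resid = curvature(temp, residuals)  # curvature largely disappears in the residuals
print(curv_raw, curv_resid)
```

The raw relationship between flux and the first variable is strongly curved even though the generating model is linear, and the curvature is largely absent from the residuals once both variables are included.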
5. Discussion and Lessons Learned
5.1. Are Results Consistent With Existing Understanding of the Controlling Factors of Photosynthesis and Respiration at UMBS? Do They Provide New Insight Into Carbon Cycling at This Site?
 In general, the results of this study are consistent with current understanding of carbon cycling for this forest ecosystem, including the strong correlation between respiration and temperature and the influence of solar radiation on carbon uptake during the growing season [Gough et al., 2008].
 However, this study has also identified variables beyond the expected ones mentioned above that explain variability in GEE and Rh+a. First, in capturing the overall seasonality in Rh+a, fPAR appears to act as a proxy for other important variables that were not considered in this analysis, potentially including the amount of litter substrate available for heterotrophic respiration or the amount of substrate available for root respiration. This finding is consistent with the fact that the UMBS soil is nutrient-poor, making substrate availability important in terms of respiration [Gough et al., 2008]. Second, light and LAI are important for explaining, and therefore potentially controlling, sink processes at UMBS. APAR is more important at fine temporal scales, although LAI and fPAR remain the most important auxiliary variables. This suggests that, despite the complexity of this ecosystem, CO2 uptake is regulated mostly by vegetation response to large-scale energy input [Albertson et al., 2001] and, therefore, can be represented using simple linear relationships to a few key environmental variables. Third, the variance explained by the Rh+a models is higher than that explained by the NEE and GEE models for all examined cases. This is an unexpected result given the current relative lack of understanding of processes controlling respiration, and implies that unexplained variability in GEE may contribute to large uncertainties in annually averaged predicted uptake.
 Finally, site-specific and remote-sensing LAI and fPAR data do not appear to be interchangeable, especially at finer temporal scales. This is likely due to either the poor spatial representativeness of coarser remote-sensing data products relative to site-specific data, or to noise within these MODIS data sets. These results are important because the choice of data, especially for vegetative indices, in empirically based models is the subject of much debate, with some studies electing to use EVI [e.g., Sims et al., 2006] and others preferring LAI [e.g., Lindroth et al., 2008], fPAR [e.g., Running et al., 2004], and/or Land Surface Water Index (LSWI) [Mahadevan et al., 2008], all generally provided or derived from MODIS products, as proxies for gross productivity. This work suggests that the spatial representativeness of data at relevant spatial and temporal scales may be as important as the choice of the specific vegetative indices to be used.
5.2. To What Extent Can NEE Be Used to Understand Processes Controlling Photosynthesis and Respiration at UMBS? Are Results Applicable at Other Sites?
 If the variables selected for the NEE model of the trend were consistent with those selected for GEE and Rh+a, then the NEE signal could be used directly to understand the processes controlling component fluxes. Conversely, if there were no commonality between variables selected for NEE, GEE, and Rh+a, this would indicate that NEE contains little direct information about component fluxes. Because sink activity dominates at UMBS, the monthly NEE model of the trend contains similar variables to those selected for GEE, and does not include variables that capture Rh+a. However, at smaller resolutions (8 day and daily), results suggest that auxiliary variables may provide an alternative means of separating photosynthesis and respiration from the NEE measurements.
 The relative amount of sink and source activity should be considered in assessing the ability of the GR algorithm to partition the NEE signal using auxiliary variables. Given that the relationship between respiration and primary production would be different at every flux site, it is difficult to determine whether estimated NEE models would yield similar results at other sites. However, it seems reasonable that at longer scales, the dominant activity (i.e., sink or source) would be better represented by the selected variables within a model.
5.3. To What Extent Does GR Provide Insight Into Factors That Influence Carbon Cycling?
 One advantage of statistical approaches for studying carbon cycling is that they can identify key relationships among available observations and environmental data sets, with relatively little reliance on assumptions about controlling processes. This can lead to the identification of important variables that would otherwise be overlooked. Statistical approaches can also be seamlessly applied across temporal scales, thereby providing a method for evaluating the validity of mechanistically derived relationships at different temporal resolutions.
 This study suggests that simple linear regression methods, as used in previous flux studies, may yield a statistical relationship that is simply a reflection of the correlation between the seasonal cycle of flux and an individual auxiliary variable, rather than representing a true explanatory relationship. Such a correlation could eclipse the true relationship between an auxiliary variable and flux. This work confirms results from other studies [e.g., Stoy et al., 2009] that concluded that the statistical relationship between an auxiliary variable and flux can be scale-dependent, as well as seasonally varying. These results emphasize the need to explicitly interpret statistical models only at the scales at which relationships were derived. Similarly, studies that use biospheric models that “scale up” or “scale down” relationships inferred at one resolution to another resolution should verify whether the processes and parameterizations used by the model are scale-dependent.
 This study also demonstrates the need to account for temporal correlation in residuals in statistical regressions, especially for analyses at fine temporal scales (i.e., submonthly), where residuals have correlation lengths that span multiple time periods (e.g., the τQ for the NEE and GEE 8 day residuals were 6 and 6.5 8 day periods, respectively, compared to a τQ for Rh+a residuals of 1.25). Correlation is quantified using the Restricted Maximum Likelihood (RML) method, further limiting model assumptions that could otherwise bias estimates.
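As a rough illustration of how a correlation length can be estimated by restricted maximum likelihood, the sketch below evaluates the restricted likelihood objective on a grid of candidate correlation lengths, with the variance profiled out analytically. The exponential correlation function, the grid, and the synthetic series are assumptions made for this example. They do not reproduce the covariance model or the optimization used in the study.

```python
import numpy as np

def exp_correlation(times, corr_length):
    """R[i, j] = exp(-|t_i - t_j| / corr_length) (assumed correlation form)."""
    lags = np.abs(times[:, None] - times[None, :])
    return np.exp(-lags / corr_length)

def profiled_reml_objective(corr_length, X, z, times):
    """-2 x restricted log-likelihood (up to a constant), with the variance profiled out."""
    n, p = X.shape
    R = exp_correlation(times, corr_length)
    A = X.T @ np.linalg.solve(R, X)
    beta = np.linalg.solve(A, X.T @ np.linalg.solve(R, z))
    resid = z - X @ beta
    quad = resid @ np.linalg.solve(R, resid)
    sigma2 = quad / (n - p)                 # REML variance estimate for this R
    _, logdet_R = np.linalg.slogdet(R)
    _, logdet_A = np.linalg.slogdet(A)
    return (n - p) * np.log(sigma2) + logdet_R + logdet_A

# Synthetic illustration only: residuals with a true correlation length of 6 time steps.
rng = np.random.default_rng(5)
n = 200
times = np.arange(n, dtype=float)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
L = np.linalg.cholesky(exp_correlation(times, 6.0))
z = X @ np.array([1.0, -2.0]) + L @ rng.normal(size=n)

grid = np.array([0.5, 1.0, 2.0, 4.0, 6.0, 8.0, 12.0, 20.0])
scores = np.array([profiled_reml_objective(t, X, z, times) for t in grid])
tau_hat = grid[np.argmin(scores)]
print(tau_hat)
```

A grid search is used only to keep the sketch short. In practice, a continuous optimizer over both the variance and the correlation length would normally be preferred.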
 This work relied on an assumption of linearity between the examined auxiliary variables and flux. Although this is contrary to the functional forms of the relationships between these variables and flux as implemented in many biospheric models (which are applicable at a physiological level), results presented in section 4.5 show that a linear model is able to reproduce what initially appear to be nonlinear dependencies when variables are examined individually. This both confirms that the linearity assumption is justified for this analysis and further emphasizes the potential for inferring erroneous relationships between variables and flux when examining variables individually.
 Finally, although statistical methods can be powerful tools for studying complex systems such as carbon cycling, these analyses do not in and of themselves prove causality. Instead, they highlight dominant relationships and patterns that can, when reaffirmed with additional results and scientific knowledge, illuminate process-based understanding. One limitation of the approach presented in this work is that the BIC method used here selects a single “best” model of the trend. In some cases, multiple similar sets of auxiliary variables provided comparable fits to the available observations, although the dominant variables remained consistent. Approaches for accounting for the uncertainty associated with the form of the model of the trend are explored in related work [Yadav et al., 2010].
 Note that although the focus of this work is on statistical inference, a further analysis was performed to see how well the GR method could perform in predicting daily NEE for a given year, relative to multiple linear regression. In this analysis, data from 1 year were removed from the analysis, and the remainder of the data were used to (1) select variables, (2) estimate regression coefficients, and (3) predict fluxes for the missing year. The results show an improvement in prediction (as evaluated using R2) over multiple linear regression. The majority of the improvement is attributed to the more representative set of variables that are selected by the BIC method when temporal correlation of the residuals is considered in the GR approach.
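The withheld-year evaluation described above can be sketched as a leave-one-year-out loop. The `fit` and `predict` callables are placeholders: in the study they would correspond to variable selection and coefficient estimation with either GR or multiple linear regression, whereas ordinary least squares is used here purely for illustration. The function names, years, and data are hypothetical, and R2 is computed as the coefficient of determination on the withheld year.

```python
import numpy as np

def leave_one_year_out(X, y, years, fit, predict):
    """For each year: fit on all other years, predict the withheld year, report R^2."""
    scores = {}
    for held_out in np.unique(years):
        train = years != held_out
        test = ~train
        model = fit(X[train], y[train])     # selection and estimation use training data only
        y_hat = predict(model, X[test])
        ss_res = np.sum((y[test] - y_hat) ** 2)
        ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
        scores[int(held_out)] = 1.0 - ss_res / ss_tot
    return scores

def ols_fit(X, y):
    """Placeholder estimator: ordinary least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ols_predict(beta, X):
    return X @ beta

# Synthetic illustration only: three years of daily data.
rng = np.random.default_rng(6)
years = np.repeat([2001, 2002, 2003], 100)
X = np.column_stack([np.ones(years.size), rng.normal(size=years.size)])
y = X @ np.array([1.0, -2.0]) + rng.normal(scale=0.1, size=years.size)

scores = leave_one_year_out(X, y, years, ols_fit, ols_predict)
print(scores)
```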
6. Conclusions
 This paper presents a GR approach for studying the complex biosphere-atmosphere exchange of CO2 at eddy-covariance measurement sites, and applies the proposed approach to the AmeriFlux site at UMBS. The GR approach is shown to be a useful method for exploring the relationships between auxiliary variables and NEE, GEE, and Rh+a at this flux tower site across temporal scales.
 Overall, conclusions about carbon cycling from this study at UMBS are consistent with current understanding, including the strong correlation between respiration and temperature, and the influence of solar radiation on carbon uptake during the growing season. However, the study also highlights the influence of other variables such as precipitation, VPD, and fPAR on both carbon uptake and release across multiple temporal scales. Results also confirm that many relationships between flux and auxiliary variables are scale-dependent. In addition, the study showed that site-specific and remote-sensing LAI and fPAR data are not interchangeable at finer temporal scales, indicating that the choice of the specific vegetative indices used in an analysis is as important as their spatial and temporal representation. Finally, results show that a linear GR model is able to capture what initially appear to be nonlinear relationships, due to the covariability among individual auxiliary variables in the model.
 In addition, GR is found to be able to identify variables that partially isolate GEE and Rh+a from the NEE signal at smaller time scales. Therefore, GR can be used to infer process-based information from observations of NEE using other available data sets, without having to separate the signal into component fluxes, thereby avoiding a possible source of error. This result also suggests that a similar approach may be useful for geostatistical inversion studies [e.g., Michalak et al., 2004; Gourdji et al., 2008] that use atmospheric measurements along with auxiliary data and an atmospheric transport model to infer CO2 surface fluxes, because the auxiliary data used in such studies may help to isolate the photosynthetic and respiration signals.
 The GR model as presented herein could be extended to account for the uncertainty of selecting a single “best” model of the trend when multiple sets of auxiliary variables provide comparable fits to the available observations [Yadav et al., 2010]. In addition, instead of separating the data by seasons, heteroscedasticity in the residuals could be modeled using a more complex temporal covariance matrix, Q. Finally, nonlinear or lagged relationships could also be included to further improve the fit of the model.
 This work was supported by the National Aeronautics and Space Administration (NASA) under grant NNX06AE84G, “Constraining North American Fluxes of Carbon Dioxide and Inferring Their Spatiotemporal Covariances through Assimilation of Remote Sensing and Atmospheric Data in a Geostatistical Framework.” Additional support was provided by the National Science Foundation Integrative Graduate Education and Research Traineeship Biosphere Atmosphere Research and Training program (grant 0504552). The authors would like to thank Mary Anne Carroll for the Prophet Tower data and NASA for MODIS data. The authors also gratefully acknowledge Steve Garrity, Sharon Gourdji, Mary Anne Carroll, Dave Karowe, and Steve Bertman for feedback on this research. Finally, the authors thank Knute Nadelhoffer and the UMBS staff whose efforts contributed to this work.