## 1. Introduction

[2] Understanding of the influence of biogeochemical processes, disturbance, and climate on terrestrial carbon dioxide (CO_{2}) fluxes at relatively small spatial scales (e.g., plot scales) has improved significantly in recent years. Uncertainties persist, however, in large part because the processes that control sources and sinks of atmospheric CO_{2}, particularly at the ecosystem scale, interact in complex ways with their environmental drivers and remain poorly understood.

[3] Although several approaches are commonly used to approximate ecosystem CO_{2} flux, the eddy covariance method is unique because it can provide a more direct measurement of the flux between vegetation and the atmosphere. In this approach, the flux of CO_{2}, i.e., the net ecosystem exchange (NEE), is approximated as the covariance of the deviations in atmospheric CO_{2} concentrations and vertical wind speed from their means, along with corrections for fluctuations in water vapor and temperature [e.g., *Baldocchi et al.*, 2001]. At present, the AmeriFlux network consists of approximately 100 active sites located in various ecosystems [*Hargrove et al.*, 2003]. Although there can be large uncertainties associated with these half-hourly measurements [*Richardson et al.*, 2006], NEE estimates have been used to improve understanding of the temporal variability of CO_{2} surface flux of particular ecosystems through statistically inferred relationships at daily or longer temporal scales [*Law et al.*, 2002].
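
As a minimal sketch of the covariance calculation described above, the following Python function computes the mean product of the deviations of vertical wind speed and CO_{2} concentration from their window means. The function name, the synthetic four-sample window, and the units are illustrative assumptions, and the corrections for water vapor and temperature fluctuations are deliberately omitted.

```python
def eddy_covariance_flux(w, c):
    """Return the covariance of vertical wind speed w and CO2 concentration c.

    Both inputs are equal-length sequences sampled over one averaging window.
    Density corrections for water vapor and temperature are NOT applied here.
    """
    if len(w) != len(c) or len(w) == 0:
        raise ValueError("w and c must be non-empty and of equal length")
    n = len(w)
    w_mean = sum(w) / n
    c_mean = sum(c) / n
    # Mean product of the deviations from the window means.
    return sum((wi - w_mean) * (ci - c_mean) for wi, ci in zip(w, c)) / n


# Synthetic window (assumed values): updrafts carry CO2-depleted air.
w = [0.5, -0.5, 0.5, -0.5]      # vertical wind speed, m s^-1
c = [14.9, 15.1, 14.9, 15.1]    # CO2 concentration, mmol m^-3
flux = eddy_covariance_flux(w, c)  # mmol m^-2 s^-1
```

With these synthetic values the covariance is negative, which under the sign convention assumed here (positive upward) corresponds to net transport of CO_{2} toward the surface.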

[4] This paper presents and applies a new adaptation of geostatistical regression that can be used with eddy-covariance data to investigate the relationships between NEE for a forest ecosystem and environmental variables at multiple time scales.

### 1.1. Statistical Inference Using NEE Measurements

[5] One of the benefits of using high-frequency eddy-covariance data to investigate the relationship between fluxes and environmental factors is that both long- and short-term trends can be inferred from the data [e.g., *Stoy et al.*, 2009]. Statistical approaches such as neural networks [e.g., *Stoy et al.*, 2009] and linear regression [e.g., *Law et al.*, 2002; *Hui et al.*, 2003] have been used to understand the climatic controls of both the interannual and seasonal variability of carbon cycling at flux tower sites. Regression methods have the advantage of providing statistical relationships between given variables and flux. However, traditional regression approaches are limited by (1) the approach used to select the variables to include in the regression, (2) the assumption of independent and identically distributed residuals, and (3) assumptions regarding the dependent variable (i.e., how to best decompose NEE into photosynthetic uptake and respiration).

[6] The first of these limitations centers on the methods used to select the variables to include in the regression model, referred to henceforth as the model of the trend. Frequently, only a subset of available variables is included in the analysis (e.g., photosynthetically active radiation (PAR), soil temperature, air temperature, and leaf area index (LAI)) [e.g., *Urbanski et al.*, 2007] while other potentially important data are not used (e.g., friction velocity and the Normalized Difference Vegetation Index (NDVI)). From this subset, every variable is typically regressed individually against flux measurements to infer relationships [*Law et al.*, 2002; *Hui et al.*, 2003]. Such an approach could lead to environmental variables obscuring each other's effects [*Faraway*, 2005]. For example, Gross Ecosystem Exchange (GEE) is a function of both air temperature and light [*Blackman*, 1905]. If each variable is regressed separately, the effect of air temperature could mask the effect of light, making light appear not to be significant [*Faraway*, 2005]. This problem can be avoided if joint contributions between auxiliary variables are allowed. Although some studies have included more than one variable in regression analyses [e.g., *Hibbard et al.*, 2005], sequential methods based on *F* tests for selecting the variables used in the regression do not account for the joint contributions of all possible combinations of variables.
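
The masking effect can be illustrated with a small synthetic example. The sketch below assumes ordinary least squares solved through the normal equations, and the data, variable names, and helper functions are all hypothetical. The toy flux is constructed as light minus temperature, with temperature correlated with light, so that the individual regression on light returns a zero slope while the joint regression recovers both effects.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares coefficients [intercept, slope_1, ...] of y on columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    return solve_linear(xtx, xty)


# Synthetic data (assumed): temperature tracks light plus an offset that is
# uncorrelated with light; the toy flux depends on both variables.
light = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
temp = [2.0, 3.0, 4.0, 5.0, 0.0, 1.0, 2.0, 3.0]
flux = [l - t for l, t in zip(light, temp)]

light_only = ols([light], flux)   # slope on light is zero: effect is masked
joint = ols([light, temp], flux)  # slopes of +1 (light) and -1 (temperature)
```

The example is contrived so that the masking is exact; with real tower data the individual slope would be attenuated rather than eliminated.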

[7] Second, it is likely that the CO_{2} flux regression residuals will be temporally correlated, especially at submonthly scales. Ignoring this correlation can lead to a misrepresentation of the relationship between an environmental variable and flux [e.g., *Hoeting et al.*, 2006]. As such, temporal correlation must be assessed and included in both the model selection scheme and the statistical regression. Although this issue has been noted as a limitation [*Law et al.*, 2002], previous studies have not accounted for correlation in regression residuals.
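
One simple way to check for temporal correlation in a residual series is the sample autocorrelation at a given lag. The sketch below is a generic diagnostic offered as an illustration, not the correlation model used in this paper; the function name and the two synthetic residual series are assumptions.

```python
def lag_autocorrelation(residuals, lag=1):
    """Sample autocorrelation of a residual series at the given lag."""
    n = len(residuals)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(residuals) - 1")
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("residuals are constant; autocorrelation undefined")
    # Sum of lagged products of deviations, normalized by total variation.
    return sum(dev[i] * dev[i + lag] for i in range(n - lag)) / denom


# Synthetic residuals (assumed): a slowly varying series and an alternating one.
smooth = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

r_smooth = lag_autocorrelation(smooth)            # positive: neighbors are alike
r_alternating = lag_autocorrelation(alternating)  # negative: neighbors differ
```

A lag-1 value far from zero indicates that the assumption of independent residuals is unlikely to hold for that series.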

[8] The final limitation is related to the eddy-covariance measurements themselves. Conceptually, NEE is the small difference between two large fluxes, namely photosynthetic carbon uptake via GEE and release of CO_{2} into the atmosphere through a combination of heterotrophic and autotrophic respiration (R_{h+a}). Each of these fluxes is affected differently by environmental controls. In addition, variables such as light, nutrient availability, and water stress have complex interactions with each other and with each flux component, making it difficult to ascertain the influence of a particular variable on either GEE or R_{h+a}. In past studies, statistical regression methods (such as simple and multiple linear regression) have been used to infer relationships between flux components (GEE or R_{h+a}) and either a single environmental variable or some predetermined combination of variables [e.g., *Law et al.*, 2002; *Urbanski et al.*, 2007]. This requires the measured NEE signal to be separated into GEE and R_{h+a} prior to the analysis. The separation is generally achieved using one of three methods: (1) subtracting the nighttime NEE from the daytime NEE signal [*Urbanski et al.*, 2007], (2) deriving R_{h+a} from a regression using nighttime fluxes at high friction velocity and an exponential transformation of soil temperatures [e.g., *Law et al.*, 1999; *Hibbard et al.*, 2005], or (3) modeling GEE using PAR. Some studies have shown that these methods for dividing NEE into separate parts lead to large uncertainties in the inferred R_{h+a} [e.g., *Janssens et al.*, 2001], possibly biasing inferred relationships.

### 1.2. Study Goals

[9] This paper presents a geostatistical regression (GR) algorithm designed to elucidate processes controlling carbon exchange at various temporal scales at eddy covariance towers by addressing the first two limitations described above. In regard to the third limitation, the ability of the GR method to separate the auxiliary variables associated individually with carbon uptake and release is also investigated. The presented approach is applied at the AmeriFlux tower site at the University of Michigan Biological Station (UMBS) to improve understanding of carbon cycling for this mixed hardwood forest at daily, 8 day, and monthly time scales.

[10] The presented approach improves on the regression methods used in previous studies in two distinct ways. First, the final model of the trend only includes variables that are selected using a new adaptation of the commonly employed Bayesian Information Criterion (BIC) [*Schwarz*, 1978]. This adaptation accounts for the joint probabilities of all possible combinations of variables and, unlike hypothesis tests (e.g., the *F* test), can compare nonnested models (i.e., models that are not necessarily subsets of each other). Second, the GR method accounts for temporal correlation in the portion of the flux signal that is not explained by the environmental data sets to more accurately characterize the relationship between a set of environmental variables and flux.
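
For reference, the criterion in its generic form [*Schwarz*, 1978] can be written as follows. This is the standard definition only, not the adaptation introduced in this paper; the symbols are defined here for illustration.

$$
\mathrm{BIC} = -2 \ln \hat{L} + k \ln n
$$

Here $\hat{L}$ is the maximized likelihood of a candidate model, $k$ is the number of estimated parameters, and $n$ is the number of observations. Under the additional assumption of independent Gaussian residuals, this reduces, up to an additive constant, to

$$
\mathrm{BIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + k \ln n ,
$$

where $\mathrm{RSS}$ is the residual sum of squares. The model with the lowest value is preferred, and because the criterion is computed for each model separately, the compared models need not be nested. With $p$ candidate variables there are $2^{p}$ possible subsets to evaluate.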

[11] The objectives of the application to the UMBS site are to (1) identify the dominant explanatory variables for NEE, GEE, and R_{h+a} at different temporal scales in order to improve the understanding of carbon cycling at UMBS, (2) determine the impact of temporal scale on the inferred relationships between the environmental variables and CO_{2} flux, and (3) evaluate whether environmental variables can be used to isolate the GEE and R_{h+a} signals from the NEE measurements using a GR framework.