This article was corrected on 6 OCT 2015. See the end of the full text for details.
CO2 fluxes for the Netherlands and surroundings are estimated for the year 2008, from concentration measurements at four towers, using an inverse model. The results are compared to direct CO2 flux measurements by aircraft, for 6 flight tracks over the Netherlands, flown multiple times in each season. We applied the Regional Atmospheric Mesoscale Modeling system (RAMS) coupled to a simple carbon flux scheme (including fossil fuel), which was run at 10 km resolution, and inverted with an Ensemble Kalman Filter. The domain had 6 eco-regions, and inversions were performed for the four seasons separately. Inversion methods with pixel-dependent and -independent parameters for each eco-region were compared. The two inversion methods, in general, yield comparable flux averages for each eco-region and season, whereas the difference from the prior flux may be large. Posterior fluxes co-sampled along the aircraft flight tracks are usually much closer to the observations than the priors, with a comparable performance for both inversion methods, and with best performance for summer and autumn. The inversions showed more negative CO2 fluxes than the priors, though the latter are obtained from a biosphere model optimized using the Fluxnet database, containing observations from more than 200 locations worldwide. The two different crop ecotypes showed very different CO2 uptakes, which was unknown from the priors. The annual-average uptake is practically zero for the grassland class and for one of the cropland classes, whereas the other cropland class had a large net uptake, possibly because of the abundance of maize there.
Knowledge of the surface-atmosphere fluxes of CO2 is important for our understanding of current and future climate change, and in particular the response of the carbon cycle to climate. The only existing direct observations of these fluxes consist of eddy-covariance measurements that provide information at scales of a few hundred meters to a few kilometers [Baldocchi et al., 2001] at best, and in case of heterogeneous surfaces, need to be scaled up with land cover information and models, to obtain flux estimates at larger domains. However, recent research [Groenendijk et al., 2011a] shows that the vegetation parameters on which the CO2 fluxes depend are much more variable than assumed by current vegetation models, and this causes large uncertainties in upscaling. Another direct flux approach which has already been applied for the Netherlands is the 222Radon-tracer method [e.g., van der Laan et al., 2009a, 2010], which can be used for much larger scale (i.e., regional) surface flux estimates. However, its results are directly proportional to the assumed 222Radon soil emission rate, which is currently not well known. Inversion methods that derive fluxes from concentration measurements, a transport model and a priori guesses of the surface flux field, are arguably our current best method to obtain a more spatially integrated perspective.
 There are, however, specific challenges with the application of inversion methods to determine fluxes at relatively high resolution. First, to apply an inversion to a limited area, it is necessary to use a high resolution transport model that resolves mesoscale circulations (size from a few km to a few hundreds of km), and the recycling of nocturnal CO2 [e.g., Sarrat et al., 2007; Ahmadov et al., 2009; Schuh et al., 2010; Rivier et al., 2010; Broquet et al., 2011]. Second, an a priori flux parameterization using surface maps at high resolutions is needed to resolve the heterogeneity of the surface fluxes. The third, while also common to more global inversions, is the large number of unknowns that have to be constrained by a limited number of observations. Finally, sufficient temporal resolution is required to obtain a good match with observed concentrations that exhibit large diurnal variability.
 Until recently, most regional scale inversions have worked with “synthetic data” to test the performance of the inversion methods and the measurement network [e.g., Zupanski et al., 2007; Carouge et al., 2010; Gourdji et al., 2010; Tolk et al., 2011]. Such work is obviously of considerable importance, but as synthetic flux fields form the basis of these methods it remains speculative to which extent the results can be generalized toward the real world. To test whether such regional methods produce credible results when applied to real observed data requires an independent comparison with observed flux data. The lack of appropriate data has unfortunately often presented a significant hurdle for such validation. For instance, the inversions by Göckede et al.  use observed concentration data from two towers, but lack an independent validation of the calculated fluxes, while Rivier et al.  evaluate their results against independent biosphere model calculations. Recently, as more appropriate flux data have become available, such data have been used for validation: Schuh et al. , Broquet et al. , and Lauvaux et al.  evaluate fluxes against tower measurements, and Lauvaux et al.  also employ additional aircraft measurements.
In this study, we extend that analysis further from the campaign scale to the seasonal scale by applying two state-of-the-art inversion methods to obtain the CO2 fluxes for the Netherlands for the year 2008. The inversion schemes we use are based on previous theoretical and synthetic work by Tolk et al. [2009, 2011]. A relatively dense and well-maintained network of four towers is used for the CO2 concentration measurements. A large number of flux measurements by aircraft (O. S. Vellinga et al., Calibration and quality assurance of flux observations from a small research aircraft, submitted to Journal of Atmospheric and Oceanic Technology, 2012) is available for all the seasons in 2008 to validate the calculated fluxes. This setup also offers the opportunity to test the usefulness of the existing concentration measurement network for regional inversions.
The setup of the modeling work is, to a large extent, similar to that in the previous studies: Tolk et al.  for the forward modeling and Tolk et al.  for the inversion modeling. A Bayesian inversion scheme that uses an ensemble Kalman filter with prior fluxes is applied to estimate the surface CO2 fluxes. Based on the comparison by Tolk et al. , the two best performing inversion setups (“parameter” and “pixel” inversion) were selected. In contrast to the previous synthetic data study, the inverse modeling is performed with real CO2 concentration measurements. No “synthetic truth” is involved. Another difference from the Tolk et al. [2009, 2011] studies is that the calculations are performed with season-dependent model parameters, rather than stationary model parameters.
 The next paragraphs present a summary of the modeling system used, and document the specific changes compared to the previous studies. The observation methods are also described.
2.1. Transport Model and Background Fields
The transport model used in this study is the Regional Atmospheric Modeling System (RAMS), specifically version B-RAMS-3.2, with some adaptations described in Tolk et al. . The domain includes the Netherlands and some of its surroundings (Figure 1). For this study, a single grid with 10 km resolution is used. Reanalysis data from ECMWF (which we imported at resolution 0.5°) are used for initialization and boundary conditions for the meteorological fields, where nudging is applied only close to the boundaries. Sea surface temperatures are also obtained from the ECMWF reanalysis.
 The CO2 transport is calculated simultaneously with the atmospheric modeling (Eulerian method). For initial and boundary conditions of the CO2 mixing ratios, optimized fields at 1° × 1° resolution from CarbonTracker Europe [Peters et al., 2010] were used. Ensemble modeling is applied: One hundred three-dimensional CO2-fields are simulated simultaneously, each of them driven by its own surface flux field (see hereafter).
2.2. Surface Modeling
The surface model LEAF-3 is part of RAMS, and is used to calculate the meteorological fluxes from the land to the atmosphere. Land use is specified according to the Corine2000 database, and Leaf Area Index (LAI) according to MODIS data (monthly values). The domain contains six different land use classes, as shown in Figure 1. The crop-covered pixels are classified according to the absence (“crops-1”) or presence (“crops-2”) of significant areas of natural vegetation. Subgrid patches of grassland and maize are more abundant in land use class crops-2 than in land use class crops-1. The latter is characterized by more large-scale farming (potatoes, cereals) and locally by horticulture. The class “other” concerns several kinds of areas (urbanized areas, dunes).
CO2 fluxes from fossil fuel burning are taken from the IER database at 10 km resolution (CarboEurope, Emission Data Europe, 2003, http://carboeurope.ier.uni-stuttgart.de). These data are based on the year 2000. Since, according to the national inventories (RIVM, Pollutant Release and Transfer Register, http://www.emissieregistratie.nl/ERPUBLIEK/erpub/weergave/grafiek.aspx), the emissions grew from 178.2 Mton (2000) to 186.7 Mton (2008), the fossil fuel flux is multiplied by a constant scaling factor of 1.05 to obtain fluxes for 2008. Results appear rather insensitive to this scaling factor. To cope with the fact that fossil fuel emissions are lower on weekends, the emissions of 2000 were used with a shift of three days so that the days of the week match those of 2008. The uncertainty of these fluxes is included in the “observation-representation” uncertainty (see below).
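The scaling factor and the three-day shift can be checked with a few lines of Python. This is only an illustrative sketch: the function name `matching_day_2000` is hypothetical and not part of the modeling system, and the rule "same calendar date in 2000, moved forward by three days" is our reading of the shift described above. The emission totals are those quoted in the text.

```python
from datetime import date, timedelta

# National totals quoted in the text (Mton CO2).
EMISSION_2000 = 178.2
EMISSION_2008 = 186.7

# Ratio of the two totals, rounded to two decimals as in the text.
scaling_factor = round(EMISSION_2008 / EMISSION_2000, 2)

SHIFT_DAYS = 3  # shift quoted in the text


def matching_day_2000(day_2008: date) -> date:
    """Return the day of the 2000 inventory used for a given day in 2008.

    Assumption: take the same calendar date in 2000 and move it forward by
    three days, so that the day of the week agrees with 2008. Both years
    are leap years, so the offset is the same throughout the year. How the
    last days of December are handled is not stated in the text; here they
    simply run into January 2001.
    """
    same_date_2000 = day_2008.replace(year=2000)
    return same_date_2000 + timedelta(days=SHIFT_DAYS)
```

For example, 1 January 2008 was a Tuesday, and so was 4 January 2000, which is the day this sketch would pair it with.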
 The calculation of the CO2 surface fluxes is performed, simultaneously with the atmospheric transport calculations, for a random ensemble of parameter combinations, each ensemble member generating its own CO2 field. CO2 assimilation and autotrophic respiration are calculated with a scheme derived from Farquhar et al. , and heterotrophic respiration according to Lloyd and Taylor . More details can be found in Tolk et al. .
2.3. Modeling Periods
 Four separate model simulations have been performed: (1) Spring: March–May 2008; (2) Summer: June–August 2008; (3) Autumn: September–November 2008; and (4) Winter: January, February and December 2008.
 To obtain a comparable winter season, winter data have been combined for the winter 2007–2008 (January–February 2008) and that of 2008–2009 (December 2008). These periods are run separately for their meteorology but with a single set of vegetation parameters. The results are combined afterwards, so that effectively one season is obtained.
2.4. The Weather in 2008
In the modeling domain, the first four months of 2008 were climatologically mild or very mild, except for March, which was relatively cold. May 2008 was the hottest May in 100 years. The summer was rather wet, but warm. The autumn was average. December was cold compared to the 2000–2010 average (KNMI, Month and season surveys for 2008, knmi.nl/klimatologie).
2.5. Parameter Inversion
For each of the six land use classes, two parameters are estimated: carboxylation capacity (Vcmax) to control photosynthesis (and indirectly autotrophic respiration), and reference respiration rate (R10) which controls heterotrophic respiration. Hence, for this method, 12 unknowns have to be solved per season. In contrast to Tolk et al. , the values of quantum yield (α) and activation energy (E0) were kept fixed everywhere, to prevent the aliasing effects as discussed in Tolk et al. . For E0/R a value of 200 K is used (R denotes the gas constant). The parameters Vcmax and R10 are assumed to be stationary within each season. The prior parameter values used in the inversions are identical for each season, and given in Tolk et al. . Due to the imperfectness of observed LAI-values and of the vegetation model, Vcmax and R10 have the character of tuning parameters, whose best fits may be season-dependent [Groenendijk et al., 2011b]. For this reason we allow their posterior values to depend on season.
With the reduction in the number of parameters to solve for each land-use class, the inversion method resembles the so-called βRG0.0 method of Tolk et al. , since the unknown parameters are essentially linear or close to linear scaling factors. In setting up the ensemble (100 members), the parameters are assumed to be uncorrelated, and to have standard deviations of 30 μmol m−2 s−1 (Vcmax) and 2 μmol m−2 s−1 (R10).
 As in Tolk et al. , to suppress the influence of random noise in the updating of the parameters we prescribe that a parameter is updated on inversion, only if after processing of all the observations, σprior/σpost for that parameter is at least 1.05 times the smallest σprior/σpost of all the parameters [Zupanski et al., 2007].
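The update rule can be written down compactly. The following is a minimal sketch, assuming that the prior and posterior standard deviations of all parameters are available as plain lists; the function name and the example numbers are hypothetical and do not come from the study.

```python
def parameters_to_update(sigma_prior, sigma_post, threshold=1.05):
    """Flag the parameters whose posterior value is accepted.

    A parameter is accepted only if its ratio sigma_prior / sigma_post is
    at least `threshold` times the smallest ratio found among all
    parameters. Parameters that are not accepted keep their prior value.
    """
    ratios = [prior / post for prior, post in zip(sigma_prior, sigma_post)]
    smallest = min(ratios)
    return [ratio >= threshold * smallest for ratio in ratios]


# Illustrative values only: two Vcmax-like and two R10-like parameters.
flags = parameters_to_update(
    sigma_prior=[30.0, 30.0, 2.0, 2.0],
    sigma_post=[15.0, 29.0, 1.0, 1.95],
)
```

A consequence of the rule, as sketched here, is that the parameter with the smallest ratio is never updated, which is consistent with treating the smallest error reduction as the noise level.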
2.6. Pixel Inversion
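The inversion procedure is extensively described in Tolk et al. , and is summarized here briefly. The domain contains 1109 land pixels of 10 km × 10 km. For each pixel, the surface CO2 flux is obtained by multiplying the prior photosynthesis and respiration fluxes by the scaling factors βGPP and βresp, respectively,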
with the scaling factors β depending on pixel but not on the time within a specific season. The prior fluxes are calculated from the prior parameter values in the forward run, and for the two β's an ensemble is set up with the following properties. The means are equal to one, and there is no correlation between βGPP and βresp, nor between the β's of different land use types. Within a land use class, the βGPP-values are correlated with an e-folding length of 100 km, as was found appropriate by Tolk et al. . The standard deviation of βGPP is constant within a land use class, and is tuned so that the variance of the time series of each land-use-class-averaged flux is the same as for the ensemble that was used for the parameter run. To reach this, first an initial run has to be executed; from that run we calculate how the β's have to be rescaled to meet the variance requirement. For βresp, the same remarks apply as for βGPP. The number of unknowns to be solved amounts to 2218 for each season. The rule for suppressing the influence of random noise is applied in the same way as for the parameter inversion (see above).
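The ingredients of the pixel inversion described above can be illustrated with a short sketch. Several points are assumptions on our part rather than statements from the text: the flux is taken as the sum of a scaled photosynthesis term and a scaled respiration term, the prior terms are signed (uptake negative), and the spatial correlation decays exponentially with distance. All function names are hypothetical.

```python
import math

N_LAND_PIXELS = 1109                 # land pixels in the domain (from the text)
N_UNKNOWNS = 2 * N_LAND_PIXELS       # one beta_GPP and one beta_resp per pixel
E_FOLDING_KM = 100.0                 # correlation length (from the text)


def beta_correlation(distance_km, same_class, length_km=E_FOLDING_KM):
    """Prior correlation between the beta values of two pixels.

    Assumption: exponential decay with distance inside a land use class,
    and no correlation between different land use classes.
    """
    if not same_class:
        return 0.0
    return math.exp(-distance_km / length_km)


def pixel_flux(beta_gpp, beta_resp, gpp_prior, resp_prior):
    """Scaled biogenic flux for one pixel and one time step.

    Assumption: gpp_prior and resp_prior are signed contributions to the
    net flux (uptake negative, release positive), so that beta values of
    one reproduce the prior net flux.
    """
    return beta_gpp * gpp_prior + beta_resp * resp_prior
```

With both scaling factors equal to one, `pixel_flux` returns the prior net flux, which matches the statement that the ensemble means of the β's are one.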
2.7. Overview of the Inversions
All runs are performed for each season separately. First, runs were executed with an ensemble of parameters (for the parameter inversion) or β-coefficients (for the pixel inversion). Then the inversions were performed, and new runs were performed with an ensemble of posterior parameters or β-coefficients, respectively. The CO2 mixing ratio fields generated by the (ensemble of) fluxes are propagated through the domain from day to day, and constrained on the larger scales by the CarbonTracker boundary conditions. Each new seasonal inversion starts with a new initial CO2 field from CarbonTracker.
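To make the inversion step more concrete, the sketch below shows how a single concentration observation could update an ensemble of unknowns (parameters or β-coefficients). It is a generic serial ensemble square-root update, chosen here as an assumption for illustration; the exact filter variant used in the study is described in the cited work and is not reproduced here. The function and argument names are hypothetical.

```python
import math


def serial_sqrt_update(ensemble, modelled_obs, observation, obs_error_sd):
    """Assimilate one scalar observation into an ensemble of unknowns.

    ensemble     : list of members, each a list of unknowns
    modelled_obs : modelled concentration for each member
    observation  : observed concentration
    obs_error_sd : observation representation uncertainty (standard error)
    """
    n = len(ensemble)
    n_unknowns = len(ensemble[0])
    r = obs_error_sd ** 2

    # Ensemble means and deviations from the mean.
    h_mean = sum(modelled_obs) / n
    h_dev = [h - h_mean for h in modelled_obs]
    x_mean = [sum(m[k] for m in ensemble) / n for k in range(n_unknowns)]
    x_dev = [[m[k] - x_mean[k] for k in range(n_unknowns)] for m in ensemble]

    # Sample variance of the modelled observation and its covariance
    # with every unknown.
    hph = sum(d * d for d in h_dev) / (n - 1)
    ph = [
        sum(x_dev[i][k] * h_dev[i] for i in range(n)) / (n - 1)
        for k in range(n_unknowns)
    ]

    # Kalman gain for the mean, reduced gain (factor alpha) for deviations.
    gain = [p / (hph + r) for p in ph]
    alpha = 1.0 / (1.0 + math.sqrt(r / (hph + r)))

    new_mean = [
        x_mean[k] + gain[k] * (observation - h_mean) for k in range(n_unknowns)
    ]
    return [
        [new_mean[k] + x_dev[i][k] - alpha * gain[k] * h_dev[i]
         for k in range(n_unknowns)]
        for i in range(n)
    ]
```

In such a scheme the observations would be processed one at a time, and the spread of the returned ensemble gives the posterior standard error that is compared with the prior spread.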
2.8. Concentrations From Atmospheric Observations
 Hourly atmospheric CO2 concentrations from four observation sites for the year 2008 are used. The measurement locations are also shown in Figure 1. The Cabauw mixing ratio observations are described in Vermeulen et al. . At Loobos, concentrations were measured using a single infrared gas analyzer and a solenoid switching system. An AIRCOA system was used (http://www.eol.ucar.edu/∼stephens/RACCOON). The uncertainty (standard error) of the CO2 concentration measurements with the AIRCOA system is 0.2 ppm. See, for further information, Elbers et al. . At Lutjewad, concentrations are measured with a modified Agilent 6890 N Gas Chromatograph. The obtained measurement uncertainty is usually <0.1 ppm. For details, see van der Laan et al. [2009a]. At Hengelman, concentrations were measured at one level using a single infrared gas analyzer CIRAS-SC (PP Systems, Amesbury, USA), which was calibrated twice daily. The uncertainty of the CO2 concentration measurements with the CIRAS systems was 2 ppm. The measurement heights above ground level used for this paper are 200, 24, 60 and 18 m for Cabauw, Loobos, Lutjewad and Hengelman, respectively, and all measurements are reported on the WMO2007x scale. Only hourly values (average over last 5 min), from 11 to 16 UT (6 values) are used for each day, since transport errors are likely to be larger for other hours [Tolk et al., 2011].
An “observation representation uncertainty” (standard error) has to be assigned to the concentrations, but its quantification is difficult. Tolk et al.  found that for synthetic inversions with the present model and network, an hourly uncertainty of 1.2 ppm worked well. This translates to an uncertainty of 1.2/√6 = 0.5 ppm for the daily average over 6 values. Since the present work with real observations also has to cope with (large but unknown) transport errors, we have increased the estimated uncertainty of the daily average to 2 ppm, which explains why it is somewhat larger than the instrument uncertainties. This value is multiplied by √6 to obtain the hourly observation representation uncertainty. For the autumn (SON), the data from Hengelman have been omitted because of known calibration issues. For the winter, there were no data from Hengelman available.
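The conversion between hourly and daily-average uncertainties is simple arithmetic, shown here as a short sketch. It assumes that the six hourly errors are independent, which is what the factor √6 implies; the function names are hypothetical.

```python
import math

HOURLY_VALUES_PER_DAY = 6  # hourly values from 11 to 16 UT


def daily_from_hourly(sigma_hourly, n=HOURLY_VALUES_PER_DAY):
    """Standard error of the mean of n independent hourly values."""
    return sigma_hourly / math.sqrt(n)


def hourly_from_daily(sigma_daily, n=HOURLY_VALUES_PER_DAY):
    """Hourly standard error that corresponds to a given daily-mean error."""
    return sigma_daily * math.sqrt(n)


synthetic_daily = daily_from_hourly(1.2)  # about 0.49 ppm, quoted as 0.5 ppm
real_hourly = hourly_from_daily(2.0)      # about 4.9 ppm
```

The hourly value of roughly 4.9 ppm is therefore the uncertainty that corresponds to the 2 ppm daily-average uncertainty adopted for the real observations.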
2.9. Surface Fluxes From Atmospheric Observations
Flux observations were carried out with a small Sky Arrow 650 TCNS aircraft flying at low altitude and low speed (Vellinga et al., submitted manuscript, 2012). Flight data are available for 6 trajectories (Figure 1), which were flown 2 by 2 on a weekly schedule throughout 2008 and early 2009. The measurement height was usually around 70 m above the surface. The surface fluxes have been derived using the eddy covariance method based on 50 Hz raw data of wind fields, temperature, and CO2 and H2O concentrations, all measured with fast response sensors [Vellinga et al., 2010]. Covariances and fluxes were computed for 2 km windows, representing the spatial resolution of this type of airborne flux measurement. The instruments and aircraft configuration were calibrated following procedures described elsewhere (Vellinga et al., submitted manuscript, 2012). That publication also documents further details of data processing and quality assessment.
 Data were available from 64 flights. The uncertainty (standard error) in the flux measurements was estimated based on twin flights, and varies from 10 to 20% for the flight averages (uncertainties in averages over shorter distances are much larger). These fluxes are used for validating our posterior fluxes. Flux divergence occurs between the surface and the measurement level, but generally the resulting flux-loss at these flight levels is smaller than other errors [Vellinga et al., 2010, supplementary material] and neglected in the current comparison. Rather than aggregating the flux observations to prescribed parts of the model domain, as is often done [e.g., Lauvaux et al., 2009], we chose an alternative approach: A routine was added to the model to import the locations and times of the observations, and to export the calculated fluxes exactly for these locations and times.
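The co-sampling idea can be sketched as follows. This is not the routine that was added to the model; it is a minimal illustration, assuming a regular 10 km grid with coordinates measured from its south-west corner and hourly flux fields held in a dictionary. All names are hypothetical.

```python
def cosample_fluxes(flux_by_hour, observations, cell_km=10.0):
    """Return the modelled flux at the grid cell and hour of each observation.

    flux_by_hour : dict mapping an hour index to a 2-D list flux[row][col]
    observations : list of (x_km, y_km, hour) tuples, with x and y measured
                   from the south-west corner of the grid
    """
    sampled = []
    for x_km, y_km, hour in observations:
        col = int(x_km // cell_km)   # grid column that contains the point
        row = int(y_km // cell_km)   # grid row that contains the point
        sampled.append(flux_by_hour[hour][row][col])
    return sampled
```

Because each 2 km observation window is matched to one model cell and one time, consecutive windows along a track will often sample the same 10 km cell, so the modelled series is smoother than the observed one.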
3.1. Goodness-of-Fits for the Concentrations
Figure 2 shows a comparison of observed and modeled CO2 concentration series for Cabauw in summer. Averages of the “daytime” (11, 12, …, 16 UT) values which are used for the inversion and the distribution of the residuals are shown. The unrealistically high concentrations of the prior simulation, and the reduction of the error on inversion (both kinds) are typical for most stations and seasons. The residual distributions are close to Gaussian, as expected. Similar results are found for Loobos (not shown). For Hengelman (not shown) and in particular for Lutjewad (Figure 3), the Gaussian shape is less well approximated, which is caused by the frequent occurrence of unexpectedly high observed concentrations. It is likely that the discrepancy for Lutjewad is caused by transport errors which are not yet fully understood, but probably related to the coastal character of the station. It is unlikely that the observations are erroneous, as these have been well scrutinized [van der Laan et al., 2009a].
 To find out whether this behavior could cause a bias in the resulting fluxes, a test inversion has been performed for summer in which the high-concentration outliers were discarded. Though this obviously improved the fit for the concentrations, it did not lead to a substantial change in the fluxes, which appear less sensitive to the concentrations at Lutjewad than to other stations. This will be further considered below. For this reason, only results obtained using data that included the outliers are presented in this paper.
Table 1 lists the differences between the modeled and observed CO2 concentrations for the various stations and seasons, based on the daily averages of the “daytime” (11, 12, …, 16 UT) values which are used for the inversion. The prior concentrations show a significant bias (too high), especially in summer and autumn, for some but not all of the stations. In the posterior results, this bias has been strongly reduced. It will be shown below that the bias in the prior concentrations is most likely due to a too small modeled net uptake of CO2, rather than to an assumed high background concentration. Both Cabauw and Loobos have a strong RMS error reduction (except in winter) while Lutjewad and Hengelman have less. Our results suggest that with the present observation network, for spring, summer and autumn, the inversion scheme is able to produce concentration series which are, in general, significantly improved. They also suggest, however, a lower sensitivity specifically for the coastal station Lutjewad. A further observation is that the fit of the CO2 mixing ratios is practically always better for the pixel inversion than for the parameter inversion. This is to be expected, as the pixel inversion has many more degrees of freedom.
Table 1. Difference Between Observed, Prior and Posterior CO2 Concentrations (Daytime Averaged) for All Stations and Seasons
Values are in ppm. Stations: CBW = Cabauw, LOO = Loobos, LUT = Lutjewad, HEN = Hengelman.
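The bias and RMS statistics discussed for the concentration series can be computed along the following lines. This is a minimal sketch with a hypothetical function name; the numbers in the example are invented for illustration and are not values from Table 1.

```python
import math


def bias_and_rms(modelled, observed):
    """Mean difference and root-mean-square difference, modelled minus observed."""
    diffs = [m - o for m, o in zip(modelled, observed)]
    bias = sum(diffs) / len(diffs)
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rms


# Invented daytime averages (ppm) for three days, for illustration only.
bias, rms = bias_and_rms([385.0, 390.0, 380.0], [383.0, 386.0, 380.0])
```

A positive bias in this convention means that the modelled concentrations are too high, as reported for the prior simulation.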
 Nevertheless, the posterior concentrations still differ considerably from the observations. The main contributions to this difference stem from (1) transport errors, and (2) errors in the flux model. The synthetic runs of Tolk et al.  for the same network had much smaller RMS of the concentration difference. Since these runs used the same transport model, but strongly different flux models, for the forward run (creating synthetic concentrations) and the inversion, they show that the inversion can correct the errors caused by a wrong flux model, provided the transport model is accurate. Hence, it is likely that the decreased performance with real data is not due in the first place to errors in the flux model, but to the difference between the real and modeled transport. It is well known [e.g., Gurney et al., 2002; Stephens et al., 2007] that current schemes for transport modeling have imperfect treatment of vertical transport in the atmospheric boundary layer.
3.2. Flux Estimates and Uncertainty
We now turn to the comparison of the best estimates of the fluxes for both inversion methods. Figure 4 gives an overview of the flux-averages (terrestrial biogenic part) for each season and eco-region. Flux-averages for the whole year are also shown. Figure 5 shows the prior and the two posterior fields for all seasons. In interpreting the results, it should be kept in mind that the error bars depict random standard errors, as represented by the ensemble, but that they do not account for other types of errors. One such error source is the following: with the parameter inversion, vegetation parameters etc. are changed so as to produce concentrations that better fit the observations; but because of the rigidity of the base functions, this also affects unmonitored areas which may in reality have other values for the vegetation parameters, causing a systematic (but unknown) bias there. On the other hand, for the pixel inversion, regions outside the footprint are hardly affected by the inversion, and there the posterior fluxes will tend to stay close to the prior values. In both cases, errors arise locally which are not encompassed by the random spread of the ensemble. These errors are of a systematic nature, but they are very hard to quantify, because of a lack of information about such things as the spatial variation of the vegetation parameters. Problems in transport modeling are also a source of systematic errors. Hence, real uncertainties may be larger than indicated, and results of the two methods should not always be expected to correspond within the error bars.
The averages (Figure 4) for the dominant land use classes (grass, crops-1 and crops-2) contain the most important information. For both inversion methods, there is on average a tendency toward larger net uptake in the posterior fluxes, with the exception of the winter for all eco-regions, and the summer for crops-1. The two inversion approaches, although strongly different, yield the same direction for the shifts, though the magnitudes sometimes differ by more than indicated by the error bars (for the reasons explained above). A second conclusion is that crops-2 has a much larger uptake than crops-1, at least for spring and summer (the two methods disagree for autumn).
 The small summer uptake of crops-1 contrasts not only with the crops-2 but also with the grasslands uptake, which appears large in summer, as expected in the growing season. Whether this small uptake of crops-1 is real or not needs further investigation.
An odd result is the large error bar for the crops-1 class in winter for the parameter inversion. Common Bayesian inversion cannot increase errors. However, for the parameter inversion, we used a model in which the fluxes are functions of the vegetation parameters with nonlinear dependence for some of them, and this can cause posterior errors to become even larger than the prior errors. This phenomenon has been discussed at length in Tolk et al. [2011, section 3.1 and Appendix C].
Averaged over the whole year (see Figure 4), the mean flux is not significantly different from zero for three classes (grassland, crops-1, other), but does show large net uptake for crops-2. For this class, the average uptake is 6.5 ± 0.9 and 3.5 ± 1.0 μmol m−2 s−1 (standard errors), according to the parameter- and pixel-inversion, respectively. Both methods lead to a small though significant net uptake for the needleleaf forest and the deciduous broadleaf forest. The calculated uncertainties in the annual averages are small, but, as discussed above, they do not include the possible effect of systematic errors (of various origins) which could lead to relatively large shifts of these small averages.
The sub-ecoregion distribution within one land use class often differs strongly between the inversion methods. As expected, for the parameter inversion the spatial distribution is rather homogeneous, while more spatial structure is present in the pixel inversion results. Figure 5 illustrates how the distribution of observation towers, together with the chosen structure of the unknowns and assumed covariances, spreads information across the domain to yield such differing regional fluxes. Whereas the pixel inversion focuses most of its parameter adjustments in a region around the towers, the ecoregion based method spreads information over a larger domain, and much more homogeneously. This result is consistent with earlier inverse studies employing such “regularization” methods [Carouge et al., 2010; Schuh et al., 2010].
 Concerning the smaller classes, there is often (summer, autumn) a difference in the results of the two inversion methods for the needleleaf forest, in spite of the fact that the class is monitored at Loobos. For this class, the uncertainty in the posterior fluxes was found to be usually greater than for the classes with a larger surface area (Figure 4). Little information seems to be retrieved by the network for the deciduous broadleaf forest class (no direct observations in the area), and the “other” class (very small fluxes).
Figure 6 shows the relative improvement of the standard error, as calculated by the Kalman filter. Note that the results for autumn and winter were obtained with a reduced network (no Hengelman data). Since the parameters are spatially constant for each region, the error reduction map for the parameter inversion reflects the land use map. For the same reason, the error reduction is for most eco-regions much stronger than for the pixel inversion (for which there are many more unknowns to constrain). This strong reduction of the error per pixel is an artifact of the parameter method. The error reduction is primarily calculated for the vegetation parameters, and causes an appropriate error reduction for the average fluxes over the ecoregions to which these parameters apply. However, owing to the low number of basic functions, the small spread of the averages is automatically translated to a small spread per pixel, causing an unrealistically low uncertainty in the flux per pixel. The other (pixel) inversion method, on the other hand, does not suffer from this artifact.
 The finer structure of the error reduction close to the observation sites shows details which are not always obvious to explain. Cabauw and Loobos have an overlapping region of influence, which is mainly restricted to grassland, which limits the effective radius. For Hengelman the region of influence is larger, because of the extensive crops-1 region there. It is remarkable that the influence of Hengelman is most conspicuous on the eastern side, whereas the prevailing wind direction is from the west.
 From Figure 6, Lutjewad is seen to have the smallest influence on the error reduction. The impact of the coastal station Lutjewad on error reduction depends on the frequency of southerly wind, which is locally on average about 30% of the time [van der Laan et al., 2009b]. The southerly winds are less prevalent in spring than in summer and autumn 2008 (see Table 2).
Table 2. Wind Direction Frequencies (Days per Season per 90 Degree Sector) in 2008 According to the Daily Vector-Averages of Station De Bilt, in the Center of the Netherlands
Data obtained from Royal Netherlands Meteorological Institute (KNMI).
3.3. Comparison With CO2 Flux Measurements By Aircraft
 The aircraft flux measurements are summarized in Table 3. The winter measurements were restricted to December 2008, as the flights started in March 2008, and the inversion results are confined to 2008. The error in the observed fluxes is estimated as 15% based on comparison of simultaneous flights over SW France in 2007 (O. S. Vellinga, unpublished data). Figure 7 shows an example of one day of flux measurements by aircraft, compared to modeled posterior total fluxes, found by both parameter and pixel inversion. Note that the simulated fluxes pertain to the same places and times as the observations, so that unnecessary aggregation uncertainties are avoided.
Table 3. Number of Days With Observations for Each Flight Trajectory, per Season
Figure 7 illustrates the problems pertaining to the comparison of calculated and observed fluxes on the short term. First, continuous observations exist only for brief intervals. Second, the simulated and observed time series have different shapes, because the observations are strongly influenced, on the short term, by random effects like turbulence and intermittent clouds, which in the simulations are either averaged out or not well timed. As a consequence of this randomness, it is practically impossible to assess the flux difference between ecoregions by looking at data from single days.
 Since it appears rather meaningless to compare observed fluxes, averaged over 2 km, with our posterior fluxes, we compare in the following only averaged flux values which belong to the same trajectory and season. Figure 8 shows these average flux values for the observations, priors and the two posteriors. As indicated earlier (at the start of the discussion of Figure 4), the standard errors which are given for the posterior fluxes may underestimate the uncertainty, as they do not account for systematic errors which are inherent to the inversion methods. Within the enhanced uncertainty of both our estimates and the aircraft data, the observations confirm, in most cases, the shift toward much larger uptake (for spring to autumn) that is produced by the inversions. This increases the confidence in the ability of the inversion system to improve on prior estimates, and also demonstrates the value of our assimilation approach in integrating different types of information of the regional carbon cycle.
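The averaging per trajectory and season can be sketched as below. The record layout, the function name and the example numbers are hypothetical; the same grouping would be applied to the observed fluxes and to the co-sampled prior and posterior fluxes before they are compared.

```python
from collections import defaultdict


def average_by_trajectory_and_season(records):
    """Average flux values that share the same trajectory and season.

    records : iterable of (trajectory, season, flux) tuples, one per
              2 km flux window
    returns : dict mapping (trajectory, season) to the mean flux
    """
    sums = defaultdict(lambda: [0.0, 0])
    for trajectory, season, flux in records:
        entry = sums[(trajectory, season)]
        entry[0] += flux   # running total
        entry[1] += 1      # number of windows
    return {key: total / count for key, (total, count) in sums.items()}


# Invented values (micromol m-2 s-1), for illustration only.
example = average_by_trajectory_and_season([
    ("South", "summer", -12.0),
    ("South", "summer", -8.0),
    ("North", "summer", -3.0),
])
```

Averaging over many windows reduces the influence of the short-term random effects noted above, which is why only these aggregated values are compared.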
Figure 9a shows the root-mean-square differences between the simulated (prior and both posteriors) and the observed average fluxes, for all seasons. The averages are taken directly from Figure 8. In this comparison with independent flux data we find a remarkable improvement of the estimated fluxes over the prior fluxes for summer and autumn, but not for winter and spring. The poor result for winter is related to the overall small fluxes and to the weak coupling between observed concentrations and nearby fluxes, so that the posterior values do not move far away from the priors. A likely cause of the spring mismatch is the representation of the LAI, which changes faster in spring than in other seasons. The monthly LAI maps (used to calculate both prior and posterior fluxes) cannot resolve these changes well. The LAI maps were derived from MODIS data for 2006 and have not been adjusted to 2008. However, an inspection of the meteorological data (source: KNMI) gives no reason to expect a large difference. Tolk et al. also suggest that regional scale inversions appear to be quite sensitive to the precise specification of the land surface properties. The difference in performance between the parameter and pixel inversions is small.
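For reference, the root-mean-square difference used in this comparison can be written as a short sketch. The flux values below are invented purely for illustration and are not read from Figure 8; the function name `rms_difference` is likewise hypothetical.

```python
import numpy as np

def rms_difference(simulated, observed):
    """Root-mean-square difference between two sets of averaged fluxes."""
    simulated = np.asarray(simulated, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((simulated - observed) ** 2)))

# Hypothetical trajectory-averaged fluxes (umol m-2 s-1) for six trajectories
# in one season. These numbers are NOT data from the paper.
observed = [-8.0, -6.0, -10.0, -4.0, -7.0, -5.0]
prior = [-3.0, -2.0, -4.0, -1.0, -3.0, -2.0]
posterior = [-7.0, -5.0, -8.0, -4.0, -8.0, -4.0]

rms_prior = rms_difference(prior, observed)
rms_post = rms_difference(posterior, observed)
print(f"prior RMS: {rms_prior:.2f}, posterior RMS: {rms_post:.2f}")
```

An inversion improves on the prior, in the sense used here, when the posterior RMS difference is smaller than the prior one for the same set of trajectory averages.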
Figure 9b shows root-mean-square differences between the average fluxes of Figure 8, this time for each trajectory. The winter data were not used in this computation. There are large differences in performance between the trajectories: for the parameter inversion, a large error reduction is found for the West and South trajectories and, to a lesser extent, for the Center and East trajectories. For the others, the error reduction is modest or even (for North) absent. There is no clear link to the presence or absence of concentration measurements close to the trajectory: the North trajectory has the worst performance, although it is covered by the Lutjewad site. This might again be because Lutjewad is a coastal station, and concentrations are insensitive to land-based CO2 fluxes when the wind is onshore (which occurs for March–November 2008 for about 40% of the time, and maybe more often due to local sea breezes [e.g., Ahmadov et al., 2009]). Strong horizontal flux gradients may also be a source of errors, as the aircraft roughly follows the coastline for the North trajectory. The strongest error reduction and the best posterior fluxes are obtained for the South trajectory, even though no concentration measurements are performed there. This trajectory largely runs through the crops-2 eco-region, which was seen earlier (in the section on fluxes: best estimates) to have strong and consistent flux shifts produced by the inversion scheme. The strongest observed uptake (see Figure 8) occurs in summer for the trajectories South (largest) and East (second largest), which happen to be the trajectories for which the crops-2 class is dominant and substantial, respectively (Figure 1). These flux measurements confirm the large uptake for crops-2 that was found by the inversion (Figure 4).
3.4. Calculated National Carbon Budget of the Netherlands for 2008
Table 4 shows the calculated biotic uptake integrated over the Netherlands, for the seasons and for the whole year (the land area, calculated at model resolution, is about 35,000 km2). For comparison, the integrated fossil fuel emission (as assumed for the modeling) has been added. As elsewhere in this paper, the winter contribution is the sum of the months January, February and December 2008. For winter, an unusually large relative uncertainty is calculated for the parameter inversion. This is related to the nonlinearity of the parameter inversion, which seems to cause specific problems when winter data are used, as remarked earlier when discussing Figure 4.
Table 4. Calculated Carbon Budget of the Netherlands, According to the Two Methods(a)
Season   Biotic, Parameter Inversion   Biotic, Pixel Inversion   Fossil Fuel Emission
Spring   −7.0 ± 0.8                    −6.8 ± 0.9
Summer   −9.9 ± 1.0                    −7.8 ± 1.0
Autumn   −5.9 ± 0.5                    −4.4 ± 0.8
Winter   4.4 ± 2.0                     3.5 ± 0.7
Year     −18.5 ± 2.4                   −15.4 ± 1.7
(a) Unit: TgC season−1. The fossil fuel emission for the same region has been added for comparison.
 The estimated uncertainty for the year sum is larger for the parameter than for the pixel inversion, which is caused by the uncertainty in the winter contribution.
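As a consistency check on Table 4, the seasonal contributions can be summed and their uncertainties combined in quadrature. The sketch below assumes that the seasonal errors are independent and that the seasonal values are listed in the order spring, summer, autumn, winter; neither assumption is stated explicitly in the text, although the winter entry is identifiable by its large relative uncertainty for the parameter inversion.

```python
import math

# (value, uncertainty) in TgC per season, copied from Table 4.
# Assumed order: spring, summer, autumn, winter.
parameter = [(-7.0, 0.8), (-9.9, 1.0), (-5.9, 0.5), (4.4, 2.0)]
pixel = [(-6.8, 0.9), (-7.8, 1.0), (-4.4, 0.8), (3.5, 0.7)]

def year_budget(seasons):
    """Sum seasonal values; combine uncertainties in quadrature (independence assumed)."""
    total = sum(value for value, _ in seasons)
    sigma = math.sqrt(sum(err ** 2 for _, err in seasons))
    return total, sigma

param_total, param_sigma = year_budget(parameter)
pixel_total, pixel_sigma = year_budget(pixel)

# Share of the yearly variance that comes from the winter term alone.
winter_share = parameter[3][1] ** 2 / param_sigma ** 2

print(f"parameter: {param_total:.1f} +/- {param_sigma:.1f} TgC")
print(f"pixel:     {pixel_total:.1f} +/- {pixel_sigma:.1f} TgC")
print(f"winter share of parameter variance: {winter_share:.2f}")
```

Under these assumptions the sums are −18.4 ± 2.4 and −15.5 ± 1.7 TgC, against −18.5 ± 2.4 and −15.4 ± 1.7 TgC in the table; the 0.1 TgC differences in the totals are consistent with rounding of the seasonal values. For the parameter inversion the winter term accounts for roughly two thirds of the yearly variance, in line with the statement above.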
4. Discussion and Conclusions
 The results of this paper have to be interpreted carefully, because the flux values resulting from the inversions may have biases (dependent on the inversion method and the region) that are difficult to characterize and estimate, and that cause results from different inversion methods to differ more than expected from the random errors. Important contributing factors are, besides transport errors, erroneous assumptions concerning spatial constancy or smooth spatial correlations of vegetation parameters, and more research will be needed to mitigate such problems.
 An important observation is that the prior fluxes for the net uptake are in general too small. This follows both from the comparison with concentration measurements (using inversion) and from the flux measurements (performed by aircraft). The reason, however, is not entirely clear. There are uncertainties in both the biotic component and the heterotrophic respiration. The former is based on a rather well-founded vegetation model combined with LAI maps based on observations. For the heterotrophic component, on the other hand, there is a lack of data, and we had to base the estimates on preliminary research [Tolk et al., 2009]. The present results suggest that the prior heterotrophic respiration is too large for the dominant land use types.
 The inversion produces posterior fluxes which are, on average, more reliable than the priors. The comparison with independent flux estimates from aircraft confirms this. This pertains primarily to the flux averages as observed by aircraft flights. On a finer scale, the scatter between observations and simulations remains quite large, owing to the noisy nature of the real turbulent fluxes, as illustrated by Figure 7. Further, there is no improvement for winter. The small fluxes in winter and the lack of convection (causing larger transport errors) are likely to be the main reasons why improvement by inversion is difficult for the winter season. The larger impact of errors in the assumed fossil fuel emissions in winter may also play a role.
 The present results also shed light on how the quality of the results depends on spatial and temporal resolution. We had to average the aircraft measurements over trajectories to obtain useful results. The bars in Figure 8 actually represent averages over observations of, on average, 2.7 days (out of about 91 days in a season). In spite of this rather sparse temporal coverage, the inversion produces a considerable improvement of the RMS difference between simulation and observation (for most of the trajectories and seasons). This shows that the inversion with the present setup already produces a considerable improvement of averages even over periods of no more than a few days. Note that these results primarily refer to daytime values.
 The improvement for spring is less than for summer and autumn. We suggest this is caused by errors in the modeling of the timing of LAI changes. This parameter changes faster in spring than in the other seasons.
 A simple experiment was performed to estimate the sensitivity of the posterior fluxes to the CO2 boundary conditions, for the summer season only: the inversion was repeated with 1 ppm subtracted from the background field (the response to bigger shifts can be estimated using linearity). For the parameter inversion, this caused a shift of the posterior fluxes of +0.8 to +1.0 μmol m−2 s−1 for grassland and crops-2, but 0.0 for crops-1. For the pixel inversion, the shifts were quite evenly distributed over the dominant classes: +0.7 to +0.8 μmol m−2 s−1 for grassland, crops-1 and crops-2. These shifts preserve the flux pattern (for the assumed 1 ppm), but cause the overall flux average to become less negative. Nonetheless, a substantial bias is not expected. The use of the results of European scale CO2 inversions, and the variety of meteorological circumstances and wind directions over which the results for a season are averaged, are expected to prevent a large bias.
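The linear extrapolation mentioned above can be sketched as follows. The per-ppm sensitivities are the summer values quoted in the text; assigning the quoted range to each listed class individually is an assumption of this sketch, and the function name `flux_shift` and the 2 ppm example offset are hypothetical.

```python
# Posterior flux shift (umol m-2 s-1) per 1 ppm subtracted from the background
# field, summer only, as quoted in the text. Each entry is a (low, high) range;
# applying the quoted range to every listed class separately is an assumption.
SENSITIVITY = {
    "parameter": {"grassland": (0.8, 1.0), "crops-1": (0.0, 0.0), "crops-2": (0.8, 1.0)},
    "pixel": {"grassland": (0.7, 0.8), "crops-1": (0.7, 0.8), "crops-2": (0.7, 0.8)},
}

def flux_shift(method, land_class, ppm_subtracted):
    """Scale the 1 ppm response linearly to another background offset."""
    low, high = SENSITIVITY[method][land_class]
    return low * ppm_subtracted, high * ppm_subtracted

# Example: a hypothetical 2 ppm offset in the background field.
low, high = flux_shift("pixel", "crops-2", 2.0)
print(f"pixel inversion, crops-2: +{low:.1f} to +{high:.1f} umol m-2 s-1")
```

The linear scaling is only as good as the linearity of the inversion itself; since the parameter inversion is described as nonlinear, this extrapolation is likely to be less reliable for that method than for the pixel inversion.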
 It is difficult to draw conclusions concerning the performance of the inversion in recovering flux field structures smaller than the eco-region scale. There are sometimes strong differences between the outcomes of the two inversion methods, but it remains in general difficult to say which one performs best. Whereas the parameter inversion assumes an unproven homogeneity of vegetation and heterotrophic respiration parameters, the pixel inversion is more flexible, but its results reflect to some extent the stochastic properties (mean field as well as noise) of the prior ensemble. Eddy correlation (EC) measurements, from surface sites and by aircraft, lack the required spatial and temporal averaging to settle the question.
 The inversions showed a large and unexpected difference in the behavior of the two crops regions. The large uptake of the second crops class cannot be explained by its higher sub-pixel abundance of natural vegetation, as such vegetation tends to have a small uptake (also in our results). The difference must thus be caused by a difference in crop species. We suggest that the higher abundance of maize in the second crops class contributes much to its large uptake. Maize is known to have a very large uptake [Verma et al., 2005]. However, the data set of Fluxnet measurements within the modeling domain, which was used to tune the model [Tolk et al., 2009], contained no sites with maize [Groenendijk et al., 2011a], and the present results suggest that this has caused a bias in the prior flux calculations. The annual carbon balance according to the inversions is practically zero for both grassland and the first crops class, whereas for the second class there is a significant uptake.
 A negative feature of the results, which was found to a weaker extent in the synthetic inversions of Tolk et al., is the “aliasing” between the two terms in the net flux, the biotic flux and the heterotrophic respiration. The aliasing is evident from the occurrence of cases with negative (hence certainly spurious) posterior heterotrophic respiration. This is worrying because it makes it difficult to identify errors in the flux modeling accurately, such as those that cause the bias in the prior fluxes. An improvement would require in the first place improved transport modeling for the inversions, in particular better modeling of nocturnal transport. This is a long-standing problem, though some advances have been made [Steeneveld et al., 2008].
 This study presents the first regional-scale inversion of CO2 fluxes for the Netherlands. The posterior fluxes were compared with aircraft measurements of seasonal and flight-leg averaged fluxes. For most regions, there is a significant and sometimes strong improvement of the posterior fluxes. The improvement is greatest for summer and autumn, whereas for winter no improvement occurs. For spring, it will be important to have reliable data on the temporal development of the LAI. For extended eco-regions, there was a significant improvement of the average fluxes, even when no homogeneity of the unknown parameters within the eco-region was assumed. On the other hand, it is difficult to monitor small eco-regions, even if they have a nearby site for concentration measurements, and to monitor urbanized regions, which have small fluxes. Though improvements with respect to the prior fluxes are clear, the posterior results still depend on assumptions that remain difficult to validate, such as homogeneity of parameters for vegetation and heterotrophic respiration within an eco-region. The results reveal a large and unexpected difference between the fluxes for crops eco-regions without and with significant natural vegetation, especially in summer (much smaller net uptake for the first class). This is most likely caused by a very large uptake of one or more crop types that are more abundant in the second class (potentially maize).
 This work has been executed in the framework of the Dutch project “Climate changes Spatial Planning,” BSIK-ME2, the Carboeurope Regional Component (GOCE CT2003 505572), and GHG Europe (ENV.2009.1.1.3.1, grant 244122). W. Peters was supported by NWO VIDI grant 864.08.012. The Strategic Knowledge Development Program of Wageningen UR on Climate Change, funded by the Dutch Ministry of Economy, Agriculture and Innovation, also supported the aircraft measurements. We thank all pilots of Aerocompany Vliegschool Teuge BV (ACVT, the Netherlands). We thank Wageningen University and Research Center - Green World Research (ALTERRA) for providing us with the Loobos and Hengelman data. We gratefully acknowledge the staff of the CIO Lutjewad measurement station: Bert Kers and Jan Schut. ECMWF, the Royal Netherlands Meteorological Institute (KNMI) and IER Stuttgart are acknowledged for providing background data.
In the originally published version of this article, Table 4 contained wrongly calculated values due to an unnoticed software incompatibility, leading to incorrect summing up of fluxes over the country. The table has since been corrected, along with a small amount of text referring to the incorrect results and this version may be considered the authoritative version of record.