This paper presents a data-driven approach for estimating the variability and predictability of large-scale wind energy production for a planned wind integration in a given geographical area, with an application to the Netherlands. A new method is presented for generating realistic time series of aggregated wind power realizations and forecasts. Such a method requires simultaneous wind speed time series, both actual and predicted, at the planned wind farm locations, but these are not always available. Here, a 1-year data set of 10-min averaged wind speeds measured at several weather stations is used. The measurements are first transformed from sensor height to hub height, then spatially interpolated to the planned wind farm locations using multivariate normal theory, and finally averaged over the market resolution time interval. Day-ahead wind speed forecast time series are created from the atmospheric model HiRLAM (High Resolution Limited Area Model). Actual and forecasted wind speeds are passed through multi-turbine power curves and summed to create time series of actual and forecasted wind power. Two insights are derived from the developed data set: the degree of long-term variability and the degree of predictability when Dutch wind energy production is aggregated at the national level or at the market participant level. For a 7.8 GW installed wind power scenario, at the system level, the imbalance energy requirements due to wind variations across 15-min intervals are ±14% of the total installed capacity, while the imbalance due to forecast errors varies between 53% for down-regulation and 56% for up-regulation. When aggregating at the market participant level, the balancing energy requirements are 2–3% higher. Copyright © 2008 John Wiley & Sons, Ltd.
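The processing chain summarized above (height transformation, power conversion, aggregation, and expressing variations and forecast errors relative to installed capacity) can be illustrated with a minimal sketch. It is not the paper's implementation, and several parts are assumptions made only for illustration: the logarithmic wind profile with roughness length `z0`, the sensor and hub heights, and the idealized single-turbine cubic power curve, which stands in for the multi-turbine power curves used in the paper. All function names are hypothetical. The spatial interpolation step and the HiRLAM forecasts are omitted.

```python
import math

def to_hub_height(v_sensor, z_sensor=10.0, z_hub=80.0, z0=0.03):
    """Extrapolate wind speed (m/s) from sensor height to hub height.
    ASSUMPTION: neutral logarithmic wind profile; heights and z0 (m) are illustrative."""
    return v_sensor * math.log(z_hub / z0) / math.log(z_sensor / z0)

def power_curve(v, rated_mw, cut_in=3.0, rated_speed=13.0, cut_out=25.0):
    """Idealized cubic power curve (MW). ASSUMPTION: hypothetical parameters,
    used in place of the paper's multi-turbine power curves."""
    if v < cut_in or v >= cut_out:
        return 0.0
    if v >= rated_speed:
        return rated_mw
    frac = (v ** 3 - cut_in ** 3) / (rated_speed ** 3 - cut_in ** 3)
    return rated_mw * frac

def aggregate_power(speeds_by_farm, capacities_mw):
    """Sum per-farm power at each time step.
    speeds_by_farm: one list of sensor-height wind speeds per farm, equal lengths."""
    n_steps = len(speeds_by_farm[0])
    return [
        sum(power_curve(to_hub_height(speeds[t]), cap)
            for speeds, cap in zip(speeds_by_farm, capacities_mw))
        for t in range(n_steps)
    ]

def step_changes_pct(power_mw, installed_mw):
    """Change in power between consecutive intervals, in % of installed capacity."""
    return [100.0 * (b - a) / installed_mw
            for a, b in zip(power_mw, power_mw[1:])]

def forecast_errors_pct(actual_mw, forecast_mw, installed_mw):
    """Actual minus forecast power per interval, in % of installed capacity."""
    return [100.0 * (a - f) / installed_mw
            for a, f in zip(actual_mw, forecast_mw)]
```

Under these assumptions, applying `aggregate_power` to the actual and to the forecasted wind speed series, and then `step_changes_pct` and `forecast_errors_pct` to the results, would yield the kind of distributions from which variability and forecast-error figures can be read off.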