This paper addresses the problem of calibrating an ensemble for uncertainty estimation. The calibration method involves (1) a large, automatically generated ensemble, (2) an ensemble score such as the variance of a rank histogram, and (3) the selection, by a combinatorial algorithm, of a sub-ensemble that minimizes the ensemble score. The ensemble scores are either the Brier score (for probabilistic forecasts) or scores derived from the rank histogram or the reliability diagram. These scores measure the quality of an uncertainty estimation, as well as the reliability and the resolution of an ensemble. The ensemble is generated on the Polyphemus modeling platform so that the uncertainties in the models' formulation and in their input data can be taken into account. A 101-member ensemble of ground-level ozone simulations is generated with full chemistry-transport models run across Europe for the year 2001. This ensemble is evaluated with the aforementioned scores. Several ensemble calibrations are carried out with the different ensemble scores. The calibration makes it possible to build 20- to 30-member ensembles that greatly improve the ensemble scores. The calibrations essentially improve the reliability, while the resolution remains unchanged. The spatial validity of the uncertainty maps is ensured by cross validation. The impact of the number of observations and of observation errors is also addressed. Finally, the calibrated ensembles are able to produce accurate probabilistic forecasts and to forecast the uncertainties, even though these uncertainties are found to be strongly time-dependent.
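The three-step calibration described above can be illustrated with a minimal sketch. It is not the algorithm used in the paper: the combinatorial method is not specified here, so a simple greedy backward elimination stands in for it as an assumption. The function names (`rank_histogram`, `flatness_score`, `greedy_select`), the array layout, and the fixed `target_size` are likewise hypothetical, chosen only for illustration.

```python
import numpy as np


def rank_histogram(ensemble, obs):
    """Count, for each observation, how many members lie strictly below it.

    ensemble: array of shape (n_members, n_obs)
    obs:      array of shape (n_obs,)
    Returns an integer array of length n_members + 1 (one bin per rank).
    Ties are not randomized in this sketch.
    """
    ranks = np.sum(ensemble < obs, axis=0)
    return np.bincount(ranks, minlength=ensemble.shape[0] + 1)


def flatness_score(ensemble, obs):
    """Variance of the rank-histogram relative frequencies (0 means flat)."""
    counts = rank_histogram(ensemble, obs)
    return float(np.var(counts / obs.size))


def greedy_select(ensemble, obs, target_size):
    """Greedy backward elimination (an assumed stand-in for the
    combinatorial algorithm): repeatedly drop the member whose removal
    yields the flattest rank histogram, until target_size members remain.
    Returns the indices of the retained members.
    """
    members = list(range(ensemble.shape[0]))
    while len(members) > target_size:
        scores = [
            flatness_score(ensemble[[m for m in members if m != drop]], obs)
            for drop in members
        ]
        # scores[i] corresponds to removing members[i]
        members.pop(int(np.argmin(scores)))
    return members
```

Because candidates are only compared at equal sub-ensemble size, the score does not need to be normalized across sizes. A real calibration would also choose the sub-ensemble size and might use a stochastic or exact combinatorial search instead of this greedy pass.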