This paper is based on a comparative study of ultraviolet radiation (UV) measurements and UV reconstruction models for eight sites in Europe. Reconstruction models include neural network techniques and radiative transfer modeling combined with empirical relationships. The models have been validated against quality-controlled ground-based measurements, covering 8 to 20 years, on time scales ranging from daily to yearly UV sums. The standard deviations in the ratios of modeled to measured daily sums vary between 10 and 15%. The yearly sums agree within a 5% range. Depending on the availability of ancillary measurements, reconstructions have been carried out back to the early 1960s. A method has been set up to derive one best estimate of the historical UV levels that takes into account the long-term stability and underlying agreement of the models, and the agreement with actual UV measurements. Using this best estimate, the yearly sums of erythemally weighted UV irradiance showed a range of 300 kJ/m2 at 67°N to 750 kJ/m2 at 40°N. The year-to-year variability was lowest at 40°N with a relative variation of 4.3%; for central and northern European latitudes this year-to-year variation was 5.2 to 6.5%. With regard to the period 1980 to 2006, first-order trend lines range from 0.3 ± 0.1 to 0.6 ± 0.2% per year, approximately two thirds of which can be attributed to the diminishing of cloudiness and one third to ozone decline.
 UV is usually characterized by irradiance measurements, i.e., the net energy stream per wavelength per unit time through a horizontal unit area. Action spectra are introduced to weight the relative effectiveness of each wavelength in inducing a (biological/medical) effect. The well-known UV index is the spectral irradiance (W/m2/nm) weighted with the CIE erythemal action spectrum [McKinley and Diffey, 1987] and multiplied by 40 to arrive at a more accessible figure for the general public [World Health Organization, 2002].
 Assessments of UV-induced effects call for studies over prolonged periods of time: first, because the adverse health and environmental effects often relate to exposure accumulated over years to a lifetime; second, because the time scales of important atmospheric processes involved, e.g., ozone depletion and recovery, extend beyond decades. Since monitoring of UV, mainly of irradiance, started in the 1990s, reliable UV reconstruction methods are essential to make any judgment on, for instance, the impact of ozone depletion on health and the environment through altered UV exposure. Thus far, risk assessments are based on modeled UV [Slaper et al., 1996].
 Clear sky UV irradiance can be modeled from first principles. This is because processes like molecular extinction and effects from surface reflectivity are well known, or sufficiently determinable by direct measurements. Clouds however, evoke too complex a radiation field to make its modeling a feasible approach. Only recently have 3-D Monte Carlo simulation and 3-D radiative transfer models started to be used [Cahalan et al., 2005], but employment of these models requires more information on the cloud field, information which is often not available.
 Fortunately, effects induced by UV exposure are related to time-integrated values. Therefore, reconstruction methods have concentrated on effective UV integrated over a day as a starting point [Bordewijk et al., 1995; McArthur et al., 1999; den Outer et al., 2000; Kaurola et al., 2000; Lindfors et al., 2003], although several studies address smaller time scales [e.g., Bodeker and McKenzie, 1996; Staiger et al., 2008]. Most successful, in terms of broad applicability, accuracy, and short computation times, are UV reconstruction methods that use global solar irradiance (GR) as a cloud impact proxy. Commonly, a Cloud Modification Factor (CMF) is derived from the GR measurements (CMFGR) and transferred to a CMF applicable in the ultraviolet range. Global solar irradiation is measured with pyranometers at meteorological stations across the world, and quite often the time series extend back beyond the 1960s.
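The cloud-modification-factor idea can be illustrated with a minimal sketch. The power-law transfer from CMFGR to the UV range and its exponent are assumptions made purely for illustration; they are not the relationship used by any of the models discussed in this paper, and all function names are hypothetical.

```python
def cmf_gr(gr_measured, gr_clear_sky):
    """Cloud modification factor for GR: measured / clear-sky daily sum."""
    if gr_clear_sky <= 0:
        raise ValueError("clear-sky daily sum must be positive")
    return gr_measured / gr_clear_sky


def cmf_uv_from_cmf_gr(cmf, exponent=0.7):
    """Transfer CMF_GR to the UV range.

    ASSUMPTION: a simple power law with a hypothetical exponent < 1,
    expressing only that clouds usually attenuate UV less strongly
    than total solar radiation.
    """
    return cmf ** exponent


def reconstruct_uv_daily_sum(uv_clear_sky, gr_measured, gr_clear_sky):
    """Scale a modeled clear-sky UV daily sum by the transferred CMF."""
    return uv_clear_sky * cmf_uv_from_cmf_gr(cmf_gr(gr_measured, gr_clear_sky))
```

In practice the transfer function would be fitted per site or taken from radiative transfer calculations, as the models described in section 2 do in different ways.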
 Cloudiness and season induce a large dynamical range in the daily UV sums. Assessing the overall quality of reconstruction techniques, therefore, requires different quantities that need to be tested. Koepke et al.  reported on a comparison of 18 reconstruction methods applied to four locations, using 2 years of data. A ranking was introduced addressing the absolute and relative difference and standard deviations in measured to modeled daily sums, declaring the measurements as the true value.
 So far, no effort has been made to combine the results of different reconstruction methods and to use their mutual agreement, and their agreement with the measurements, to determine the uncertainty in a final reconstruction. Moreover, because the accumulation of UV exposure over long periods is important for the induced effects, comparisons of modeled and measured UV are necessary not only for daily but also for monthly and annual integrated values. The construction of these integrated values, however, is hampered by gaps in data records. Optimizing the bridging of these gaps, and thereby reducing the intrinsic uncertainty, contributes to a better overall reconstruction.
 In this paper, we will discuss the performance of five different reconstruction methods, hereafter referred to as models. The models have been run to reconstruct daily sums of erythemally weighted UV irradiance for eight sites in Europe. We will test their performance by comparing the models to each other and to UV measurements. We will infer a best possible description of the past UV climate at these eight sites, accounting for the established quality of the individual models and measurements. Initially, to train and test the models, the period 2000–2004 was chosen, but later on, to serve the ultimate goal of reconstructing as accurately as possible, this stringent condition was relaxed. In section 2, we will briefly discuss each UV monitoring site and the available data, and the main features of the participating models. The first results section, section 3, deals with the comparison of modeled and measured daily UV. How a merging of both can be achieved to arrive at an overall best description of the current and past UV climate is discussed in section 4. Results using these merged data sets are presented in section 5, together with the identification of different contributions to long-term changes. Within the context of this study, a uniform supplementation algorithm has been set up to bridge occurring data gaps. The algorithm, explained in the appendix, makes reliable construction of monthly and yearly integrated values possible.
2. Description of Measurements and Models
2.1. UV Measurements
 In this study, we use data from eight low-altitude UV monitoring sites across Europe. All sites were selected in the SCOUT-O3 EC project because of the availability of long-term quality controlled UV data, and all of these data sets were reevaluated in the context of the same project. The spectra are available at the European UV database (http://uv.fmi.fi/uvdb), where, at submission, each spectrum undergoes automatic quality flagging using the SHICrivm packages [Slaper et al., 1995; Slaper, 2002] and the CheckUVspec package of NILU, Norway (http://zardoz.nilu.no/∼olaeng/CheckUVSpec/CheckUVSpec.html).
 Daily sums of erythemally weighted UV irradiances were delivered by each site operator. UV spectra are exclusively used to produce the daily sums for Bilthoven, Jokioinen, Sodankylä and for Lindenberg after 2004. The Potsdam data set and the first part of the Lindenberg data set are based on Brewer spectra. For these sets, GR data averaged over the time of a Brewer scan were used to reduce the additional uncertainties in daily sums that could arise from the small number of 1 to 2 spectra per hour. At Hradec Kralove, a low number of spectral measurements per day, typically 4 to 14, is combined with broadband UV measurements. For Thessaloniki, broadband data are calibrated against collocated Brewer spectroradiometers. The Norrköping UV data set is fully broadband based; however, this data series was previously extensively reevaluated, and relative spectral correction functions were applied. The relative spectral correction functions were defined by comparing broadband measurements with collocated spectral Brewer measurements.
 The main characteristics of each monitoring site, and references to operation protocols and quality control, are listed in Table 1. The measurement regime for all sites is from sunrise to sunset, with some variations in choice of tolerance. The listed uncertainty (Table 1) of the delivered daily sums reflects the relative uncertainty: the part that is due to processing and corrections applied to individual spectral or broadband measurements. This uncertainty is reduced where monthly and yearly sums are considered. Following the detailed analyses of the absolute calibration procedures and of the long-term stability of the instruments as given in the references of Table 1, the uncertainty in the absolute calibration can be established at ∼4%, and the long-term stability at ∼2%.
 The reconstruction models are identified with the acronym of the institute where they have been developed. DWD and CHMI are based on neural networks, FMI uses solely radiative transfer modeling, and AUTH and RIVM use radiative transfer modeling with empirical relationships. The vast majority of model input data is on a daily basis or concerns daily sums of radiation; otherwise it is available as a single value or as a 12 month climatology. All models use the site location, ground-based ozone measurements, and GR measurements to determine the cloud impact on UV. Additional ancillary measurements, not necessarily available at each site, include: cloud cover, water vapor column, aerosol optical depth, visibility, snow depth (proxy for surface albedo), surface albedo measurements, and sunshine duration. No direction was provided on how to use additional ancillary data, if at all. Moreover, not all models were set up to incorporate all ancillary data. The performance of a model will not automatically improve when more ancillary data is taken into account. A graphical overview of available data and reconstructed UV is shown in Figure 1. The use of ancillary data is summarized in Table 2. For non-snow-covered landscapes, a small surface albedo, ∼3%, was generally applied for UV, and 8 to 12% for GR. The surface albedo of snow covered landscapes is estimated from snow depth data by each modeler, or from ground albedo measurements of GR, as is the case at the Potsdam site. References to a more elaborate description of each model are also listed in Table 2.
 The neural networks were trained with local ozone and daily sums of UV and GR and possibly other ancillary data. Hence, a site-dependent version of the model is built. DWD incorporated all available ancillary data for the final reconstruction, but the training period remained confined to the years 2000–2004. By contrast, CHMI did not implement the additional ancillary data but used all available UV measurements for training. AUTH derived separately for each site the relationship between the CMFs for GR and UV, using the period 2000–2004. RIVM applied for all sites a relationship between the CMFs for daily sums of GR and UV that was initially established at Bilthoven for the years 1999–2002. The FMI algorithm uses constructed look-up tables, which were previously calculated for an atmosphere with various homogeneous cloud layers for the whole spectral range, thereby linking the reduction of GR to that of UV.
3. Direct Comparison With Measurements
 The results discussed in this section are derived exclusively from actual measured UV data, and fully modeled daily sums based on at least actual measured ozone and GR. Furthermore, we consider only the subset of concurrent days: a day is included only if all models have produced data for that day and the measured daily sum exists. In this way, a genuine comparison is made because the influence of supplementation algorithms is ruled out and the different models are compared to the same subset of measured UV data. For the same reason, the analysis of monthly and yearly sums is deferred until we have set up a uniform data supplementation scheme for the individual data sets, see the appendix. We do compare, however, constructed totals of concurrently available daily sums within a month and within a year.
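The selection of concurrent days amounts to a set intersection over the measured and all modeled series. A minimal sketch, assuming each series is held as a dictionary keyed by day with `None` marking a gap (the data layout and the function name are illustrative assumptions):

```python
def concurrent_days(measured, modeled):
    """Return the sorted days for which the measurement and every model have a value.

    measured: dict mapping day -> measured daily sum (None marks a gap)
    modeled:  dict mapping model name -> dict of day -> modeled daily sum
    """
    days = {d for d, v in measured.items() if v is not None}
    for series in modeled.values():
        # Keep only days that this model also provides.
        days &= {d for d, v in series.items() if v is not None}
    return sorted(days)


measured = {"2001-06-01": 3.1, "2001-06-02": None, "2001-06-03": 2.4}
modeled = {
    "A": {"2001-06-01": 3.0, "2001-06-02": 2.0, "2001-06-03": 2.5},
    "B": {"2001-06-01": 3.2, "2001-06-03": None},
}
print(concurrent_days(measured, modeled))  # ['2001-06-01']
```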
 A selection of graphs, shown in Figure 2, gives the daily sum ratios, i.e., modeled daily sums divided by measured daily sums, as a function of the day number. Two sites are shown for each model. The spread of data points is generally small and distributed evenly over the whole year, with some seasonal variations that are more conspicuous for some sites (e.g., FIJ, DEP, and CZH). We summarize the statistics in Table 3; results based on clear sky days (CMFGR > 0.95) are listed in Table 4. Owing to the applied selection criterion, Table 4 also includes days with fractional cloudiness during parts of the day, as such a day may well have a daily sum comparable to that of a truly cloudless day.
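The statistics of the daily sum ratios, for all days and for the clear sky subset selected with CMFGR > 0.95, could be computed along the following lines. This is a sketch under the assumption that the three input sequences already cover concurrent days only; the function name and return layout are hypothetical.

```python
from statistics import mean, stdev


def ratio_statistics(modeled, measured, cmf_gr, clear_sky_threshold=0.95):
    """Mean and standard deviation of modeled/measured daily-sum ratios.

    All three arguments are equally long sequences over concurrent days.
    Returns {"all": (mean, std, n), "clear": (mean, std, n)}; std is None
    when fewer than two days are available.
    """
    def summarize(ratios):
        if not ratios:
            return (None, None, 0)
        sd = stdev(ratios) if len(ratios) > 1 else None
        return (mean(ratios), sd, len(ratios))

    # Skip days with a non-positive measured sum to avoid division by zero.
    pairs = [(mod / mea, c)
             for mod, mea, c in zip(modeled, measured, cmf_gr) if mea > 0]
    all_days = [r for r, _ in pairs]
    clear = [r for r, c in pairs if c > clear_sky_threshold]
    return {"all": summarize(all_days), "clear": summarize(clear)}
```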
Table 3. Averaged Ratios of Daily Sums, Measured/Modeled (a)
Sites and number of days: FIS, N = 3144; FIJ, N = 3140; SEN, N = 7494; DEP, N = 2562; DEL, N = 2462; NLB, N = 3775; CRH, N = 3775; GRT, N = 3929
(a) N is the number of days included. Best estimate (BE) will be explained in section 4.
 In Figure 3, we give a graphical summary of the results shown in Tables 3 and 4; averages and indicated error bars have been calculated with respect to the eight sites. Ratios of month and year totals are plotted as well. The daily sums, on average, are overestimated by all models, especially "clear sky" by AUTH and "all days" by RIVM. The agreement improves considerably for month and year totals; the averaged year totals are within 1.2% for all models. The large error bars of FMI in Figure 3 result from rather large deviations in the absolute sense. At the same time, FMI ranks best for its standard deviations in the daily sum ratios per site.
 In Figure 4, we show the average ratios per model as a function of CMFGR bins. The data of the eight sites are processed together to derive this graph. Although DWD, CHMI, and RIVM have a CMF-independent model performance for some sites, in particular their "home" site, the general picture emerges of a progressive overestimation of the actual UV sums at increasing cloudiness. Fortunately, those days contribute little to the accumulated total sum. The AUTH model shows an area close to unity halfway between solid overcast and cloud-free situations and progressively overestimates moving in both directions. The FMI model turns out to have the most CMFGR-independent model performance when all sites are considered together.
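Averaging the ratios per CMFGR bin can be sketched as follows. The bin width of 0.1 and the clamping of CMFGR values at or above 1 into the top bin are illustrative assumptions; the binning actually used for Figure 4 is not specified here.

```python
def mean_ratio_per_cmf_bin(ratios, cmf_gr, bin_width=0.1):
    """Average modeled/measured ratio in equally wide CMF_GR bins.

    Returns a dict mapping bin index -> (mean ratio, number of days);
    the lower edge of a bin is index * bin_width.
    """
    n_bins = round(1.0 / bin_width)
    sums = {}
    for ratio, cmf in zip(ratios, cmf_gr):
        # Clamp so that CMF values at or slightly above 1 fall in the top bin.
        index = min(max(int(cmf / bin_width), 0), n_bins - 1)
        total, count = sums.get(index, (0.0, 0))
        sums[index] = (total + ratio, count + 1)
    return {i: (total / count, count) for i, (total, count) in sorted(sums.items())}
```

Values lying exactly on a bin edge may land in the neighbouring bin because of floating-point rounding; for a summary plot this is usually harmless.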
 Unsurprisingly, summers are described better by the models than winters. This is also reflected in the fact that modeled daily sums are 3 to 5% larger than measured, while year totals are close to the measured values or even smaller. An obvious reason is the absence of snow during summers. Snow cover increases the complexity of the radiation field and requires an estimation of the ground albedo. Additionally, in summers, a major contribution to the daily UV sums comes from small solar zenith angles (SZAs). At small SZAs, the contribution of the direct UV irradiance is high compared to the diffuse irradiance, and the direct component is simulated more accurately [van Weele et al., 2000]. For similar reasons, measurements during summer are more certain as well. Additionally, detectors are calibrated at normal incidence, corresponding to an SZA = 0. Hence, any deviation from the ideal response, either with respect to angle or wavelength, will emerge at the larger SZAs.
4. Best Estimate
 The overall conclusion of section 3 is that none of the models stands out as the best performer for all sites and all different ways of result assessments. In addition to this conclusion, we must acknowledge that UV measurements are only ever a representation. Many factors influence the accuracy of the measurements, not all of which can be fully accounted for [e.g., Bais et al., 2001; Webb et al., 2003]. To describe the past UV climate as accurately and as far back in time as possible, we have set up an algorithm to infer the overall best description, i.e., the best estimate (BE) of the present and past UV irradiances. It makes use of all models and measurements and their mutual divergences to determine the accompanying uncertainty. The BE delivers an overall consistent reconstruction, dealing with the fact that for the same site start and end years in data series differ from model to model and from model to UV measurements, and that data gaps are distributed quite randomly over these series (gaps extend from a single day to months). The BE can also be used to supplement the individual modeled and measured data sets to 100% coverage in a consistent way. This will be explained in the appendix.
4.1. Construction of the Best Estimate
 The principle of the BE algorithm, based on an idea exploited in UV intercomparison campaigns [Slaper and Koskela, 1997], is applied separately to each site. In essence, it is a weighted average of scaled modeled daily sums of UV irradiances. The multiplication factors used for the scaling and the weighting factors are inferred from the procedure described below. Initially, the algorithm is set up using the models exclusively allowing a genuine BE versus measurement comparison. In the end, the measurements enter as the results of a sixth independent model completing the merger of models and measurements.
 Multiplication factors are assigned to arrays where, by definition, one array contains the daily sums produced by one model for 1 year. Because each model is run for a certain time period, a time series of multiplication factors is distributed to each model. Averaging these multiplication factors yields the scaling factor for that model and the accompanying standard deviation yields the weight of that model. These multiplication factors are termed median multiplication factors (MMFs), as they are determined in such a way that the averaged position of daily sums of one array is median with respect to all daily sums of a particular year, when this array is multiplied with its MMF. The process is run successively for all existing arrays yielding the time series of the MMFs for each model.
 In Figure 5, these MMFs are plotted for all sites clearly showing the different behavior of the individual models and sites. It reveals the underlying mutual agreement of the models (i.e., the spread of the MMFs in 1 year) but also the year-to-year stability of each individual model.
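 In formulae, the BE is given by

$$\mathrm{BE}(y,d) = \frac{\sum_i w_i \, \overline{\mathrm{MMF}}_i \, I_i^0(y,d)}{\sum_i w_i} \pm \Bigl(\sum_i w_i\Bigr)^{-1/2} \qquad (1)$$

with $\overline{\mathrm{MMF}}_i$ being the average of the scaling factors $\mathrm{MMF}_i(y)$ over all the years for model $i$, $\Delta\mathrm{MMF}_i$ the standard deviation, and $I_i^0(y,d)$ the reconstructed irradiance at year $y$ and day number $d$. The standard deviation delivers the weight for model $i$:

$$w_i = \bigl(\Delta\mathrm{MMF}_i \, \langle I^0(y,d) \rangle\bigr)^{-2} \qquad (2)$$

where the unweighted average over all models, $\langle I^0(y,d) \rangle$, is used in the weights and in the error in the BE as a simplification.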
 Only the subset of concurrent days within the arrays is used for the MMF calculation; that is, daily sums should have been based on the actual measured ancillary data (at minimum ozone and GR), and daily sums should have been produced by all other participating models. Summations in equation (1), however, run over all modeled days; there, the requirement of concurrent days and the exclusion of supplemented data are dropped.
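A weighted average of MMF-scaled daily sums for a single day can be sketched as below. The sketch assumes inverse-variance weights built from each model's MMF standard deviation and the unweighted model mean; this particular weighting, the error expression, and all names are assumptions for illustration and may differ in detail from the weighting used for the BE.

```python
from math import sqrt
from statistics import mean, stdev


def best_estimate(daily_sums, mmf_series):
    """Weighted average of MMF-scaled daily sums for one day.

    daily_sums: dict model -> reconstructed daily sum for this day
    mmf_series: dict model -> list of yearly MMFs (at least two per model,
                not all identical, otherwise the weight is undefined)
    Returns (best estimate, uncertainty).
    ASSUMPTION: inverse-variance weights w_i = (dMMF_i * <I>)**-2, with <I>
    the unweighted mean of the unscaled model values for this day.
    """
    unweighted = mean(daily_sums.values())
    num = den = 0.0
    for model, value in daily_sums.items():
        scale = mean(mmf_series[model])    # scaling factor of the model
        spread = stdev(mmf_series[model])  # year-to-year stability of the model
        weight = 1.0 / (spread * unweighted) ** 2
        num += weight * scale * value
        den += weight
    return num / den, 1.0 / sqrt(den)
```

With this choice, a model whose MMFs vary little from year to year dominates the average, which matches the intent that long-term stable models receive the largest weight.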
 In Figure 5, the MMFs are plotted before any filtering. Clearly anomalous results can be noted. An anomaly could stem from a model that fails to follow extreme weather conditions for that particular year or, rather trivially, as in the case of the first points for Jokioinen and Sodankylä and 2002 for Potsdam, from arrays containing too few days to derive a meaningful MMF (due to the concurrent criterion). These MMFs should neither influence the overall scaling of the model nor reduce the weight of the model in the final averaging. The daily sums within such an array, however, should be allowed to enter the final result, and may increase the uncertainty of the reconstruction in the case of a less adequate model. An MMF is marked as an outlier when it deviates more than 8% from the averaged MMFs over all years (same model). To reduce the occurrence of trivial outliers, a requirement of at least 120 days per array is set.
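The two MMF filters (minimum number of days per array, maximum deviation from the model average) can be sketched as follows. The order of the two steps and the reading of "8%" as a relative deviation from an average computed before outlier removal are assumptions; the names are hypothetical.

```python
from statistics import mean


def filter_mmfs(mmfs, n_days, max_deviation=0.08, min_days=120):
    """Select the yearly MMFs of one model that may enter its scaling and weight.

    mmfs:   dict year -> MMF
    n_days: dict year -> number of concurrent days in that yearly array
    """
    # Drop arrays with too few concurrent days first (trivial outliers).
    enough = {y: m for y, m in mmfs.items() if n_days.get(y, 0) >= min_days}
    if not enough:
        return {}
    average = mean(enough.values())
    # ASSUMPTION: "deviates more than 8%" is read as a relative deviation
    # from the average, which is computed before outliers are removed.
    return {y: m for y, m in enough.items()
            if abs(m - average) <= max_deviation * average}
```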
 Outlier daily sums occur as well and show up as scattered values. They easily exceed a factor of 4 compared to the other values for these same days (corresponding measurements or obtained from an intermediate BE). Outlier daily sums have a large impact on the BE for those particular days, since the weight in the averaging, through equation (1), is just the overall weight of the model derived from its MMFs. Therefore, daily sums exceeding an intermediate BE by a factor of 2 are excluded from further usage.
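The exclusion of outlier daily sums against an intermediate BE can be sketched as below. Only values that are too high are removed, following the wording above; keeping days for which no intermediate BE exists is an assumption of this sketch, and the names are hypothetical.

```python
def drop_outlier_daily_sums(model_sums, intermediate_be, max_factor=2.0):
    """Remove daily sums that exceed the intermediate BE by more than max_factor.

    model_sums, intermediate_be: dict day -> daily sum
    """
    kept = {}
    for day, value in model_sums.items():
        reference = intermediate_be.get(day)
        # ASSUMPTION: days without a usable reference cannot be judged and are kept.
        if reference is None or reference <= 0 or value <= max_factor * reference:
            kept[day] = value
    return kept
```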
 The preset limits for filtering of the MMFs and daily sums are a compromise between excluding erroneous results and retaining enough of the correct data for the final result. Of course, these quantities are not known a priori. An analysis showed that the final result is not very sensitive to the exact limits but, as set, they prevent obviously deviating results when, for instance, the BE is compared with measurements. After filtering, the whole process is repeated to yield the final BE.
4.3. Error in the Best Estimate
 The error in the BE as calculated in equation (1) depicts more the uncertainty in the BE given these particular model results, and less the overall uncertainty of the estimated UV sums. A better indication of the uncertainty is achieved by taking out one model at a time, and recalculating the BE each time this is done. The maximum and minimum values found, including the added uncertainty by equation (1), are more realistic boundaries of the ultimate uncertainty of the final BE.
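The leave-one-out procedure can be sketched generically: any function that maps a set of model values to an estimate and its error is recomputed with each model omitted in turn. Including the full-model estimate in the bounds is an assumption of this sketch, and the names are hypothetical.

```python
def leave_one_out_bounds(daily_sums, estimator):
    """Range of the estimate when each model is omitted in turn.

    daily_sums: dict model -> daily sum for one day
    estimator:  callable taking such a dict and returning (value, error)
    Returns (lower, upper), including the error reported by the estimator.
    """
    value, error = estimator(daily_sums)
    lows, highs = [value - error], [value + error]
    for omitted in daily_sums:
        subset = {m: v for m, v in daily_sums.items() if m != omitted}
        if not subset:
            continue
        value, error = estimator(subset)
        lows.append(value - error)
        highs.append(value + error)
    return min(lows), max(highs)


def plain_mean(sums):
    """Stand-in estimator for the example: unweighted mean with zero error."""
    values = list(sums.values())
    return sum(values) / len(values), 0.0


print(leave_one_out_bounds({"A": 1.0, "B": 2.0, "C": 3.0}, plain_mean))  # (1.5, 2.5)
```

Because the lower and upper bounds come from different omitted models, the resulting uncertainty band is in general asymmetric around the estimate.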
4.4. Comparing the BE With Measurements
 The results of the BE are also listed in Tables 3 and 4 and plotted in Figures 3 and 4. Clearly, the BE has the best overall performance, although in some situations individual models may perform better. For individual sites, the deviations of the BE range from −2 to +6% for daily sums, −3 to +4% for month totals, and −3 to +1% for year totals, while individual models range from −9 to +13% (daily sums), −9 to +6% (month totals) and −7 to +7% (year totals).
 In Table 5, the models have been ranked with respect to the averaged standard deviations in ratios of daily sums and month totals, the ratios of year totals, and the CMFGR dependency. For the latter, the weighted averages of the data points in Figure 4 have been calculated for each model using the CMFGR values as weights. The best result is ranked 1 and the worst 6; the rest are scaled continuously in between. None of the individual models performed better than the BE on any of the listed quantities. Although the improvement from introducing the BE may be as small as 2% compared to one particular model, improvements of 20 up to 600% are easily found. Clearly, the ranking reveals that the BE performs best overall, while the ranking of the individual models alternates and, on average, the models score equally well.
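A continuous ranking between 1 (best) and 6 (worst) can be sketched as a linear mapping of a quality score. Reading "scaled continuously" as linear interpolation is an assumption, as is the convention that a smaller score is better; the names are hypothetical.

```python
def continuous_rank(scores, best_rank=1.0, worst_rank=6.0):
    """Map scores linearly onto [best_rank, worst_rank]; a smaller score is better.

    scores: dict name -> non-negative quality measure (e.g., a standard
    deviation, or the absolute deviation of a ratio from 1).
    """
    best, worst = min(scores.values()), max(scores.values())
    if worst == best:
        return {name: best_rank for name in scores}
    span = worst_rank - best_rank
    return {name: best_rank + span * (s - best) / (worst - best)
            for name, s in scores.items()}
```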
Table 5. Model Ranking With Respect to Ratios of Year Totals, Averaged Standard Deviation in Ratios of Month Totals and of Daily Sum, and CMFGR Dependency
Ranking based on numbers between brackets, i.e., ratios or standard deviations averaged (ave) over all sites.
5. Results on Long-Term UV Reconstruction
 In Figure 6, the ratios of measured to BE daily sum are plotted as a function of time. In this way, long-term trends in atmospheric composition and stability of the instrumentation become visible. Sensitivity loss in either the UV, GR or ozone measurements translates directly into a decrease in the ratios, and overestimation of cloud or aerosol impact on UV leads to ratios that are too high. Bilthoven, Jokioinen and Sodankylä yield the most stable long-term results, indicating internal consistency of UV measurements and ancillary data, and moreover, no detectable trend in the aerosol (optical) properties.
 Although Hradec Kralove yields a long-term stable agreement, a clear seasonal pattern is evident, which might be due to the daily sum construction algorithm for the measured UV, which combines a low number of spectral measurements with high-frequency broadband measurements. A not fully adequate characterization of the angular response, or wavelength response, of the broadband detector may lead to the observed behavior, i.e., long-term stability ensured by well-calibrated spectral measurements but a clear seasonal pattern with underestimations in winter.
 Thessaloniki is a good example of an area with a dramatic change of the aerosols, resulting in a decrease of the aerosol optical depth of 2.9 ± 0.9% per year at 320 nm between 1997 and 2006 [Kazadzis et al., 2007]. In the same interval, we can identify upward trends in the daily sum ratios of 1.4 ± 0.1% per year where models did not include a long-term aerosol change, and 0.3 ± 0.1% per year where they did. In fact, only RIVM incorporated this trend in aerosols. Thus, the behavior of the Thessaloniki data can be explained by long-term aerosol change. A similar feature may be identified for the Norrköping data set where, after a constant level, an increase is found of the ratios of about 0.5 to 0.9% per year depending on the exact boundaries between 1996 and 2003. Attributing this to an aerosol trend requires a rather large trend of 3 to 6% per year, which is not reflected in the aerosol data of Norrköping.
 In Figure 7, the yearly sums are shown as a function of time for the eight locations. The construction of the yearly and monthly sums is explained in the appendix. Each plot contains the measured, the five individual modeled and BE yearly sums. A similar pattern of the year-to-year variability of the measurements is seen with the models, and also the models behave uniformly in this manner. Much more scatter would have emerged in this type of graph if a general supplementation algorithm for the data gaps had not been applied. The overall standard deviation between modeled yearly sums is now around 2%, whereas if individual supplementation algorithms are used, as supplied by each data submitter, this would be 3–4%, (see Appendix A). Note that the uncertainty in the BE is considerably smaller than the range of the estimated yearly sums produced by the individual models, as shown in Figure 7.
 The aerosol trend at Thessaloniki causes some deviating results after 1997. Only RIVM, which incorporated this trend, follows the rapid increase of the measured yearly sums. The other models remain at the averaged level. The result is a larger estimated uncertainty for the start of the reconstruction. The clear overestimation by CHMI prior to 1980 for the Potsdam data follows from a moderate reconstruction of the actual measured UV data; the precise reason, however, is unclear.
 The most adequate description of the past UV climate is achieved by merging models and measurements: the measurements enter as a sixth model in the above procedure. The averaged MMF found for the measurements can be rescaled to 1; the models are thereby rescaled accordingly. In this way, the calibration of the measurements prevails and the measurements are deemed the actual standard. In real terms, the additional scaling of the models turns out to be 0.5% for FIS(+), FIJ(−), NLB(−) and GRT(−), and 2.5% for SEN(−), DEP(+), DEL(+) and CRH(−) (plus and minus signs indicate the direction of scaling). Results derived from this BE are discussed next.
 The averaged monthly sums per decade using this BE are shown in Figure 8. We deduce that for the more recent decades most sites have higher monthly sums. However, assessing long-term trends for individual monthly sums did not deliver a clear picture due to the large year-to-year variability of the monthly sums. The variability is around 12%, and reaches its maximum of 15% in March. Thessaloniki stands out for its summer periods with a year-to-year variability for August of only 5%, which is also reflected in the variability of the yearly sums. The year-to-year variability is 4.3% for Thessaloniki, while most other sites have a 6% variability (defined as the relative standard deviation in the averaged yearly sums).
 In Figure 9, we show the long-term reconstruction of the yearly sums and the accompanying estimated uncertainty. Note the asymmetry in the uncertainty, which originates from the maximum and minimum values found when successively omitting the models or measurements in the calculation of the BE. The start year of each reconstruction is set by the availability of GR data and not of total ozone, as it can be argued that ozone depletion had not started yet and hence the climatological ozone could be used. The absence of ozone data is indicated in the figure, except for Lindenberg where it indicates the absence of GR data. The BE of Lindenberg prior to 1980 is based on the DWD model that uses sunshine duration as a cloud effect proxy for this period. This approach is less accurate: the uncertainty increases by on the order of 2% for the bias and 6% for the standard deviations of the daily sums [Feister et al., 2008a]. The yearly sums in the most recent decade for all sites are higher than the long-term averages, and a number of years in the 1980s can be identified as having the lowest UV yearly sums at most of the sites. This observation becomes clearer in Figure 10, where we summarize the reconstruction by presenting the 10 year running weighted mean of all sites together. The uncertainty of each BE data point is used to determine its weight (see equation (2)). Because each reconstruction has different start and end years depending on site, each reconstruction is first normalized to its averaged level of the overlap years 1983–2004. The final running mean, as presented in Figure 10, is normalized again.
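The normalization to the overlap years and the running weighted mean over all sites can be sketched as follows. The inverse-variance weighting and the trailing window are assumptions of this sketch, and the final renormalization step is omitted; all names are hypothetical.

```python
def normalize(series, ref_years=range(1983, 2005)):
    """Divide a yearly series (dict year -> value) by its mean over the reference years."""
    ref = [series[y] for y in ref_years if y in series]
    level = sum(ref) / len(ref)
    return {y: v / level for y, v in series.items()}


def running_weighted_mean(sites, window=10):
    """Running weighted mean over all sites.

    sites: list of dicts year -> (yearly sum, uncertainty), one per site;
    each site needs at least one year inside the reference period.
    ASSUMPTIONS: weights are 1/uncertainty**2 with the uncertainty scaled
    along with the normalized sums, and the window is trailing
    (years y-window+1 .. y).
    """
    points = []  # (year, normalized value, weight)
    for site in sites:
        norm = normalize({y: v for y, (v, _) in site.items()})
        for y, (v, err) in site.items():
            rel_err = err / v  # the relative uncertainty survives normalization
            points.append((y, norm[y], 1.0 / (rel_err * norm[y]) ** 2))
    result = {}
    for year in sorted({y for y, _, _ in points}):
        sel = [(v, w) for y, v, w in points if year - window < y <= year]
        result[year] = sum(v * w for v, w in sel) / sum(w for _, w in sel)
    return result
```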
 It should still be investigated whether the major contribution to the variations in the yearly sums comes from ozone depletion or from coincidental changes in cloudiness. In Figure 11, we have plotted the deviations of the yearly sums from their long-term average, and indicated the contribution caused by ozone depletion and cloudiness. In all cases, 3 year running means are shown to suppress the short-term variability. The impact of ozone and clouds on the yearly sums has been separated by comparing the modeled sums using the actual ozone data with the sums derived from a climatological ozone record (for each site separately). The RIVM model has been rerun using this climatological ozone, and the as such constructed yearly sums are subtracted from the actual ones. What remains is the variability, or trend, of the yearly sums due to ozone. This is analogous to what has been performed for the Davos time series [Lindfors and Vuilleumier, 2005]. The excess variability is attributed to cloudiness.
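The separation of the ozone and cloud contributions can be sketched as a difference between two model runs followed by a 3 year running mean. Expressing all terms relative to the long-term average of the actual yearly sums, and shortening the averaging window at the ends of the series, are assumptions of this sketch; the names are hypothetical.

```python
def running_mean(values, window=3):
    """Centered running mean; the ends use the available neighbours only."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def split_ozone_and_clouds(actual, clim_ozone):
    """Split yearly-sum anomalies into an ozone part and a remaining part.

    actual:     yearly sums modeled with the measured ozone
    clim_ozone: yearly sums modeled with climatological ozone
    Returns (total, ozone, remainder) as relative deviations from the
    long-term average of `actual`; the remainder is attributed to cloudiness.
    """
    average = sum(actual) / len(actual)
    total = [(a - average) / average for a in actual]
    ozone = [(a - c) / average for a, c in zip(actual, clim_ozone)]
    remainder = [t - o for t, o in zip(total, ozone)]
    return running_mean(total), running_mean(ozone), running_mean(remainder)
```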
 Low UV levels, compared to the long-term average, prevail until the eighties, mainly as a consequence of increased cloudiness. The 1990s, in particular, stand out as a decade with high UV levels due to ozone depletion. This is in line with previous findings [e.g., Kerr et al., 1993; Bojkov et al., 1993]. In this period, ozone depletion was enhanced by the volcanic eruption of Mount Pinatubo [Geller and Smyshlyaev, 2002]. The high values around 2003, for a number of sites, are mainly the consequence of cloudiness, as can be read from the plot. The same has been observed for Davos [Lindfors and Vuilleumier, 2005] and Moscow [Chubarova, 2008].
 To summarize, we have calculated the linear component of the yearly sum development, and the relative contributions of ozone and clouds to it. Only the years after 1980 are considered, where 1980 roughly marks the onset of ozone depletion. Results are given in Table 6. All sites indeed show a significant positive trend. Most prominent, however, is the contribution of diminishing cloudiness rather than ozone. On average, two thirds of the total change can be attributed to clouds, and one third to ozone change. Partly, the observed increases in surface UV are attributed to the cleaning of the atmosphere, starting in the late 1980s [Pinker et al., 2005], which is in line with the findings of Wang et al. The European continent, however, as an exception, has undergone a "brightening" possibly due to air pollution abatement policies [e.g., Kazadzis et al., 2007; Ruckstuhl et al., 2008].
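A first-order trend line and its split into ozone and cloud parts can be sketched with an ordinary least-squares slope. Taking the trend of the sums modeled with climatological ozone as the cloud-related part is an assumption of this sketch, uncertainties of the slopes are not computed, and the names are hypothetical.

```python
def linear_trend_percent(years, values):
    """Least-squares slope of values versus years, in % of the mean per year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return 100.0 * (sxy / sxx) / mean_y


def trend_shares(years, actual, clim_ozone):
    """Total trend (% per year) and the fractions attributed to ozone and clouds.

    ASSUMPTION: the trend of the sums modeled with climatological ozone
    is taken as the cloud-related part; the rest is assigned to ozone.
    """
    total = linear_trend_percent(years, actual)
    level = sum(actual) / len(actual)
    clim_level = sum(clim_ozone) / len(clim_ozone)
    # Re-express the cloud trend relative to the mean of the actual sums.
    clouds = linear_trend_percent(years, clim_ozone) * clim_level / level
    return total, (total - clouds) / total, clouds / total
```

For a synthetic series in which the actual sums rise by 3 units per year and the climatological-ozone run by 2 units per year, this yields a cloud share of two thirds and an ozone share of one third.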
Table 6. First-Order Trend Lines and Contributions Derived for 1980–2006
 Results of five UV reconstruction models have been intercompared and compared with measurements for eight locations in Europe. Generally, models agree well with measurements, and none of the models stands out as best. A method has been set up to infer one reconstruction, the so-called best estimate, reflecting the long-term stability and underlying agreement of the models and the agreement with the actual UV measurements. The performance of the best estimate is independent of the site and of the quantity assessed (e.g., daily sums, standard deviation), and surpasses the performance of the individual models. In a final step, the measurements are merged with the models, thereby producing a best estimate reconstruction closest to the actual UV history.
 Year-to-year variability of the derived UV reconstruction is large compared to the trend that can be expected from ozone decline. Still, it can be argued that UV levels have gradually increased over the last three to four decades, and particularly since the eighties. When all sites are considered together, levels are now about 4 to 8% higher than before the 1980s. The lower levels from before the 1980s are mainly a consequence of increased cloudiness compared to the long-term average. Consequences of ozone depletion emerge after the 1980s; the years 1990 to 2000 in particular stand out as a decade with high UV levels due to ozone decline. However, for the overall change that can be observed for the period of 1980 to 2006, two thirds can be attributed to a diminishing of cloudiness or aerosol optical thickness, and only one third to ozone decline.
 Although the yearly sums have removed most of the annual cloud variability, the marked structure in the records of yearly sums at all sites suggests that clouds also dominate a large part of the interannual variability. Many of the peaks and dips are repeated at all sites, indicating changes either at European scale (e.g., low UV levels in 1991 and high levels in the mid 1990s) or at regional scale (e.g., low UV levels in 1998 and 2001 in northern and western Europe, or 1984 and 1987 in Central Europe). The large interannual variability imposes great uncertainty on any attempt to calculate trends in UV that could be used to infer projections for the coming years. Changes in aerosols and clouds, direct and indirect effects of climate change, and the anticipated full recovery of the ozone layer will probably further complicate the picture of UV variations in the future.
Appendix A: Supplementation of Data Gaps
 Measurements, either directly as UV measurements or through input data for model calculations, are the starting point for the data records used in this study. For various reasons, gaps occur in the time series of these data records, amounting to about 50 daily UV values per year on average. This seriously hampers the construction and comparison of monthly and yearly sums.
 We set up a supplementation algorithm that makes use of the constructed BE. Data gaps in either measured or individual modeled data records are filled with corresponding values from the BE. Prior to slotting these in, the BE values are scaled. This scaling is determined so that the total sum of the existing data and that of the concurrent days of the (scaled) BE match. Matching is applied separately to each year. After supplementation, the construction of monthly and yearly sums is reduced to a simple summation over the corresponding days.
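 A minimal sketch of this supplementation step for a single year is given below. It assumes that both the data record and the BE are available as lists of daily sums of equal length, with `None` marking a missing day; the function name and data layout are illustrative assumptions only.

```python
def fill_gaps_with_be(record, be):
    """Fill gaps in one year of daily sums with scaled best-estimate (BE) values.

    record : daily sums of a model or measurement, None where data are missing
    be     : BE daily sums for the same days, None where the BE itself is missing
    Returns the filled record and the scale factor applied to the BE.
    """
    # Days on which both the record and the BE have a value.
    concurrent = [(r, b) for r, b in zip(record, be)
                  if r is not None and b is not None]
    sum_record = sum(r for r, _ in concurrent)
    sum_be = sum(b for _, b in concurrent)
    if sum_be == 0:
        raise ValueError("no usable concurrent BE data for this year")
    # Scale so that the scaled BE total matches the record total on concurrent days.
    scale = sum_record / sum_be
    filled = []
    for r, b in zip(record, be):
        if r is not None:
            filled.append(r)          # keep existing data untouched
        elif b is not None:
            filled.append(scale * b)  # slot in the scaled BE value
        else:
            filled.append(None)       # gap in the BE as well; left unfilled here
    return filled, scale
```

 For example, `fill_gaps_with_be([10, None, 30, None], [20, 40, 60, 80])` yields a scale factor of 0.5 and the filled record `[10, 20.0, 30, 40.0]`, after which the yearly sum is a plain summation.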
 To show the effect of different routes of supplementation, we determine the variation in the individual yearly sums produced by each model in this study. We calculate the average of the standard deviations obtained when averaging the individual modeled results for each yearly sum per year and per site. In this short analysis, years with 100% coverage are left out from the start. We find the following percentages: 3.7 ± 3.7 (1), 2.5 ± 1.5 (2), and 1.7 ± 0.9 (3). Numbers in parentheses indicate the supplementation route followed: (1) using the data as provided and scaling with the number of available days per year for missing values, (2) slotting in scaled climatology derived from each model separately, and (3) using the method described above. Indeed, the best results are obtained by the latter, i.e., slotting in BE-scaled values. The overall standard deviations in the ratio of modeled to measured yearly sums were found to be 12.5, 3.5, and 3.2%, following the same three supplementation routes as indicated above.
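 For comparison, route 1 can be sketched as follows. The interpretation that the sum over the available days is multiplied by the ratio of total days to available days is our reading of "scaling with the number of available days", and the function name is hypothetical.

```python
def yearly_sum_route1(record):
    """Estimate a yearly sum from an incomplete record by day-count scaling.

    record : daily sums for one year, None where data are missing
    """
    available = [r for r in record if r is not None]
    if not available:
        raise ValueError("no data available for this year")
    # Assumes the missing days carry, on average, the same daily sum
    # as the available days, irrespective of season.
    return sum(available) * len(record) / len(available)
```

 Because this scaling ignores the time of year in which the gaps fall, a gap in summer and a gap in winter are treated alike, which is consistent with route 1 showing the largest spread.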
 Gaps in the BE occur only when none of the models nor the measurements have produced a value for that day. In this case, a climatology is built from the BE and slotted in. Again, scaling is applied so that the ‘year totals’ of existing BE data and the concurrent days of the climatology match.
 Note that these constructed monthly and yearly sums of the different models or measurements are still independent, as the calibration behind the modeled or measured daily sums prevails. Only the fast components, i.e., the daily sums as a function of time, get correlated as a consequence of slotting in the BE, but this correlation is exactly what is integrated out in the construction of monthly and yearly sums.
 Supplementing the ozone data record prior to calculating the UV, rather than afterward supplementing the modeled UV record, leads to smaller uncertainties in the constructed yearly sums. This is to be expected because the variability of ozone is smaller than that of UV. As the latter does not hold for the GR, a clear improvement is not expected when supplementing GR data instead of the modeled UV record. Tests were performed with the Bilthoven data set of ozone and GR (34 years). At random, 45 daily values per year were taken out, either in the ozone or GR data record. Also, data sets with two randomly located gaps of adjacent days (total length of 45 days) were constructed. The data sets prepared in this way were supplemented using scaled climatological values (similar to route 2), or the UV record was supplemented after the model calculations had been performed. Here also route 2 was followed. The standard deviations with respect to the yearly UV sums based on the original data were determined. Indeed, we find two to four times smaller values when supplementing the ozone data compared to standard deviations found when supplementing the UV record. The uncertainty due to gaps in the GR data record turns out not to depend on which data is supplemented, as expected. Also, we found that randomly distributed gaps have a two times smaller impact than when they are concentrated in one or two large gaps.
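 The two gap patterns used in these tests could be generated along the following lines. This is only a sketch: the function names, the fixed year length of 365 days, the way the 45 days are split over the two blocks, and the seeding are all assumptions introduced for illustration.

```python
import random


def scattered_gaps(n_days=365, n_missing=45, seed=0):
    """Indices of n_missing days removed at random positions in the year."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_days), n_missing))


def two_block_gaps(n_days=365, n_missing=45, seed=0):
    """Indices of two randomly placed blocks of adjacent days, n_missing in total."""
    rng = random.Random(seed)
    first_len = rng.randint(1, n_missing - 1)  # assumed random split of the total
    second_len = n_missing - first_len
    while True:
        start1 = rng.randint(0, n_days - first_len)
        start2 = rng.randint(0, n_days - second_len)
        block1 = set(range(start1, start1 + first_len))
        block2 = set(range(start2, start2 + second_len))
        if not block1 & block2:  # redraw if the two blocks overlap
            return sorted(block1 | block2)
```

 Setting the returned indices to missing in a copy of the ozone, GR, or UV record, supplementing, and comparing the resulting yearly UV sums with those from the complete record reproduces the structure of the test described above.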
 We would like to acknowledge the following individuals and institutes for making available data used for the reconstruction at various sites: H. De Backer of the Belgian Meteorological Institute, Brussels, Royal Netherlands Meteorological Institute. Part of this work has been performed within the European Commission-funded project SCOUT-O3, contract 505390-GOCE-CT-2004.