Quantifying exposure biases in early instrumental land surface air temperature observations

Exposure biases are a pervasive non-climatic change in land air temperature records, introduced as a result of changes over time in the way thermometers were sheltered from solar radiation and the elements. Exposure biases have not been widely accounted for in observational records, owing to difficulties in detecting and correcting the bias using traditional homogenisation techniques; therefore, exposure biases still contribute significant uncertainty to the early period in global temperature compilations. Here, an empirical approach to address the bias arising from the introduction of Stevenson-type screens from the late-19th century is presented. The approach consists of: (1) an empirical analysis of 54 parallel measurement series to identify the characteristics of the exposure bias in four exposure classes; (2) the development of bias-estimation models based on an analysis of which variables influence the bias; and (3) the application of the models to an extended version of CRUTEM5 (CRUTEM5_ext), based on exposure metadata, to quantify and reduce the bias. Step one identified differences between the temperatures recorded in Stevenson screens and early exposures, which vary seasonally, diurnally, and with location and exposure class. The largest biases (in mean temperatures) were found in freestanding exposures (up to −0.78°C annually) and in summer, while the smallest biases were generally found in wall-mounted exposures (near 0°C annually) and in winter. Significant relationships between the bias and temperature, downward top-of-atmosphere solar radiation and/or shortwave downward solar radiation received at the surface were found in each exposure class and led to the development of three regression-based bias-estimation models. 
Application of these models to 1,960 mid-latitude stations in CRUTEM5_ext resulted in small (≤0.016°C) positive adjustments to the Northern Hemisphere mid-latitude mean before 1880, and larger (≤−0.1°C) negative adjustments to the Northern and Southern Hemisphere mid-latitude means over 1882–1934 and 1856–1900, respectively. Larger adjustments were estimated regionally: up to −0.57°C annually and −0.79°C seasonally in individual grid cells.

in Lenssen et al. [2019] and Osborn et al. [2021] in Morice et al. [2021]) and are also used to calibrate many temperature palaeoreconstructions (e.g., Anchukaitis et al., 2017; PAGES2k Consortium, 2017). LSAT records can be compromised by non-climatic changes in the data, known as inhomogeneities, which can result from changes in station location or surroundings and from changes to instrumentation or observing practices (Jones, 2016; Trewin, 2010). Inhomogeneities can be large in magnitude, comparable in scale to true climatic responses to forcing at individual stations; affected observational records therefore require correction or consideration before they can be used to study climate variability and change (World Meteorological Organization, 2020). Where inhomogeneities affect individual stations, correction is often possible using homogenisation algorithms (Venema et al., 2012); however, where inhomogeneities (known as biases) systematically affect a large proportion of the observations in a region, traditional methods may be insufficient to identify and correct them. This means that biases potentially still exist in LSAT records and contribute significant uncertainty to global temperature compilations (e.g., HadCRUT5; Morice et al., 2021).
One bias affecting LSAT records is the exposure bias. Prior to the development and widespread adoption of variants of the Stevenson screen (Figure 1a) in the late-19th and early-20th centuries, various (often inadequate) methods were employed to protect thermometers from solar radiation and the elements (Parker, 1994; Sparks, 1972; Trewin, 2010). These early methods (referred to as exposures) varied regionally and included mounting thermometers on poleward-facing walls, on stands, and within various freestanding screens (Figure 1b–e). Each type of exposure influenced temperature readings differently by altering the influence of solar radiation on the thermometer, thus introducing inhomogeneities into station temperature records when the transition to Stevenson-type screens was made (Parker, 1994).
The impact of differing thermometer exposures on temperature readings has been investigated previously, including by Chenoweth (1992) in North America; Böhm et al. (2010), Brunet et al. (2006, 2011), Butler et al. (2005), Gaster (1882), Margary (1924), Marriott (1879) and Nordli et al. (1996, 1997) in Europe; Gill (1882) in South Africa; Ashcroft et al. (2022) and Nicholls et al. (1996) in Australia; Awe et al. (2022) in Mauritius; and Parker (1994) globally. All present similar findings: significant differences in temperature readings between Stevenson screens and historic exposures, which vary seasonally, diurnally and according to weather conditions and type of exposure. Despite these well-documented differences, and assessments (e.g., Frank et al., 2007; Moberg et al., 2003) presenting evidence of the likely presence of the exposure bias in early temperature observations, relatively few exposure bias-specific corrections have been incorporated into global temperature datasets. In HadCRUT5, for example, only records from Australia (Ashcroft et al., 2012), the Greater Alpine Region (Böhm et al., 2010) and Spain (Brunet et al., 2006) are known to have been explicitly adjusted to account for exposure biases. The lack of more widespread adjustment is largely because stations within regions or meteorological networks introduced the Stevenson screen quasi-simultaneously, and often without documentation, making the bias difficult to identify and rendering traditional approaches to homogenisation, such as neighbour comparisons, less effective (Brunet et al., 2011; Jones, 2016; Trewin, 2010; World Meteorological Organization, 2020). Even where a bias is known to be present, determining the appropriate adjustment is problematic because of the seasonal nature of the bias (homogenisation algorithms often operate on annual timescales) and the number of variables expected to influence its characteristics (Willett et al., 2014). These factors, combined with a lack of available or accessible metadata, mean a large proportion of long LSAT records likely retain biases related to the introduction of Stevenson-type screens (Trewin, 2010).
As such, it is necessary to account for exposure biases in long observational records, including in global temperature compilations. The HadCRUT5 dataset does this by including uncertainties from 'nonstandard measurement enclosures' in its error model (Morice et al., 2021). The model, developed by Folland et al. (2001) based on work by Parker (1994), generates an ensemble of exposure bias error realizations assuming a fixed annual 1σ uncertainty of 0.2°C (0.1°C) prior to 1930 (1900), decreasing linearly to 0°C in 1950 (1930), for stations within (outside of) 20°S–20°N (Morice et al., 2012). Current knowledge, however, suggests that this is an oversimplified representation of the bias. The fixed annual uncertainty does not account for the well-documented seasonal nature of the exposure bias and could lead to inaccurate assessment of season-specific trends. In addition, the HadCRUT5 error model does not account for regional differences in (a) the historic exposures in use prior to the introduction of the Stevenson screen or (b) the timing of the transition to the Stevenson screen. Both factors vary (independently) by region: (a) affects the characteristics and magnitude of the exposure bias, and (b) the period of time affected by it (Parker, 1994; Sparks, 1972). Since the development of the HadCRUT error model, additional parallel measurements and assessments of the bias have become available, and metadata have become more easily accessible as a result of digitisation efforts (e.g., Allan et al., 2011). This paper therefore aims to address some of the limitations identified above by (a) updating Parker's (1994) assessment of the characteristics of the exposure bias (Section 3); (b) developing models to estimate the magnitude and seasonal nature of the exposure bias (Section 4); and (c) applying the developed models to the stations in an extended version (CRUTEM5_ext) of CRUTEM5 (Osborn et al., 2021), using information about historic exposures compiled from a review of metadata and relevant literature (Section 5).
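The fixed error-model assumption described above can be written as a small function of year and latitude. The sketch below returns only the prescribed 1σ value; it does not reproduce how HadCRUT5 samples its ensemble realizations. The function name `exposure_sigma` and the treatment of the 20°S and 20°N limits as inclusive are our assumptions.

```python
def exposure_sigma(year: float, lat: float) -> float:
    """Prescribed annual 1-sigma exposure-bias uncertainty (degC).

    Within 20S-20N (assumed inclusive): 0.2 degC up to 1930, falling linearly to 0 in 1950.
    Elsewhere: 0.1 degC up to 1900, falling linearly to 0 in 1930.
    """
    if -20.0 <= lat <= 20.0:
        sigma0, start, end = 0.2, 1930.0, 1950.0
    else:
        sigma0, start, end = 0.1, 1900.0, 1930.0
    if year <= start:
        return sigma0
    if year >= end:
        return 0.0
    # Linear ramp from sigma0 at `start` down to zero at `end`
    return sigma0 * (end - year) / (end - start)
```

A single, season-independent number per year and latitude band is exactly the simplification criticised above: the function has no month argument and no dependence on the exposure type in use.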

| DATA
To study the characteristics of the exposure bias, series of parallel measurements (temperatures recorded near-simultaneously in two or more co-located exposures) were collated from the literature and meteorological yearbooks (Table 1; Figure 2). As this study is concerned specifically with the transition to Stevenson-type screens, only studies detailing temperatures (or differences) for Stevenson-type screens (T_SS) and at least one other historic exposure (T_Hist) were collated.
For each series and exposure, the mean monthly (or seasonal) maximum (T_x), minimum (T_n) and/or mean (T_m) temperature readings were recorded and, where necessary, converted to degrees Celsius. Where not given by the source, and where sufficient data were available, the diurnal temperature range (DTR) and monthly mean temperatures were then calculated. Finally, the differences in T_x, T_n, T_m and DTR between the Stevenson screen and the historic exposure were calculated according to:

$$\Delta T = T_{\mathrm{SS}} - T_{\mathrm{Hist}} \tag{1}$$

where T is substituted for each of the previously listed variables. Note that for some series, only monthly differences (ΔT) were available, or ΔT plus the readings for one exposure (from which the values for the other exposure were calculated). ΔT_x, ΔT_n and ΔDTR were recorded, in addition to ΔT_m (the variable of relevance to CRUTEM5), to provide a more comprehensive picture of the characteristics of the exposure bias and to allow an assessment of the elements which contribute to ΔT_m (as, by construction, ΔT_m is the mean of ΔT_x and ΔT_n and cannot have a larger bias than both ΔT_x and ΔT_n). For further theoretical discussion of the nature of biases in T_x, T_n, T_m and DTR, and how they relate to one another, the reader is directed to Thorne et al. (2016).
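As a minimal sketch of the difference calculation, assuming monthly mean T_x and T_n are available for both exposures and that T_m = (T_x + T_n)/2 and DTR = T_x − T_n, the four differences can be computed as below (NumPy; the function names are illustrative, not taken from the study). The identities ΔT_m = (ΔT_x + ΔT_n)/2 and ΔDTR = ΔT_x − ΔT_n follow directly, which is why ΔT_m can never exceed both ΔT_x and ΔT_n in magnitude.

```python
import numpy as np


def fahrenheit_to_celsius(t_f):
    """Convert temperatures from degrees Fahrenheit to degrees Celsius."""
    return (np.asarray(t_f, dtype=float) - 32.0) * 5.0 / 9.0


def exposure_differences(tx_ss, tn_ss, tx_hist, tn_hist):
    """Delta T = T_SS - T_Hist for Tx, Tn, Tm and DTR (monthly arrays).

    Assumes Tm = (Tx + Tn) / 2 and DTR = Tx - Tn in both exposures.
    """
    tx_ss, tn_ss = np.asarray(tx_ss, float), np.asarray(tn_ss, float)
    tx_hist, tn_hist = np.asarray(tx_hist, float), np.asarray(tn_hist, float)
    d_tx = tx_ss - tx_hist
    d_tn = tn_ss - tn_hist
    d_tm = (tx_ss + tn_ss) / 2.0 - (tx_hist + tn_hist) / 2.0
    d_dtr = (tx_ss - tn_ss) - (tx_hist - tn_hist)
    return {"Tx": d_tx, "Tn": d_tn, "Tm": d_tm, "DTR": d_dtr}
```

For example, with made-up values of T_x = 20°C and T_n = 10°C in the Stevenson screen and 21°C and 9.5°C in a wall-mounted exposure, the differences are ΔT_x = −1.0°C, ΔT_n = 0.5°C, ΔT_m = −0.25°C and ΔDTR = −1.5°C.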
Each parallel measurement series was categorized to allow easier comparison between the main types of historic exposure. The categories used were based on those in Gaster (1882) and are defined as follows:
i. Open exposures: freestanding exposures, such as Glaisher (Figure 1b) and Montsouris stands, which, with the exception of protection above and to one side, expose the thermometer fully, or nearly fully, to the air;
ii. Wall-mounted exposures: any exposure where a thermometer is mounted on a wall (Figure 1c), fence or window, either screened or unscreened;
iii. Intermediate exposures: freestanding exposures such as thermometer sheds or summerhouses (Figure 1d), which, in addition to the protection offered by (i), also provide some lateral protection to the thermometer;
iv. Closed exposures: freestanding exposures, such as the Wild hut and metallic shield (Figure 1e), which fully enclose the thermometer.

| CHARACTERISTICS OF THE EXPOSURE BIAS
The characteristics of the exposure bias (the seasonal and diurnal structure) were identified for each exposure class by compositing the available ΔT values for each variable.
The key findings are shown in Figures 3-6 and are discussed below.

| Open exposures
Monthly mean ΔT values (Figure 3  These differences are broadly consistent with those outlined by Parker (1994), who concluded that Glaisher stands and Montsouris screens record annual mean temperatures 0.0°C–0.2°C warmer than Stevenson screens, with warmer summer and slightly cooler winter monthly mean temperatures. Although the series analysed here and by Parker (1994) have similar characteristics in terms of the direction and seasonal cycle of ΔT, the inclusion of additional studies here highlights regional differences in the magnitude of the bias. Series from Spain and Australia, for example, show larger annual mean ΔT_m (range: −0.33°C to −0.78°C) than the UK series (range: 0.11°C to −0.16°C), suggesting that location influences the magnitude of the bias. This finding agrees with Nicholls et al. (1996), who found larger biases in Adelaide than in the higher-latitude series analysed by Parker (1994).

| Wall-mounted exposures
In contrast to open exposures, wall-mounted thermometers tend to record maximum temperatures which are 0.0°C–0.5°C cooler annually than those recorded in Stevenson screens (Figure 4). This is likely primarily due to the more shaded position of wall-mounted exposures, which are protected from solar radiation by the poleward-facing wall, in comparison to Stevenson screens, which are exposed in the open and receive solar radiation year-round (Omond, 1906). The thermal lag of the wall, and the greater height of some wall-mounted exposures, may also contribute to the difference (Chandler, 1964). There is evidence of a biannual seasonal cycle in ΔT_x, with the majority of series showing a 'double peak' of larger values in early spring and autumn. This seasonal variation can be explained by the strength and angle of the incoming solar radiation (Omond, 1906). During spring and autumn, insolation is reasonably strong, but because of its angle it mostly influences the Stevenson screen. This leads to larger values of ΔT_x than in the rest of the year, when solar radiation is able to influence both exposures (summer) or does not have a large influence on either (winter). Two series, Gill (1882) and Omond (1906; Aberdeen), do not show the same biannual cycle in ΔT_x; this may be due to differences in their exposure which mean they are protected from solar radiation year-round.
Minimum temperatures in wall-mounted exposures tend to be warmer than in Stevenson screens, with no obvious seasonal cycle in the differences. The warm bias is likely due to the thermal capacity of the walls, which retain heat during the day and release it as longwave radiation at night, keeping minimum temperatures warmer in wall-mounted exposures than in Stevenson screens. The average difference across all series is −0.44°C annually; however, there is significant variation between series, with the smallest mean annual differences close to 0°C and the largest up to −1.42°C. This variation is likely the result of site-specific differences between wall-mounted exposures, including building type, thermometer orientation and height.
The cooler maximum and warmer minimum temperatures lead to a reduced DTR in wall-mounted exposures in comparison to Stevenson screens. On average, ΔDTR is 1.26°C annually; however, there is significant variation between series, with mean annual differences ranging from 0.68°C to 3.2°C, largely as a result of the variation in ΔT_n. The opposite signs but similar magnitude of ΔT_x and ΔT_n result in little difference in annual mean temperatures between the wall-mounted exposures and the Stevenson screens (−0.02°C on average); however, larger differences are present in individual series (range: −0.58°C to 0.26°C) and on monthly timescales (up to ±1°C). The biannual seasonal cycle present in ΔT_x is apparent in the overall mean values for both ΔDTR and ΔT_m, albeit with reduced amplitude in ΔT_m and less well defined in ΔDTR for some individual series.
These findings are consistent with Parker (1994), who concluded that mean temperatures in wall-mounted exposures do not consistently differ substantially from those in Stevenson screens, but that some larger differences are likely to depend on site-specific factors. The biannual cycle discussed above, and clearly evident in Figure 4, is present in only one of the series analysed by Parker (1994). The presence of a biannual cycle in the majority of the series analysed here, together with the proposed physical explanation for it, suggests that it is a common feature of the bias in wall-mounted exposures rather than an isolated occurrence.

| Intermediate exposures
The comparison between intermediate exposures and Stevenson screens (Figure 5) also shows differences between  1882) and Adelaide Observatory, with larger differences in summer (up to −0.44°C) and smaller negative or positive differences in winter (up to 0.17°C). The same seasonal cycle is not obvious in Field (1920) or Marriott (1894), perhaps because they are based on only 1 year of data and are therefore noisier.
Unlike open and wall-mounted exposures, the direction of the bias in intermediate exposures is the same for both T_x and T_n, with minimum temperatures also warmer in intermediate exposures than in Stevenson screens, on average by 0.23°C annually, but with significant variation within and between the series. The tendency for warmer minima is thought to be the result of the thermal properties of the intermediate exposures' (usually tiled or thatched) roof structures, which retain heat during the day and radiate it at night, preventing or slowing the cooling of the thermometer (Parker, 1994). The greater variation, and occasional opposite sign, of ΔT_n may arise from varying weather conditions. In cloudy conditions, for example, intermediate exposures absorb less heat during the day, reducing the warming effect overnight. Without this effect, the thermometer is likely to cool more quickly overnight than thermometers in the more enclosed Stevenson screens, leading to cooler minima in some conditions. The series do not show a consistent seasonal cycle in ΔT_n: three series (Adelaide Observatory, Gaster (1882) and Marriott (1894)) show a slightly larger bias in summer and autumn, whereas Field (1920) shows a more pronounced, but reversed, seasonal cycle with larger differences in winter (−0.7°C) and smaller in summer (−0.2°C). The larger biases in summer and autumn may be the result of stronger solar radiation in those seasons leading to increased daytime heat retention, whereas the reversed seasonal cycle in Field (1920) may be linked to increased cloud cover in summer over the Indian subcontinent (Sen Roy et al., 2015).
The warm biases in both T_x and T_n mean that ΔDTR is more muted in intermediate exposures than in open and wall-mounted exposures. In three of the series, the mean annual DTR is marginally smaller in the intermediate exposure than in the Stevenson screen (by 0.08°C–0.1°C), and in the fourth (Adelaide Observatory) it is slightly larger (by 0.19°C). Monthly deviations are larger (ΔDTR varies between −0.34°C and 0.45°C) and show a seasonal cycle with small differences in winter and larger negative differences in summer. Within series there is significant variation between individual monthly ΔDTR values (−1.05°C to 0.78°C), due mostly to the variation in ΔT_n.
The warmer maximum and minimum temperatures in intermediate exposures also lead to warmer mean temperatures than in Stevenson screens. On average, annual ΔT_m is −0.22°C but varies between −0.43°C (Field, 1920) and −0.15°C (Adelaide Observatory). There is some evidence of a seasonal cycle in ΔT_m in Gaster (1882) and Adelaide Observatory, with near-0°C differences in winter and larger mean differences in summer (up to −0.38°C in Gaster (1882)). There is little evidence of a seasonal cycle in Marriott (1894) (likely because noise from a single year of data masks it) and some evidence of the inverse seasonal cycle in Field (1920), with larger mean monthly differences of up to −0.55°C in winter, driven largely by the seasonal cycle in ΔT_n.
These results differ slightly from those in Parker (1994), who analysed only the results of Field (1920) and annual mean values from a study in Sri Lanka (Bamford, 1928), both tropical series. As a result, Parker (1994) concluded that annual mean differences between the exposures were larger (0.4°C) than found here (0.22°C) and had a weak seasonal cycle in the opposite direction to two of the series analysed here.
Figure 5. As Figure 3, but for differences between Stevenson screens and intermediate exposures.

| Closed exposures
Maximum temperatures in closed exposures are consistently warmer than in Stevenson screens (Figure 6), though with large inter-site variation in magnitude. Annual mean ΔT_x varies between −0.16°C and −0.95°C, with an overall mean difference of −0.43°C. A seasonal cycle of smaller negative ΔT_x in winter and more negative ΔT_x in summer is present in the overall mean values, driven mostly by Muller (1984), with less pronounced seasonal variation in the remaining studies. As the closed exposures plotted here are all Wild huts (Figure 1e), the warmer maxima are likely due to the inner metal shield limiting air flow around the thermometer or becoming heated by indirect radiation. The former explanation is supported by Wild (1887), who found that ventilation reduces daytime overheating.
There is no consensus between the series regarding the bias in minimum temperatures: three series observed warmer minimum temperatures in the closed exposure than in the Stevenson screen, and two cooler. Warmer minimum temperatures may be explained by the larger thermal mass of the Wild hut (Figure 1e) cooling more slowly than the Stevenson screen, whereas cooler minima may occur if there is less radiative heating of the Wild hut during the day, followed by more rapid cooling overnight via the open (poleward-facing) side of the hut, compared to the more enclosed Stevenson screen (Auchmann & Brönnimann, 2012). In all cases, however, annual and monthly differences are relatively small, not exceeding ±0.34°C and ±0.41°C, respectively (with the exception of the two larger monthly differences in Gorczynski (1910)). ΔT_n does not have a clear seasonal cycle in any of the series analysed, with the possible exception of Whipple (1883), which shows slightly smaller differences in summer than in winter.
ΔDTR shows more consistency between series. Annually, the majority of closed exposures show a larger DTR than the Stevenson screen, on average by 0.52°C. Two series have slightly smaller DTRs; however, both are based on ≤12 months of data and are potentially skewed by missing summer values (Sprung (1890)) and/or large potential outliers in ΔT_n (Gorczynski (1910)). ΔT_m also shows greater consistency between the series, with mean temperatures in closed exposures 0.2°C–0.5°C warmer than in the Stevenson screen in four of the series, and 0.09°C cooler in one: Whipple (1883). As with ΔT_x, there does appear to be a seasonal cycle in ΔT_m, with larger differences between the two exposures in the summer months (up to −0.7°C) than in winter, when differences are closer to zero. Again, the seasonal cycle is most pronounced in Muller (1984).
These findings are similar to those of Parker (1994), who found annual mean temperatures in Wild huts to be 0.1°C–0.2°C warmer than in Stevenson screens, with larger differences in summer than in winter and an enhanced DTR. The magnitudes of ΔT_m found by Parker (1994) are at the lower end of the range found here (0.2°C–0.5°C), again highlighting variability in the magnitude of the exposure bias between series and locations.

| MODELLING THE EXPOSURE BIAS
The findings detailed in Section 3 reinforce the need to account for the exposure bias.In this section, the development of models to estimate the exposure bias in T m , for each exposure class, is outlined.
To develop models to estimate the exposure bias, an understanding of which variables influence the bias in each class of historic exposure is required. This understanding was developed by examining the relationship(s) between ΔT_m and three potential explanatory variables, using Pearson's correlation coefficient (r) and robust regression analysis. Robust regression was chosen to reduce the influence of possible outliers or atypical observations of ΔT_m on the model fit; the focus is on ΔT_m because it is the variable relevant to CRUTEM5_ext.
Figure 6. As Figure 3, but for differences between Stevenson screens and closed exposures.
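The robust estimator is not named in this section. As an illustrative assumption, the sketch below fits a straight line with Huber's M-estimator by iteratively reweighted least squares, using a median-absolute-deviation scale and the conventional tuning constant k = 1.345 (also the default of the `HuberT` norm used by statsmodels' `RLM`). Pearson's r is taken from NumPy's `corrcoef`. The function names are hypothetical.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson's correlation coefficient between two 1-D samples."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


def huber_regression(x, y, k=1.345, max_iter=50, tol=1e-8):
    """Robust fit of y = a + b*x with Huber weights via IRLS; returns (a, b)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares start
    for _ in range(max_iter):
        resid = y - X @ beta
        # Robust scale estimate: normalised median absolute deviation
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale <= 0:
            break
        u = np.abs(resid) / scale
        # Huber weights: 1 inside the threshold k, k/u (down-weighting) outside
        w = k / np.maximum(u, k)
        sw = np.sqrt(w)
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return float(beta[0]), float(beta[1])
```

With a single gross outlier added to otherwise linear data, the Huber slope stays close to the underlying value, whereas an ordinary least-squares slope is pulled noticeably towards the outlier; this is the behaviour that motivates a robust fit for small, heterogeneous sets of parallel series.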
The three potential explanatory variables considered were: downward top-of-atmosphere solar radiation (TOA), shortwave downward solar radiation received at the Earth's surface (SWD) and the absolute temperature recorded in the historic exposure (T_Hist). The former two were chosen because the exposure bias, in its simplest form, stems from differences in the quantities of solar radiation able to influence the thermometer in each exposure; the latter was chosen because of the link between temperature and longwave radiation and because of suggestions in the literature of a relationship between temperature and the magnitude of the bias (e.g., Ashcroft et al., 2022; Margary, 1924). As the intention is to use any identified relationship(s) to estimate monthly mean exposure biases in CRUTEM5_ext, the explanatory variables also had to be available, or calculable, for all series.
Monthly mean values of ΔT_m were sourced from the parallel measurement series detailed in Table 1, although not all series listed were included. Series outside 30°–60° latitude were excluded because additional factors such as snow cover become more important at higher latitudes and because too few tropical series were available for analysis. Series where T_m was not calculated using (T_x + T_n)/2 were also excluded because the exposure bias is sensitive to the method used to calculate daily mean temperature (Böhm et al., 2010). Where more than 12 months of data were available for an individual series, the multi-year mean bias for each calendar month was used, to avoid weighting any relationship toward the parallel measurements with the longest time series. Duplicate series were excluded for similar reasons. Details of the excluded series, and the reasoning, are given in Table 1.
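The selection and averaging steps above can be sketched as two small helpers, assuming each parallel series is held as arrays of calendar months (1–12) and monthly ΔT_m values. The function names, and the inclusive treatment of the 30° and 60° limits in either hemisphere, are assumptions for illustration.

```python
import numpy as np


def monthly_climatology(months, delta_tm):
    """Multi-year mean Delta T_m per calendar month (index 0 = January).

    Months with no valid data are returned as NaN, so long series
    contribute at most one value per calendar month.
    """
    months = np.asarray(months, dtype=int)
    delta_tm = np.asarray(delta_tm, dtype=float)
    clim = np.full(12, np.nan)
    for m in range(1, 13):
        vals = delta_tm[(months == m) & ~np.isnan(delta_tm)]
        if vals.size:
            clim[m - 1] = vals.mean()
    return clim


def in_midlatitudes(lat, lo=30.0, hi=60.0):
    """True if the absolute latitude falls in the 30-60 degree band (inclusive)."""
    return lo <= abs(lat) <= hi
```

Collapsing each series to at most 12 calendar-month values is what stops a multi-decade series from dominating a regression that also includes single-year series.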
For each included series, corresponding monthly mean values of TOA were calculated using the Python package 'climlab', and values of SWD were sourced from a 30-year climatology (1981–2010) of WFDE5, the bias-adjusted version of ERA5 (Cucchi et al., 2020). Ideally SWD would have been sourced from observations (or derived from cloud cover) for each series location and month; however, such data were available neither for each parallel measurement series nor, more generally, for stations within CRUTEM5_ext. The use of a modern climatology instead was deemed sufficient to capture the large spatio-temporal variations in SWD which may explain differences in exposure biases between sites or seasons. Monthly mean values of T_Hist were obtained or calculated from the series source; where this was not possible, 'surrogate' values of T_Hist were obtained to maximize the number of series available for analysis (see Table 1 for details). Where surrogate values were used, care was taken to ensure they did not significantly alter the results of the analyses.
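The study computed TOA with the climlab package. The sketch below does not use climlab; instead it implements the standard daily-mean insolation formula with two common approximations (Cooper's declination formula and a first-order Earth–Sun distance correction) and an assumed solar constant of 1361 W m⁻². It is intended only to show how a 12-month TOA climatology can be derived from station latitude alone, and its values may differ slightly from climlab's orbital calculation.

```python
import numpy as np

S0 = 1361.0  # assumed solar constant (W m-2)
DAYS_IN_MONTH = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])  # non-leap year


def daily_insolation(lat_deg, day_of_year):
    """Approximate daily-mean top-of-atmosphere insolation (W m-2)."""
    phi = np.deg2rad(lat_deg)
    n = np.asarray(day_of_year, dtype=float)
    # Solar declination, Cooper (1969) approximation
    delta = np.deg2rad(23.44) * np.sin(2.0 * np.pi * (284.0 + n) / 365.0)
    # First-order correction for the varying Earth-Sun distance, (d_mean / d)^2
    dist_factor = 1.0 + 0.033 * np.cos(2.0 * np.pi * n / 365.0)
    # Sunset hour angle; clipping gives h0 = pi (polar day) or h0 = 0 (polar night)
    cos_h0 = np.clip(-np.tan(phi) * np.tan(delta), -1.0, 1.0)
    h0 = np.arccos(cos_h0)
    return (S0 / np.pi) * dist_factor * (
        h0 * np.sin(phi) * np.sin(delta) + np.cos(phi) * np.cos(delta) * np.sin(h0)
    )


def monthly_mean_toa(lat_deg):
    """Average the daily values within each calendar month of a 365-day year."""
    q = daily_insolation(lat_deg, np.arange(1, 366))
    edges = np.concatenate([[0], np.cumsum(DAYS_IN_MONTH)])
    return np.array([q[edges[k]:edges[k + 1]].mean() for k in range(12)])
```

Because TOA depends only on latitude and date, a cycle like this is available for every CRUTEM5_ext station, which is one reason it is attractive as a predictor despite ignoring cloud.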
Where a significant relationship between the bias and an explanatory variable was identified, simple statistical models were developed using the variable as a predictor, and applied to the data to determine whether the relationship could be used to estimate ΔT_m. Model performance was assessed by comparing the observed and estimated monthly ΔT_m for each input series. Key performance indicators included the root-mean-square error (RMSE); the skill score versus no adjustment (where 1.0 indicates perfect skill); and the closeness of the observed and estimated annual mean biases. Due to the limited number of series available as input, data were not held back from the initial analyses for validation purposes. However, data were held back during the assessment of the final statistical models, using a leave-one-out approach, to ensure the bias estimates (and the relationship between the bias and the selected predictor) were robust to the choice of input data.
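The performance measures can be sketched as follows. The exact form of the skill score is not given in this section, so the code assumes a common mean-squared-error form, SS = 1 − MSE_est/MSE_ref, where the reference is no adjustment (an estimated bias of zero); under that assumption SS = 1 is perfect and SS ≤ 0 means the estimate is no better than leaving the data unadjusted. The leave-one-out helper holds out one whole parallel series at a time, which is our reading of the approach; the straight-line `linear_fit`/`linear_predict` pair is a hypothetical stand-in for whichever model is being assessed.

```python
import numpy as np


def rmse(obs, est):
    """Root-mean-square error between observed and estimated monthly biases."""
    obs, est = np.asarray(obs, float), np.asarray(est, float)
    return float(np.sqrt(np.mean((est - obs) ** 2)))


def skill_vs_no_adjustment(obs, est):
    """Assumed MSE-based skill score against a zero-bias reference (1.0 = perfect)."""
    obs, est = np.asarray(obs, float), np.asarray(est, float)
    mse_est = np.mean((est - obs) ** 2)
    mse_ref = np.mean(obs ** 2)  # reference: no adjustment, i.e. estimated bias of 0
    return float(1.0 - mse_est / mse_ref)


def linear_fit(x, y):
    """Hypothetical stand-in model: ordinary least-squares straight line."""
    return np.polyfit(x, y, 1)


def linear_predict(params, x):
    return np.polyval(params, x)


def leave_one_series_out(series, fit=linear_fit, predict=linear_predict):
    """Hold out each (x, y) series in turn, fit on the rest, return held-out RMSEs."""
    scores = []
    for i, (x_out, y_out) in enumerate(series):
        x_train = np.concatenate([x for j, (x, _) in enumerate(series) if j != i])
        y_train = np.concatenate([y for j, (_, y) in enumerate(series) if j != i])
        params = fit(x_train, y_train)
        scores.append(rmse(y_out, predict(params, x_out)))
    return scores
```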
Of the three explanatory variables, T_Hist and SWD produced the best estimates of ΔT_m when used as predictors in simple linear regression models; however, neither was able to capture both the timing of the annual cycle and the magnitude of the bias. T_Hist skilfully captured the magnitude of the bias but not the timing of the peak in the annual cycle, whereas the opposite was true for SWD. This makes physical sense, as the annual cycle of ΔT_m is likely to be primarily controlled by the amplitude and strength of received solar radiation, whereas the magnitude of the bias is more likely to depend on a combination of local climatic factors which are better captured by T_Hist.
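To exploit the strengths of each, the variables were combined to form a model which uses SWD to estimate the shape of the seasonal cycle of the bias, and T_Hist to estimate its magnitude and amplitude. This was achieved by normalizing the SWD seasonal cycle at location i (using its minimum and maximum values) and then scaling it to fit the minimum ($\widehat{\min\Delta T_m}$) and maximum ($\widehat{\max\Delta T_m}$) bias estimated for i using T_Hist:

$$\widehat{\Delta T_m}(m,i) = \widehat{\min\Delta T_m} + \frac{S(m,i) - \min_m S(m,i)}{\max_m S(m,i) - \min_m S(m,i)}\left(\widehat{\max\Delta T_m} - \widehat{\min\Delta T_m}\right) \tag{3}$$

where $\widehat{\Delta T_m}(m,i)$ is the estimate of the exposure bias for month m at location i, S is inverse SWD, and $\widehat{\max\Delta T_m}$ and $\widehat{\min\Delta T_m}$ define the magnitude and amplitude of the bias and are estimated by regression on annual mean T_Hist (T_a):

$$\widehat{\max\Delta T_m} = a_{\max} + b_{\max}\,T_a \tag{4}$$

$$\widehat{\min\Delta T_m} = a_{\min} + b_{\min}\,T_a \tag{5}$$

Table 2 and Figure 7a show the details of these regressions.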
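Equations 3 to 5 can be sketched in a few lines. The fitted coefficients from Table 2 are not reproduced in this section, so `coef_max` and `coef_min` are hypothetical (intercept, slope) inputs. The linear form of the regressions on T_a and the choice to represent "inverse SWD" as the negated SWD cycle (rather than, for example, its reciprocal) are assumptions.

```python
import numpy as np


def open_exposure_bias(swd_monthly, t_annual, coef_max, coef_min):
    """Monthly Delta T_m estimates for an open exposure (sketch of Equations 3-5).

    swd_monthly : 12 climatological monthly SWD values (W m-2) at the location.
    t_annual    : annual mean temperature in the historic exposure, T_a (degC).
    coef_max, coef_min : hypothetical (intercept, slope) pairs for the
        regressions of the maximum and minimum monthly bias on T_a.
    """
    swd = np.asarray(swd_monthly, dtype=float)
    s = -swd  # "inverse" SWD, assumed here to be the negated cycle
    # Min-max normalisation: 0 in the month of peak SWD, 1 in the month of lowest SWD
    s_norm = (s - s.min()) / (s.max() - s.min())
    dt_max = coef_max[0] + coef_max[1] * t_annual  # Equation 4 (assumed linear)
    dt_min = coef_min[0] + coef_min[1] * t_annual  # Equation 5 (assumed linear)
    # Equation 3: scale the normalised cycle between the two bias limits
    return dt_min + s_norm * (dt_max - dt_min)
```

By construction, the month of strongest SWD receives the most negative estimate (the largest warm bias in the historic exposure) and the month of weakest SWD receives the least negative, matching the summer-maximum pattern described for open exposures.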
Applying Equations 3 to 5 gives better estimates of ΔT_m than using either predictor in isolation. A comparison between the observations and estimates reveals that the combination of predictors skilfully captures both the magnitude (Table 3; Figure 7b) and the seasonal cycle of the bias (Figure 7c). This is particularly evident in Figure 7c, which shows the close agreement between the observations and estimates for the AEMET (La Coruna) series.
The deviations between the observed and estimated monthly and annual mean biases (Figure 7b) are generally small. Except for the series from Gill (1882), which behaves differently, the RMSE remains below 0.25°C in all series assessed, and the differences between the annual mean biases are generally below 0.2°C. Where the deviations from the observed annual and monthly mean biases are largest, the modelled value tends to be an underestimate, but in all cases the observed values lie within the 95% confidence interval. The skill score is also positive in all assessed series (again, except Gill, 1882), and exceeds 0.9 in 50% of them. These statistics suggest the model can provide skilful estimates of the exposure bias in open exposures and, if applied to observations, would reduce the bias associated with the transition from open exposures to Stevenson-type screens.

| Wall-mounted exposures
Our physically based reasoning (Section 3.2) suggests the bias in wall-mounted exposures may be related to solar radiation; however, the linear regression analyses (Table 2) show only weak (r ≤ 0.26) positive relationships (with T_Hist and SWD) or no significant relationship (with TOA). Scatter plots, together with the indication (Section 3.2) of a biannual cycle in ΔT_m, suggest a quadratic relationship may be more appropriate than a linear one.
Quadratic regression (Table 2; Figure 8a) shows a significant relationship between ΔT_m and TOA, but not with T_Hist or SWD, which appear to be more affected by outliers. The relationship suggests that increasing TOA leads to increasing values of ΔT_m up to a threshold, after which the relationship becomes negative, leading to biases of a similar magnitude for both high and low levels of TOA. This relationship is consistent with Section 3.2, which found the largest values of ΔT_m in spring and autumn (when TOA is of moderate strength but only influences the Stevenson screen), and smaller values in summer and winter (when TOA is strong but affects both screens, or is weak and has little effect on either). Analysis of the individual parallel measurement series confirms that similar relationships between ΔT_m and TOA, with the same timing of the peak, are present in all but one of the input series, giving confidence that the relationship is robust despite the wide scatter between sites evident in Figure 8a. The series which did not have a significant relationship (p = 0.79) and which behaved markedly differently (Gill, 1882) was excluded from the final calculation of the regression coefficients because of concerns about its validity. The final model is:

$$\widehat{\Delta T_m} = \beta_0 + \beta_1\,\mathrm{TOA} + \beta_2\,\mathrm{TOA}^2 \tag{6}$$

Applying Equation 6 to the input series, and comparing the results with the observations, reveals that TOA is able to skilfully estimate the shape of the annual cycle of the exposure bias in wall-mounted exposures, but that the magnitude of the estimates and their annual means can deviate from the observations. This is illustrated in Figure 8c, which shows the correct timing of the 'double peak' of the bias but an overestimated amplitude, and in Figure 8b, which shows greater variance in the observations than in the estimates. This is not unexpected. As noted in Section 3.2, there is significant variation in the magnitude of the bias between wall-mounted series, due to the wide variety of exposures which fall into the category and the number of factors which influence the bias in addition to solar radiation. As a result, estimates based on one variable will capture only a small part of the variance observed in the biases arising from wall-mounted exposures.
Despite this, the deviations between the observed and estimated values tend to be relatively small (the majority of the series analysed had an RMSE below 0.2°C, and all had an RMSE below 0.37°C), and the greater variation in the input data is captured by the width of the confidence intervals (Figure 8a). In addition, although lower than for the open model, the skill scores are positive in all but one series analysed (Table 3), meaning the model still provides estimates that, when applied, can reduce the exposure bias and are thus better than ignoring the bias in most cases.
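A minimal sketch of the quadratic fit described above, assuming ordinary least squares via `numpy.polyfit` (the fitting procedure and coefficients of Equation 6 are not reproduced here) and using synthetic, noise-free values in place of the parallel measurements. The helper names `fit_quadratic` and `vertex` are illustrative; the vertex −c1/(2·c2) is the TOA threshold at which the fitted relationship changes sign.

```python
import numpy as np


def fit_quadratic(toa, dtm):
    """Least-squares fit of dtm = c0 + c1*toa + c2*toa**2; returns (c0, c1, c2)."""
    c2, c1, c0 = np.polyfit(toa, dtm, deg=2)  # polyfit returns the highest power first
    return c0, c1, c2


def vertex(c1, c2):
    """TOA value at which the fitted parabola turns (a maximum when c2 < 0)."""
    return -c1 / (2.0 * c2)


# Hypothetical monthly TOA values (W m-2) and a synthetic bias peaking at 300 W m-2
toa = np.linspace(100.0, 500.0, 12)
dtm = 0.2 - 2.0e-6 * (toa - 300.0) ** 2

c0, c1, c2 = fit_quadratic(toa, dtm)
print(round(vertex(c1, c2), 1))
```

Because TOA rises and falls once a year, a parabola with an interior maximum (c2 < 0) yields two peaks in the estimated annual cycle, as TOA passes the threshold in spring and again in autumn, which matches the double peak described for wall-mounted exposures.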

| Intermediate exposures
As with open exposures, ΔT m in intermediate exposures displays strong negative correlations (r < −0.5; p < 0.05) with all three variables (Table 2), with increases in T Hist, TOA and SWD all leading to larger warm biases (more negative ΔT m). This is consistent with the findings in Section 3.3 of larger biases in the summer months in the extratropical series, as well as with our understanding of the processes which contribute to the bias. Intermediate exposures record warmer mean temperatures than the Stevenson screen due to the influence of reflected shortwave radiation during the day (controlled by SWD/TOA) and the emission of longwave radiation from the roof structure at night (which is influenced by T Hist and linked to daytime heat retention, which varies with SWD and TOA). The weaker correlation between T Hist and ΔT m (r = −0.52), compared with the correlations between ΔT m and TOA (r = −0.66) and SWD (r = −0.61), is also consistent with this understanding and suggests the relationship with T Hist may partly be an artefact of the correlation between T Hist and solar radiation.
TABLE 2 Results of the robust linear regression analyses for each class of exposure and explanatory variable. Results include the series from Gill (1882) as input. Final regression coefficients for the model, with the series from Gill (1882) excluded, are given in Equation 6.
When used as predictors in simple linear regression models, TOA and SWD produced similar results, and more skilful estimates than T Hist. Both solar radiation predictors accurately reproduced the seasonal cycle and the magnitude of the biases (Figure 9b,c), with low RMSE values for TOA (SWD) of between 0.07°C and 0.1°C (0.09°C and 0.1°C) and reasonably high, positive skill scores of between 0.71 and 0.84 (0.63 and 0.83). With each model, the sign of the annual bias is always correctly estimated and, in each case, the estimated biases are within 0.16°C of the observed annual biases.
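The RMSE and skill scores quoted above can be computed as sketched below. The skill-score formula is an assumption: the text states only that a positive score means applying the model is better than ignoring the bias, so the sketch uses 1 − SSE(model)/SSE(no adjustment), with a zero bias as the reference. All values are hypothetical.

```python
import math


def rmse(observed, estimated):
    """Root-mean-square error (°C) between observed and estimated monthly ΔTm."""
    sse = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    return math.sqrt(sse / len(observed))


def skill_score(observed, estimated):
    """Assumed skill score: 1 - SSE(model) / SSE(no adjustment).

    The reference is a bias of 0 °C in every month, so a positive score means
    applying the estimates reduces the error relative to ignoring the bias.
    """
    sse_model = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    sse_ref = sum(o ** 2 for o in observed)
    return 1.0 - sse_model / sse_ref


obs = [0.1, 0.3, 0.2, -0.1]  # hypothetical observed monthly ΔTm (°C)
est = [0.1, 0.2, 0.2, 0.0]   # hypothetical model estimates (°C)
print(round(rmse(obs, est), 3), round(skill_score(obs, est), 3))  # prints 0.071 0.867
```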
Overall, using TOA as the predictor gives marginally better estimates than SWD (Table 3; Figure 9). However, further validation of both models, holding data back, revealed that the reproduction of the seasonal cycle depended on the series from Gaster (1882) being included as input (Figure 5). As a result, broad application of either model is not advised, owing to the limited input data used to fit the model and the over-reliance on one series to replicate the seasonal cycle.
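The hold-back validation mentioned above can be illustrated as leave-one-series-out cross-validation: the model is refitted without one parallel measurement series and scored on that series. This is a sketch under assumptions (ordinary least squares on a single predictor, hypothetical series A to D), not the authors' validation code.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def leave_one_series_out(series):
    """Refit without each series in turn and return its out-of-sample RMSE.

    series: dict mapping a series name to (predictor values, observed ΔTm values).
    """
    scores = {}
    for held_out in series:
        xs, ys = [], []
        for name, (x, y) in series.items():
            if name != held_out:
                xs.extend(x)
                ys.extend(y)
        slope, intercept = fit_line(xs, ys)
        x_test, y_test = series[held_out]
        sq = [(y - (intercept + slope * x)) ** 2 for x, y in zip(x_test, y_test)]
        scores[held_out] = (sum(sq) / len(sq)) ** 0.5
    return scores


# Hypothetical series: A to C follow ΔTm = -0.001 * TOA; D does not.
series = {
    "A": ([100.0, 200.0, 300.0], [-0.1, -0.2, -0.3]),
    "B": ([150.0, 250.0], [-0.15, -0.25]),
    "C": ([120.0, 220.0], [-0.12, -0.22]),
    "D": ([100.0, 200.0], [0.5, 0.5]),
}
print(leave_one_series_out(series))
```

Large differences between held-out scores indicate that the fit depends heavily on particular series, the situation described above for the intermediate model.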

| Closed exposures
In keeping with the other freestanding exposures analysed, monthly ΔT m in closed exposures is significantly negatively correlated with all three explanatory variables (Table 2), with increasing T Hist, TOA and SWD corresponding to larger warm biases. As with intermediate exposures, the strongest correlations are with SWD (r = −0.55) and TOA (r = −0.49), and the correlation with T Hist (r = −0.33) is weaker. These correlations are consistent with our understanding of the cause(s) of the warm bias in closed exposures, outlined in Section 3.4, as well as with our finding that the bias is largest in summer.
Overall, the correlation coefficients are slightly lower than for the other freestanding exposures analysed here. This makes physical sense: because these exposures are more enclosed, the influence of solar radiation on ΔT m is comparatively smaller, and other variables, such as wind speed, become more important. This is supported by previous studies (outlined in Parker, 1994), which found the largest biases in closed exposures during clear, calm weather, when received solar radiation is greatest and ventilation is reduced.
Despite the slightly weaker correlations, skilful estimates of the monthly exposure bias were still obtained using a single predictor. Of the three variables assessed, SWD produced the most skilful estimates, with reasonable agreement between the observed and estimated values of ΔT m (Table 3; Figure 10b,c) obtained using a single-predictor regression on SWD (Equation 7). As can be seen in Figure 10c, Equation 7 skilfully reproduces the seasonal cycle of the bias, particularly the timing of the peak, and captures the magnitude of the bias to within a few tenths of a degree. With the exception of the series from Whipple (1883), which shows a cool bias in contrast to the other input series, the model has relatively low RMSE values (below 0.24°C for each series) and estimates the annual mean bias to within ±0.2°C of the observed values. There is slightly larger variation between observed and estimated ΔT m at monthly resolution (Figure 10b), likely for the reasons noted above; however, the greater variance in the observations is accounted for by the confidence intervals (Figure 10a,c). The predominantly high, positive skill scores (Table 3) suggest that applying the model would reduce the exposure bias from closed exposures in most cases.
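Table 2 reports robust linear regressions, but the robust estimator is not specified here. The sketch below assumes one common choice, iteratively reweighted least squares with Huber weights (tuning constant 1.345) and a MAD-based residual scale, applied to synthetic SWD and ΔT m values containing one outlier; `huber_fit` is a hypothetical helper, not the code behind Equation 7.

```python
import numpy as np


def huber_fit(x, y, c=1.345, max_iter=50, tol=1e-10):
    """Robust straight-line fit y = a + b*x by IRLS with Huber weights; returns (a, b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least squares as starting point
    for _ in range(max_iter):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745  # MAD-based residual scale
        if s == 0.0:
            break  # perfect fit: nothing to down-weight
        u = np.abs(r) / s
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))  # Huber weights
        sw = np.sqrt(w)
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta[0], beta[1]


# Synthetic example: ΔTm = 0.1 - 0.002 * SWD, with one outlying month
swd = np.arange(50.0, 301.0, 10.0)
dtm = 0.1 - 0.002 * swd
dtm[3] += 1.0
print(huber_fit(swd, dtm))
```

The down-weighting of large residuals keeps a single anomalous month from pulling the slope, which is the usual motivation for a robust rather than ordinary regression.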

| QUANTIFYING THE EXPOSURE BIAS IN THE EXTENDED CRUTEM STATION DATABASE (CRUTEM5_ext)
Three of the models outlined above (the open, wall-mounted and closed models) produce skilful estimates of the exposure bias and are considered robust enough for broader application. Here, the application of these models to stations in CRUTEM5_ext is outlined and the results are discussed.

| Exposure metadata
To apply the bias-estimation models to CRUTEM5_ext, an important new database of historic exposures was compiled for many of the stations in CRUTEM5_ext. Metadata were gathered detailing: (a) if/when a Stevenson screen was introduced, (b) which (if any) historic exposures were in use prior to the introduction, and (c) whether a series had previously been adjusted for the exposure bias. This database was used to identify the stations and time periods affected by the exposure bias, as well as the appropriate model(s) to apply. Note that some of the historic exposures identified in the metadata, including previously common (very early) exposures such as hanging the thermometer in a poleward-facing, well-ventilated room (Jurin, 1723), do not fall into any of the categories defined here. In such instances, the exposure was categorised as miscellaneous in the database and no bias-estimation model was applied.
FIGURE 8 (a) Relationship between monthly ΔT m and top of atmosphere solar radiation with 95% confidence interval for wall-mounted exposures; (b) observed versus estimated monthly (green circles) and annual mean (black crosses) ΔT m; and (c) observed (grey) and estimated (green) ΔT m with shaded 95% confidence interval for the Fort William series from Omond (1906). Observed (grey) and estimated (green) annual mean biases are given by the dashed lines. Each panel has the same y-axis, so it is only labelled in panel (a).
FIGURE 9 (a) Relationship between monthly ΔT m and top of atmosphere solar radiation with shaded 95% confidence interval for intermediate exposures; (b) observed versus estimated monthly (purple circles) and annual mean (black crosses) ΔT m; and (c) observed (grey) and estimated (purple) ΔT m with shaded 95% confidence interval for the Adelaide Observatory series (in contrast to Figures 3–6, the Adelaide data are not shifted by 6 months here). Observed (grey) and estimated (purple) annual mean biases are given by the dashed lines. Each panel has the same y-axis, so it is only labelled in panel (a).
Figure S1 and Video S1 illustrate the metadata gathered; further details of the metadata collation process and results are in Appendix S1.

| Model application
The models were applied to individual CRUTEM5_ext stations using model predictors obtained from WFDE5 (SWD) and 'climlab' (TOA), as described in Section 4, and from the station absolute temperature record (T a). Where missing data prevented the calculation of T a for a given year, the missing months were infilled using a climatology of the neighbouring ±15 years (n ≥ 10), and the resulting estimate of T a was used instead.
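A minimal sketch of the infilling step, assuming the climatology is the plain mean of the same calendar month over the years within ±15 years of the target year (excluding the target year itself), with at least 10 such values required. The function names and the dictionary layout are hypothetical.

```python
def infill_month(values, year, month, window=15, min_n=10):
    """Estimate a missing monthly temperature from the same calendar month in nearby years.

    values: dict mapping (year, month) -> monthly mean temperature (°C); missing months absent.
    Returns the mean over years year-window..year+window (excluding `year`),
    or None if fewer than `min_n` values are available.
    """
    neighbours = [values[(y, month)]
                  for y in range(year - window, year + window + 1)
                  if y != year and (y, month) in values]
    if len(neighbours) < min_n:
        return None
    return sum(neighbours) / len(neighbours)


def annual_mean_with_infill(values, year, **kwargs):
    """Annual mean T_a, infilling missing months; None if any month cannot be infilled."""
    months = []
    for m in range(1, 13):
        v = values.get((year, m))
        if v is None:
            v = infill_month(values, year, m, **kwargs)
            if v is None:
                return None
        months.append(v)
    return sum(months) / 12


# Hypothetical station: constant seasonal cycle, July 1865 missing
values = {(y, m): 10.0 + m for y in range(1850, 1881) for m in range(1, 13)}
del values[(1865, 7)]
print(annual_mean_with_infill(values, 1865))
```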
The models were only applied to stations between 30° and 60° latitude (in either hemisphere), for the reasons outlined in Section 4, and to stations not known to have been previously adjusted for the exposure bias. Although Section 4 also acknowledges that the method used to calculate daily mean temperatures influences the bias, the model application did not discriminate on this basis owing to insufficient metadata. This is a noted limitation of the approach.
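The selection rules described above can be expressed as a lookup from exposure class to model predictor, combined with the latitude and previous-adjustment filters. The mapping, the inclusive latitude bounds and `select_predictor` are illustrative assumptions summarising the text (open: T a, Equations 4 and 5; wall-mounted: TOA, Equation 6; closed: SWD, Equation 7; intermediate and miscellaneous: no model applied).

```python
# Hypothetical mapping from exposure class to the predictor of its bias-estimation model
MODEL_PREDICTOR = {
    "open": "T_a",          # Equations 4 and 5 (annual mean absolute temperature)
    "wall-mounted": "TOA",  # Equation 6 (quadratic in TOA)
    "closed": "SWD",        # Equation 7
    "intermediate": None,   # model not considered robust enough for broad application
    "miscellaneous": None,  # e.g. very early indoor exposures: no model
}


def select_predictor(exposure_class, previously_adjusted, latitude):
    """Return the predictor for a station period, or None if no model should be applied."""
    if previously_adjusted:
        return None  # avoid adjusting a series twice
    if not 30.0 <= abs(latitude) <= 60.0:
        return None  # outside the mid-latitude band the models were developed for
    return MODEL_PREDICTOR.get(exposure_class)  # unknown classes also give None


print(select_predictor("closed", False, 45.0))
```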
As the models were developed using the relatively few parallel measurement series available, a preliminary application was conducted to identify whether applying the models outside their calibrated ranges yields implausible results (i.e., over-extrapolation). A comparison of the predictors and estimated biases for CRUTEM5_ext stations with the parallel measurement observations and predictors used to develop the models found no evidence of over-extrapolation for wall-mounted and closed exposures, but some evidence for open exposures. For open exposures, the range of CRUTEM5_ext predictors extended beyond that used to develop the bias-estimation model and, at the extremes, the estimated biases did not remain within the observed range. As we cannot be certain that the relationship between the magnitude of the bias and the predictors continues linearly outside the observed range, Equation 4 was constrained where T a < 4.84°C and Equation 5 where T a < 6.29°C, to prevent winter and summer bias estimates exceeding 0.6°C and 0°C, respectively, values which are not generally observed. These constraints affected only 36 months of data for one station in CRUTEM5_ext, and the constrained bias estimates always fell within the uncertainty range of the unconstrained estimates.
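One way to read the constraint described above is to hold the predictor at the threshold whenever T a falls below it, so the estimate never extrapolates beyond its value at the threshold. The sketch assumes this reading; `winter_model` is a hypothetical linear stand-in whose coefficients are not those of Equation 4, and only the 4.84°C threshold comes from the text.

```python
def constrained_estimate(model, t_a, t_floor):
    """Evaluate `model` with the predictor held at `t_floor` whenever T_a < t_floor."""
    return model(max(t_a, t_floor))


def winter_model(t_a):
    """HYPOTHETICAL linear winter-bias model (°C); illustrative only, not Equation 4."""
    return 1.0 - 0.1 * t_a


T_FLOOR_EQ4 = 4.84  # threshold quoted in the text for Equation 4 (°C)

for t in (-5.0, 4.84, 10.0):
    print(t, round(constrained_estimate(winter_model, t, T_FLOOR_EQ4), 3))
```

For a monotonic model, holding the predictor at the threshold is equivalent to capping the estimate at the threshold value; the same helper would apply to Equation 5 with its 6.29°C threshold.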
The revised models were then reapplied to produce metadata-based estimates of the monthly mean exposure bias (with 95% confidence intervals; see Appendix S2), and the resulting bias estimates were combined with the CRUTEM5_ext data to produce an exposure-bias-adjusted version, referred to here as CRUTEM5_eba.
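Since ΔT m is defined as the Stevenson screen temperature minus the historic exposure temperature, and Figure 12 defines ΔT = CRUTEM5_eba − CRUTEM5_ext, combining the estimates with the station data amounts, assuming a purely additive adjustment, to adding the estimated ΔT m to each month recorded in a historic exposure. The sketch below uses hypothetical names and values.

```python
def adjust_series(monthly_temps, bias_estimates, screen_introduced):
    """Return an exposure-bias-adjusted copy of a station's monthly series.

    monthly_temps: dict (year, month) -> monthly mean temperature (°C).
    bias_estimates: dict (year, month) -> estimated ΔTm (°C), Stevenson minus historic exposure.
    screen_introduced: (year, month) when the Stevenson screen came into use.
    Months from the introduction onward, and months without an estimate (for example an
    intermediate or miscellaneous exposure), are returned unchanged.
    """
    adjusted = {}
    for ym, t in monthly_temps.items():
        if ym < screen_introduced and ym in bias_estimates:
            adjusted[ym] = t + bias_estimates[ym]  # historic + (Stevenson - historic)
        else:
            adjusted[ym] = t
    return adjusted


temps = {(1880, 6): 20.0, (1880, 7): 21.0, (1890, 7): 22.0}
bias = {(1880, 6): -0.3, (1880, 7): -0.4, (1890, 7): -0.5}
print(adjust_series(temps, bias, screen_introduced=(1885, 1)))
```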

| An exposure bias adjustment for the CRUTEM5 station database
The metadata identified 2,519 mid-latitude stations in CRUTEM5_ext with probable biases resulting from the transition to Stevenson screens. Of those, bias estimates were produced for 1,960 stations (524,894 months), leading to the partial adjustment of 82 stations and the complete adjustment of 1,878 (Table 4). Not all stations or months could be adjusted: estimates could not be obtained where incomplete metadata prevented the identification of the appropriate bias-estimation model or of the presence of an exposure bias, where predictors could not be obtained, or where intermediate or miscellaneous exposures were identified (Table 4; Figure 11). Nevertheless, the metadata gathered and the adjustments applied indicate that the bias has now been accounted for at 75.1% of mid-latitude stations, representing 86.3% of the mid-latitude data in terms of monthly values. This compares with just 37.7% of mid-latitude stations in CRUTEM5_ext (of which only 1.5% were known to have been adjusted).
FIGURE 10 (a) Relationship between monthly ΔT m and shortwave downward received solar radiation with shaded 95% confidence interval for closed exposures; (b) observed versus estimated monthly (navy circles) and annual mean (black crosses) ΔT m; and (c) observed (grey) and estimated (navy) ΔT m with shaded 95% confidence interval for the series from Muller (1984). Observed (grey) and estimated (navy) annual mean biases are given by the dashed lines. Each panel has the same y-axis, so it is only labelled in panel (a).
At the hemispheric scale, the impact of the bias adjustments is relatively small (Figure 12). In the Northern Hemisphere (NH), the mid-latitude annual mean is up to 0.016°C warmer before 1870 in CRUTEM5_eba and up to 0.1°C cooler between 1870 and 1934. In the former period, the seasonal adjustments are all of similar magnitude but differ in sign between spring/autumn (≈0.05°C) and summer/winter (≈−0.03°C), whereas in the latter period the seasonal adjustments are all negative, with the largest adjustments in summer (−0.2°C). The small adjustments before 1870, and their distinctive seasonal structure, are due to the predominance of wall-mounted exposures, which introduce biases with an annual mean close to 0°C and a biannual seasonal cycle (Section 3.2). The change in the sign and seasonal structure of the adjustments after 1870 (and the increase in magnitude) occurs because more series require correction for freestanding exposures, which produce larger biases with a single peak in the seasonal cycle (Sections 3.1 and 3.4). The largest adjustments, however, are relatively geographically constrained (Figure 13), so the overall effect on the mid-latitude mean remains small. In the Southern Hemisphere (SH), the adjustments are always negative, with a similar seasonal structure to the latter period in the NH, but smaller and more temporally constrained, peaking at −0.07°C in 1856 and decreasing approximately linearly to ≈0°C by 1900. Here, too, the majority of adjustments are for freestanding exposures, explaining the similar seasonal structure; the smaller magnitude reflects the fact that fewer adjustments were made in the SH: only 11 stations (2,803 months) compared with 1,949 stations (522,091 months) in the NH. This is partly because there is less land, and thus fewer stations, in the SH mid-latitudes, but also because station records often started later, when Stevenson screens were already in place. Not all earlier stations were adjusted either: many lacked accessible metadata, exposed thermometers in intermediate exposures (which could not be adjusted), or had been adjusted previously. Many also introduced Stevenson screens comparatively early, meaning fewer adjustments were required.
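The hemispheric mid-latitude values above are averages over grid cells, so a few cells with large adjustments can have little effect on the mean. The sketch assumes cosine-of-latitude area weighting over cells with data (the exact CRUTEM averaging procedure is not described in this section); `band_mean` and the cell values are hypothetical.

```python
import math


def band_mean(cells, lat_min=30.0, lat_max=60.0):
    """Area-weighted mean of grid-cell values whose centre latitude lies in [lat_min, lat_max].

    cells: iterable of (centre_latitude_deg, value) for cells with data. Weights are
    cos(latitude), proportional to cell area on a regular latitude-longitude grid.
    Returns None when no cell falls in the band.
    """
    num = den = 0.0
    for lat, value in cells:
        if lat_min <= lat <= lat_max:
            w = math.cos(math.radians(lat))
            num += w * value
            den += w
    return num / den if den > 0.0 else None


# Hypothetical ΔT = CRUTEM5_eba - CRUTEM5_ext per 5° latitude band: one large adjustment
cells = [(32.5, -0.5), (37.5, 0.0), (42.5, 0.0), (47.5, 0.0),
         (52.5, 0.0), (57.5, 0.0), (72.5, 3.0)]  # the 72.5° cell lies outside the band
print(band_mean(cells))                 # NH mid-latitudes
print(band_mean(cells, -60.0, -30.0))   # SH mid-latitudes: no cells, so None
```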
The magnitude of the bias adjustments on a regional basis is much larger than for the mid-latitude means and exhibits significant spatiotemporal variability (Figure 13). Between 1851 and 1900, for example, large negative adjustments (up to −0.79°C in summer and −0.57°C annually) are present in Mediterranean Africa and central Asia, whereas small positive adjustments are present in North America. This variability arises from the spatiotemporal heterogeneity of the historic exposures in use (Video S1), as well as the influence of solar radiation and temperature. The large negative adjustments in Mediterranean Africa and central Asia, for example, reflect the use of freestanding exposures in those regions combined with comparatively strong solar radiation and/or high temperatures. In contrast, the small positive adjustments in North America reflect the continued use of wall-mounted exposures in the United States and Canada until the 1890s and 1900s, respectively, by which time many other mid-latitude nations had introduced freestanding exposures or Stevenson screens.
A comparison between the bias adjustments produced here and the representation of the bias in HadCRUT5 (Figure 12) shows reasonable agreement annually over large spatial scales, but regionally the comparisons reinforce the limitations of the current representation identified in Section 1. The assumption of a fixed annual bias in HadCRUT5, with no spatiotemporal variation (outside the two specified latitudinal bands), fails to capture the pronounced seasonal nature of the bias, differences in magnitude and seasonal structure between (and within) exposure classes, or the spatiotemporal differences in the use of historic exposures and the timing of the introduction of the Stevenson screen. These limitations are highlighted by the large discrepancies between the HadCRUT5 realizations and the grid cell adjustments (particularly between 1868 and 1934 in the NH and in the late 1800s in the SH) and by the static nature of the realizations in comparison with the time-varying adjustments produced here. Note, however, that the simple exposure bias error model used in HadCRUT5 was only designed to capture the large-scale influence of the bias rather than local variations.
Opportunities for comparison with previous exposure bias adjustments are limited. Many previous studies incorporated corrections for other inhomogeneities (e.g., Ashcroft et al., 2012), did not adjust monthly T m (e.g., Ashcroft et al., 2022; Auchmann & Brönnimann, 2012) and/or are based on the parallel measurements used here (e.g., Brunet et al., 2006). Comparisons with the few independent assessments available produce mixed results. Our estimates of a −0.44°C summer bias in Uppsala between 1858 and 1864 (not applied, because this series had already been adjusted in CRUTEM5) and a ≈−0.2°C summer bias in the NH mid-latitudes between 1880 and 1900 are of a similar magnitude to the assessments of Moberg et al. (2003) and Frank et al. (2007), who estimated biases of −0.5°C to −0.8°C in Uppsala and approximately −0.3°C in the NH (30–90°N), respectively. However, Frank et al. (2007) estimated increasingly large NH biases earlier in the record, and Moberg et al. (2003) consistently large biases in Uppsala and Stockholm, Sweden, before 1858, when the estimates here (again, not applied) suggest smaller biases due to the use of wall-mounted exposures.
Although the cited assessments are themselves uncertain, the mixed results do highlight some potential limitations of this approach. The biases estimated here represent the average bias for a specified exposure and location, based on the identified relationship(s) between the bias and up to two variables. As such, the bias-estimation models cannot take into account station-specific factors or differences within categories of exposure (although these are partially reflected in the confidence intervals). The accuracy of the estimates also depends on the accuracy of the exposure metadata collated, which, in many cases, is nation-specific rather than station-specific (Appendix S1). This approach, however, is designed to give an estimate of the exposure bias in global temperature compilations in the absence of station-level homogenisation; it is not designed to replace detailed, station-specific homogenisation, which is always preferable. Overall, therefore, confidence can be taken from the assessed performance of the bias-estimation models (Section 4) and from the favourable comparison of our regional results with those of Parker (1994), who anticipated annual mean biases close to 0°C in Canada, central and Eastern Europe, Russia, the UK and the United States until the late 19th century; between 0°C and −0.2°C in France and Australia in the same period; and between −0.1°C and −0.2°C in Russia between 1870 and 1910, in line with the estimates here.

| SUMMARY AND CONCLUSIONS
This study has advanced our understanding of the characteristics of the exposure bias introduced by the transition to Stevenson-type screens and has presented an empirical approach to address the exposure bias in an extended version of CRUTEM5, using regression-based bias-estimation models and exposure metadata. To the best of our knowledge, this is the first time such an approach has been used to produce a near-global assessment of the exposure bias (although similar approaches have been used locally, e.g., Brunet et al., 2006, 2011).
The compilation and analysis of 54 parallel measurement series comparing the temperatures recorded in Stevenson-type screens with those recorded in four categories of historic exposure identified seasonally and diurnally varying biases, which differed according to the class of exposure. The largest biases in mean temperatures were generally found in freestanding exposures (up to −0.78°C annually) and in the summer months, while the smallest biases were generally found in the winter months and in wall-mounted exposures (near 0°C annually, though the biases arising from wall-mounted exposures showed large inter-site variability). These results are generally consistent with those of Parker (1994), but the inclusion of additional parallel measurement series here highlighted possible regional differences in the magnitude of the exposure bias and confirmed that a biannual cycle is a common (and distinct) characteristic of the exposure bias arising from wall-mounted exposures. The results also reinforced the need for a monthly resolved, exposure-specific assessment of the exposure bias in global datasets.
The identification of significant relationships between the magnitude of the bias in monthly mean temperatures and received shortwave downward solar radiation, downward top of atmosphere solar radiation and/or absolute temperature in the historic exposure for each category of exposure facilitated the development of three exposure-specific bias-estimation models. The models were found to skilfully reproduce the direction and seasonal cycle of the exposure bias, but with reasonably large confidence intervals (particularly for wall-mounted exposures), which reflect the influence of site-specific factors on the magnitude of the bias. The development of these models builds on the results of previous studies which identified relationships between the magnitude of the exposure bias and absolute temperature and/or solar radiation (e.g., Ashcroft et al., 2022; Auchmann & Brönnimann, 2012; Brunet et al., 2011; Margary, 1924), but is, to our knowledge, the first time these relationships have been quantified for multiple classes of exposure using a compilation of parallel measurement series from across the mid-latitudes.
The bias adjustments produced using the bias-estimation models and exposure metadata showed reasonable agreement with the representation of the bias in the HadCRUT5 error model annually and over large spatial scales. However, the estimates produced here refine the representation of the bias seasonally and at regional scales by taking into account the annual cycle of the bias and the different exposure histories of individual stations and regions. This refinement is reflected in the spatiotemporal variability evident in our bias adjustments, compared with the static representation of the bias in the HadCRUT5 error model. This improved representation is important for a more complete understanding of the impact of exposure biases on global temperatures, particularly their effect on regional and seasonal climate trends.
The presented approach is not without limitations. The development of the bias-estimation models is limited by the relatively small number of parallel measurements available for analysis, and the accuracy of the individual bias estimates is contingent on the accuracy of the exposure metadata used to inform the model application. Nevertheless, the approach represents a first step toward a more accurate assessment of the (monthly resolved) impact of exposure biases at mid-latitude weather stations, and it is hoped that it can be updated and improved in future, for example through the inclusion of additional exposure metadata and parallel measurement series. The latter, in particular, would help to refine the relationships between the bias and the relevant predictor(s) and to constrain (or more accurately capture) the uncertainty associated with the bias estimates.
A number of avenues for further work exist. The approach outlined above specifically addresses the exposure biases arising from three classes of historic exposure in the mid-latitudes. It does not address the exposure biases present in the tropics and high latitudes, the biases arising from other categories of exposure (i.e., intermediate exposures and very early indoor exposures), or changes to the design of the Stevenson screens in use over time (see Mawley, 1884; Naylor, 2019). At present, the limited availability of parallel measurements prevents the extension of the approach to these areas, though it is hoped that additional parallel measurements may become available in future. Further work is therefore required to address these areas and to account more fully for the exposure biases present in global temperature compilations. Future work could also combine the exposure bias adjustments with the (post-adjustment) application of pairwise station homogenisation algorithms to assess the broader homogeneity of the CRUTEM5_ext station data.

KEYWORDS exposure bias, land air temperature, observations, parallel measurements, Stevenson screen

1 | INTRODUCTION
Land surface air temperature (LSAT) observations are vital to advancing knowledge of climate variability and change. They form a key component of global surface air temperature datasets used for climate assessment (e.g., and Spain

FIGURE 1 Examples of common thermometer exposures. (a) Stevenson screen, (b) Glaisher stand (Open), (c) a type of wall-mounted screen, (d) Summerhouse (Intermediate) and (e) Wild Hut (Closed). Image sources: (a) Gaster (1882); (b), (d) Royal Society of New South Wales; (c) Mawley (

FIGURE 2 Locations of the parallel measurement series collated and analysed.

) reveal clear differences between the temperatures recorded in open exposures and Stevenson screens across all four variables. The predominantly negative ΔT x indicates that maximum temperatures tend to be cooler in Stevenson screens than in open exposures, with mean annual ΔT x ranging from −0.27°C in Gaster (1882) to −1.68°C in Martinez Ibarra et al. (2010). ΔT x shows a clear seasonal cycle, increasing from an average warm bias of −0.08°C in winter to −1.04°C in summer. In contrast, the minimum temperatures recorded in open exposures are generally cooler than in Stevenson screens (on average by 0.36°C annually; range 0.11–0.92°C), with no obvious seasonal cycle. These differences can be explained by the larger quantity of reflected shortwave solar radiation and longwave terrestrial radiation reaching thermometers in open exposures during the day, causing a warm bias in T x, and the greater radiative heat loss from open exposures at night, causing a cool bias in T n. The larger deviations in ΔT x during the summer months are due to the increased strength of solar radiation. The opposite signs of ΔT x and ΔT n mean that the largest differences occur in the diurnal temperature range (DTR), which is exaggerated in open exposures compared with Stevenson screens. On average, the DTR is 1.15°C larger annually in open exposures, with mean annual (monthly) differences as large as −1.92°C (−2.61°C) in individual series. Despite the opposite signs of ΔT x and ΔT n, the biases do not cancel in the mean; instead, the larger magnitude of ΔT x leads to warmer mean temperatures in open exposures than in Stevenson screens, on average by 0.21°C annually, but with substantial variation in annual mean ΔT m between the individual series (range −0.78°C to 0.11°C). The strong seasonal cycle in ΔT x is apparent in both ΔT m and ΔDTR, with average monthly differences in ΔT m (ΔDTR) varying between 0.07°C (−0.4°C) in winter and −0.35°C (−1.4°C) in summer.

FIGURE 3 Differences between monthly mean temperatures recorded in Stevenson screens and open exposures. The violin plots (black with grey shading) sometimes extend beyond the range of the individual series (coloured lines) because the violin plots show the mean and distribution of all individual monthly ΔT values, whereas the coloured lines show monthly ΔT averaged over all years available for each individual series (lines are dashed where series have ≤12 months of data). Series located in the Southern Hemisphere have been shifted by 6 months to allow comparison with the Northern Hemisphere series.

FIGURE 4 As Figure 3, but for differences between Stevenson screens and wall-mounted exposures.

the temperatures recorded in each. Maximum temperatures in intermediate exposures are generally warmer than in Stevenson screens (on average by 0.21°C annually), with little variation between series (range −0.11°C to −0.38°C). The warmer maxima are likely the result of more reflected radiation from the surrounding unshaded ground reaching the thermometer in intermediate exposures, as they provide less lateral and basal protection than Stevenson screens. The stagnation of warm air in the eaves of some intermediate exposures, due to the roof structures impeding airflow, may also contribute to the warm bias. A seasonal cycle in ΔT x is present in two series: Gaster (

FIGURE 7 (a) Relationship between annual maximum ΔT m (solid line and filled markers), annual minimum ΔT m (dotted line and unfilled markers) and annual mean T Hist (T a) with shaded 95% confidence intervals for open exposures; (b) observed versus estimated monthly (orange circles) and annual mean (black crosses) ΔT m; and (c) observed (grey) and estimated (orange) ΔT m with shaded 95% confidence interval for the AEMET La Coruna series (Brunet, pers. comm.). Observed (grey) and estimated (orange) annual mean biases are given by the dotted lines. Each panel has the same y-axis, so it is only labelled in panel (a).
TABLE 3 Lower and upper quartiles of the key performance indicators for the best-performing statistical model in each exposure class.

FIGURE 11 Location of stations which have been adjusted for the exposure bias, have not been adjusted but contain probable biases (for reasons given in Table 4), or do not have metadata. Stations with metadata, but which were not adjusted, are not shown.
FIGURE 12 Difference between CRUTEM5_ext and CRUTEM5_eba annual and seasonal means (ΔT = CRUTEM5_eba − CRUTEM5_ext) for the Northern and Southern Hemisphere mid-latitudes (coloured lines). The light blue line shows the number of stations with a non-zero bias estimate (adjustment). The dark grey shading represents the range of annual mean exposure bias adjustments present in each CRUTEM5_eba grid cell. The light grey shading with dashed black outline represents the approximate range of 100 realizations of the exposure bias component in the HadCRUT5 error model for comparison.

FIGURE 13 Difference between CRUTEM5_ext and CRUTEM5_eba seasonal and annual means over time (ΔT = CRUTEM5_eba − CRUTEM5_ext).

Hist values, were not included as input in the final model.

Table 2) show that the magnitude of ΔT m in open exposures is significantly related to all three of the explanatory variables assessed. Each variable has a negative relationship with the magnitude of the bias, with increasing temperature and solar radiation both resulting in a larger warm bias in open exposures relative to Stevenson screens. This relationship is consistent with our understanding of the mechanisms which cause the exposure bias in open exposures (outlined in Section 3.1), as well as with previous studies which found larger biases in open exposures to correspond with stronger solar radiation
TABLE 4 Number of stations and months which have been adjusted for the exposure bias, as well as the numbers which still require adjustment.
a Stations may be counted more than once; for example, if a station has a period with missing metadata and a period with an intermediate exposure (no model).

and other places in South Australia and the Northern Territory, during the year 1888. (1890) Adelaide. Available at: https://hdl.handle.net/2027/uc1.c2999509
Meteorological Observations made at the Adelaide Observatory, and other places in South Australia and the Northern Territory during the year 1889. (1891) Adelaide. Available at: https://hdl.handle.net/2027/uc1.c2999510
Meteorological Observations made at the Adelaide Observatory, and other places in South Australia and the Northern Territory, during the year 1890. (1892) Adelaide. Available at: https://hdl.handle.net/2027/uc1.c2999511
Meteorological Observations made at the Adelaide Observatory, and other places in South Australia and the Northern Territory, during the year 1891. (1894) Adelaide. Available at: https://hdl.handle.net/2027/uc1.c2999512
Meteorological Observations made at the Adelaide Observatory, and other places in South Australia and the Northern Territory, during the year 1892. (1894) Adelaide. Available at: https://hdl.handle.net/2027/uc1.c2999513
Meteorological Observations made at the Adelaide Observatory, and other places in South Australia and the Northern Territory, during the year 1893. (1896) Adelaide. Available at: https://hdl.handle.net/2027/uc1.c2999514
Meteorological Observations made at the Adelaide Observatory, and other places in South Australia and the Northern Territory, during the year 1894. (1897) Adelaide. Available at: https://hdl.handle.net/2027/uc1.c2999515
Meteorological Observations made at the Adelaide Observatory, and other places in South Australia and the Northern Territory, during the year 1895. (1898) Adelaide. Available at: https://hdl.handle.net/2027/uc1.c2999516
Meteorological Observations made at the Adelaide Observatory, and other places in South Australia and the Northern Territory, during the year 1896. (1899) Adelaide. Available at: https://hdl.handle.net/2027/uc1.c2999376
Meteorological Observations made at the Adelaide Observatory, and other places in South Australia and the Northern Territory, during the year 1897. (1900) Adelaide. Available at: https://hdl.handle.net/2027/uc1.c2999377
Meteorological Observations made at the Adelaide Observatory, and other places in South Australia and the Northern Territory, during the year 1898. (1901) Adelaide. Available at: https://hdl.handle.net/2027/uc1.c2999378
Meteorological Observations made at the Adelaide Observatory, and other places in South Australia and the Northern Territory, during the year 1899. (1902) Adelaide. Available at: https://hdl.handle.net/2027/uc1.c2999379
Meteorological Observations made at the Adelaide Observatory, and other places in South Australia and the Northern Territory during the years 1886-7. (1893) Adelaide. Available at: https://hdl.handle.net/2027/uc1.c2999508
Moberg, A., Alexandersson, H., Bergström, H. & Jones, P.D. (2003) Were southern Swedish summer temperatures before 1860 as warm as measured? International Journal of Climatology, 23