How useful are crowdsourced air temperature observations? An assessment of Netatmo stations and quality control schemes over the United Kingdom

Observations of the real‐time state of the atmosphere are required in order to initialize numerical weather prediction (NWP) models. As NWP resolution improves, more observations are needed, to better capture regional variations in atmospheric conditions. In particular, surface observations are necessary to reflect conditions experienced on the surface. One proposed opportunity to increase the number of surface observations available for assimilation into NWP is to crowdsource the data from home weather stations. This study investigates the outdoor air temperature measurements made by Netatmo home weather stations, through validation against a calibrated laboratory chamber and by evaluating quality control schemes that are applied to a UK‐wide network of Netatmo stations. In a series of controlled lab experiments, it was found that the Netatmo temperature sensor was accurate to 0.3°C. The response to fluctuations in temperature is lagged, with τ (the time taken for 63% of the change to be measured) calculated as 12.7 min for a near‐instantaneous decrease in temperature. Netatmo temperature observations were compared with Met Office MIDAS hourly weather observations. A warm bias in excess of 1°C was present in the Netatmo temperature observations, which was lessened by the three quality control schemes tested, but still in excess of 0.5°C. Hence, Netatmo temperature measurements have potential to be assimilated in NWP in the United Kingdom, but work is required to find a suitable agreed quality control scheme to filter out anomalous observations in the United Kingdom.


| INTRODUCTION
Surface observations are an integral part of initializing weather models and verifying forecasts. In the United Kingdom, the Met Office operates and maintains around 309 surface stations for record-keeping and for numerical weather prediction (NWP) (Met Office, 2012). While the observations made by Met Office surface stations are subject to calibration checks and are quality controlled (Chapman et al., 2017), the mean distance between Met Office observation stations is 17.6 km. This is considerably longer than the 1.5 km grid spacing employed by the Met Office's UKV forecasting model (Met Office, 2019). Good initial conditions are crucial to ensuring an accurate forecast is produced, so spatially dense surface observations become more necessary to initialize NWP models as resolution of forecast models increases in the future (Inness & Dorling, 2013). The increase in skill of NWP, along with improved resolution due to increased computing power and additional observation sources (such as satellites), now means accurate forecasts can be provided on a very local scale (Bauer et al., 2015; Benjamin et al., 2019; ECMWF, 2020).
Crowdsourcing has grown in popularity over the last decade as a method to utilize small efforts from many people or devices and combine them into a useful output (Chapman et al., 2017; Elmore et al., 2014; Hawkins et al., 2019; Lackstrom et al., 2017). Thousands of crowdsourced observations are made every day from a wide range of automatic weather stations globally, but these observations are not yet widely utilized by meteorological agencies to forecast the weather. The Met Office's Weather Observations Website (WOW) (https://wow.metoffice.gov.uk/) and Weather Underground (https://www.wunderground.com/) are two examples of websites allowing users and organizations to submit data from their own weather stations to a larger database. However, observations on WOW and Weather Underground are made by a large variety of weather stations from a range of manufacturers, which have different biases and error characteristics (S. Bell et al., 2015).
The Netatmo Smart Home weather station network (https://www.netatmo.com/en-gb/weather/) overcomes this issue as all data are created from the same type of sensors. This reduces the biases between different sensor types, which can be accounted for when using the observations. There are over 5000 Netatmo stations making observations in the United Kingdom, more than 15 times the number of Met Office surface stations in the Meteorological Monitoring System (MMS) (Green, 2010). However, the data quality of Netatmo stations is expected to be less than that of the Met Office stations, which this study will attempt to quantify.
While the Met Office stations are approximately equidistant, the Netatmo stations are correlated to the population density of the United Kingdom. Figure 1 shows this distribution: there are far more Netatmo stations in London and the southeast of England than the Highlands and Islands of Scotland. Meanwhile, the Met Office surface observation stations are fewer in number than Netatmo stations, but offer more even coverage across the country.
Observations from Met Office stations are trusted because they are regularly calibrated and are well maintained, with quality control procedures in place in accordance with World Meteorological Organisation (WMO) guidelines (Inness & Dorling, 2013; WMO, 2018). For example, the platinum resistance thermometers at Met Office surface stations are calibrated every 8 years (Met Office, 2020b). Quality controlling data is the 'greatest challenge' faced by network managers of automatic weather stations (Fiebrich, 2009). In contrast, there is no centralized way for owners of Netatmo stations to supply metadata about how often their stations are calibrated and maintained, if at all. Hence, there is no way to tell whether an anomaly in Netatmo observations is due to some microscale effect or due to station placement or instrument error (Chapman et al., 2017). Some owners may maintain their stations, for example by checking that the Netatmo temperature sensor is not in direct sunlight (which contributes to a warm bias in the Netatmo data; Nipen et al., 2020), while others may not. Identifying and quantifying the uncertainties within measurements made by Netatmo stations would make assimilation of Netatmo data into an NWP model far more valuable. Table 1 summarizes the differences between observations from Met Office surface stations and Netatmo home weather stations.
Supplementing observations from traditional networks with crowdsourced data has proved useful for studies where traditional observations are sparsely located, such as the following examples. Evidence of a nighttime urban heat island (UHI) effect of up to 5.5°C was shown in London using 287 Netatmo and 7 Met Office stations, utilizing the Met Office data to remove anomalous Netatmo observations (Chapman et al., 2017). Netatmo stations were also used by Mandement and Caumont (2019) to analyse deep convection in France, showing that rapid changes in atmospheric pressure were detected around squall lines. Data from 63 citizen rain gauges around Amsterdam were correlated with rain radar observations (De Vos et al., 2017). However, limitations specific to Netatmo stations are outlined, including the lack of a radiation shield on the Netatmo temperature module, contributing to a lagged response time to temperature changes (Büchau, 2018; Chapman et al., 2017).
Netatmo data are used operationally by MET Norway to correct near-future temperature forecasts due to features such as cold pools and UHIs (Nipen et al., 2020). These examples show that crowdsourced data can be used for studies of phenomena missed by conventional observations, and aid forecasting.
This study characterizes the uncertainties in temperature observations made by Netatmo weather stations. This is completed in two stages, which separates the influence of the sensors themselves from that of the placement of sensors. Firstly, a set of calibration experiments in a climate chamber quantifies the sensor uncertainty characteristics without external influencing factors. The second part of this study quantifies the bias in temperature observations in the UK Netatmo network, and involves assessing potential quality control schemes for crowdsourced Netatmo temperature observations in the United Kingdom. Quality control schemes smooth the data, which removes some real events. However, it is advantageous to NWP assimilation to receive a smoothed field (Dunne & Entekhabi, 2005). Several different schemes have already been proposed in the literature (Clark et al., 2018; Meier et al., 2017; Nipen et al., 2020), three of which are tested here to determine which scheme best reduces the warm biases for the UK network. There is scope for meso-γ and denser observation networks to add value to forecasting and environmental monitoring (Chapman et al., 2015). For Netatmo sensors in particular, there is evidence to show that there are biases and a lagged response within the observations made by the temperature sensor (Büchau, 2018; Chapman et al., 2017; Meier et al., 2015).

| DATA
This study makes use of three data sources: Netatmo citizen weather station observations; laboratory calibration and reference equipment at the National Centre for Atmospheric Science (NCAS) Atmospheric Measurement and Observation Facility (AMOF); and Met Office MMS data. The following subsections describe these data sources.
FIGURE 1 Map showing the location and density of Netatmo stations (yellow/green to purple) and Met Office surface stations (black) in the United Kingdom. Each box represents approximately 20 km² of area. There are far more Netatmo stations in highly populated areas than there are in rural areas, as demonstrated by the population density map (inset). Population data supplied by the Natural Environment Research Council (Reis et al., 2017).

| Netatmo data
The data from Netatmo weather stations are collected using the Netatmo Weather Application Programming Interface (API). A series of Python scripts was written to collect data operationally on high-performance computing (Lawrence et al., 2013). There are limits imposed on the API by Netatmo (of no more than 50 requests every 10 s, and no more than 500 requests per hour [Netatmo, 2012]), so the scripts are designed to collect as much data as possible, with a lag of 1 day to allow slow-to-upload stations to catch up (Table 2).
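As an illustration of how a collection script could stay inside the two request limits quoted above, the following is a minimal sketch of a client-side sliding-window limiter. It is not the collection code used in this study: the class name `SlidingWindowLimiter` and its methods are hypothetical, and no Netatmo API call is made.

```python
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side guard for several (max_requests, window_seconds) limits."""

    def __init__(self, limits):
        self.limits = list(limits)
        self.longest = max(window for _, window in self.limits)
        self.history = deque()  # timestamps of past requests, oldest first

    def wait_time(self, now):
        """Return the seconds to wait at time `now` before one more request fits."""
        # Forget requests that are older than the longest window.
        while self.history and now - self.history[0] >= self.longest:
            self.history.popleft()
        wait = 0.0
        for max_requests, window in self.limits:
            recent = [t for t in self.history if now - t < window]
            if len(recent) >= max_requests:
                # This earlier request must leave the window before a new one fits.
                blocking = recent[-max_requests]
                wait = max(wait, blocking + window - now)
        return wait

    def record(self, now):
        """Register that a request was sent at time `now`."""
        self.history.append(now)


# Limits quoted in the text: 50 requests per 10 s and 500 requests per hour.
limiter = SlidingWindowLimiter([(50, 10.0), (500, 3600.0)])
```

In a collection loop, one would sleep for `limiter.wait_time(time.monotonic())` seconds before each request and call `limiter.record(...)` once the request has been sent.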

| Pressure, temperature, humidity calibration facility calibration
The calibration in the pressure, temperature, humidity calibration (PTUCal) chamber at AMOF in NCAS utilizes two main pieces of equipment: a Votsch VT-4011 temperature test chamber containing a high accuracy thermometer and a Michell S8000 Remote Precision Dewpointmeter. The test chamber holds each Netatmo sensor being tested, the temperature of which is regulated by the dewpointmeter. Table 3 summarizes the differences between the laboratory instruments and the Netatmo sensors being tested.
TABLE 1 Summary of the characteristics of both the Netatmo stations and Met Office surface stations in the United Kingdom as of April 2020 (Büchau, 2018; Green, 2010; Met Office, 2018b; WMO, 2018) (Clark et al., 2014).

| METHODS
This study characterizes the uncertainties and behaviours of the Netatmo station network in the United Kingdom for temperature measurements, using two approaches. Firstly, the bias contribution from the sensors themselves is identified by using ideal conditions within the PTUCal chamber, removing external factors like solar or anthropogenic heating. In addition, the bias from the placement of a sensor can be quantified once the sensor bias is known from the chamber tests. Three quality control schemes to remove or correct anomalous observations from Netatmo temperature sensors in the United Kingdom are compared.

| PTUCal climate chamber experiment
Seven Netatmo outdoor temperature sensors (one owned by the University of Leeds [UoL], six loaned from the University of Birmingham [UoB1-6], see Acknowledgements) are tested using the PTUCal chamber described in Section 2.2 at AMOF at NCAS in Leeds to determine biases within the sensors. The specification of the instruments is outlined in Table 3.
Each Netatmo outdoor temperature sensor (UoL, UoB1-6) is tested individually using the same experimental set-up, shown in Figure 2, using the equipment outlined in Section 2.2. In the study by Büchau (2018), some experiments were conducted by removing the outer case of the temperature sensor. Since there is a very low probability that any Netatmo owners in the network remove the aesthetic casing, it is retained for the experiment to realistically capture the characteristics of a networked sensor. The climate chamber is a highly controlled environment, which allows assessment of the Netatmo temperature measurements against a well-calibrated dewpointmeter, but is not a proxy for the real world. The thermocouple was placed directly next to the Netatmo outdoor module so that the same parcel of air is measured by both sensors. The chamber contains a fan for ventilation.
The chamber cycles through the same programme of temperatures for each different Netatmo sensor: the temperature in the chamber is set to 40°C, and then after 3 h the temperature is decreased by 2.5°C, until the test at −10°C is completed. In each 3-h period, the first 2 h are a delay time to allow the chamber and the Netatmo sensor to reach a stable equilibrium, so no lag time effect exists. The range in temperatures tested reflects the range of air temperatures commonly observed in the United Kingdom (Parker et al., 1992). The 2.5°C step change is reasonable, as it is a realistic change in temperature observed in the United Kingdom. For example, Clark et al. (2018) give an example of a 3.2°C change in temperature, in 8 min.

| Comparison of Netatmo and Met Office observations
Having investigated the existence of any biases in the Netatmo temperature measurements using the PTUCal climate chamber, observations from Netatmo stations located throughout the United Kingdom are compared with Met Office observations to test the sensors' performance in the real world. Met Office observations are accurate to ±0.1°C, while Netatmo temperature measurements are stated to be accurate to ±0.3°C (Clark et al., 2014; Netatmo, 2020). The Netatmo observations are gridded using the scipy.interpolate.griddata function in the SciPy library in Python (Virtanen et al., 2020), which uses Delaunay triangulation on the data before barycentric linear interpolation. Clark et al. (2018) use Delaunay triangulation for gridding Netatmo data. Linear barycentric interpolation avoids the points of discontinuity that arise when using bilinear interpolation (the approach used by Hollis et al., 2019). The interpolated Netatmo temperatures co-located over the Met Office station locations are then compared with the observations made directly by the Met Office surface stations. Interpolating Netatmo observations in this way may cause some error, but the error is assumed to be unbiased when the mean of the whole United Kingdom is taken.
FIGURE 2 A Netatmo temperature sensor in the climate chamber. The thermocouple is attached by a cable tie so that the same parcel of air is measured by both sensors during the test.
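The interpolation step described above can be sketched as follows. This is a minimal illustration, not the processing code used in this study: the helper name `interpolate_to_sites` is hypothetical, and the coordinates are assumed to be already projected onto a plane (for example, eastings and northings) rather than given as latitude and longitude.

```python
import numpy as np
from scipy.interpolate import griddata


def interpolate_to_sites(station_xy, station_temp, site_xy):
    """Linearly interpolate scattered station temperatures to reference sites.

    station_xy   : (n, 2) planar coordinates of the crowdsourced stations
    station_temp : (n,) temperatures observed at those stations
    site_xy      : (m, 2) planar coordinates of the reference sites
    Returns an (m,) array; sites outside the convex hull of the stations
    receive NaN, which is the default fill value of griddata.
    """
    station_xy = np.asarray(station_xy, dtype=float)
    station_temp = np.asarray(station_temp, dtype=float)
    site_xy = np.asarray(site_xy, dtype=float)
    # method="linear" triangulates the stations (Delaunay) and then applies
    # barycentric linear interpolation within each triangle.
    return griddata(station_xy, station_temp, site_xy, method="linear")


# Synthetic example: a planar temperature field sampled at five stations.
stations = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.3]])
temps = 10.0 + 2.0 * stations[:, 0] + 3.0 * stations[:, 1]
sites = np.array([[0.25, 0.5], [2.0, 2.0]])  # second site lies outside the hull
interpolated = interpolate_to_sites(stations, temps, sites)
```

Because the synthetic field is planar, linear interpolation recovers it exactly at the first site, while the second site is left as missing data in the same way that data-sparse regions appear blank in an interpolated map.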

| Quality control schemes
Three quality control schemes, from Meier et al. (2017), Nipen et al. (2020), and Clark et al. (2018), are implemented on the UK Netatmo data from April 2020, to quantify the bias in the Netatmo temperature sensor from its placement within the environment.

| Meier scheme
The Meier scheme has four stages:
1. Remove Netatmo stations with identical latitude and longitude (as this suggests the location has defaulted to one based upon their IP address). A day (month) of observations is removed if a station records for less than 80% of the time within that day (month).
2. Daily averages of minimum air temperature from all Netatmo and reference stations (in our case, Met Office sites) are compared. If a Netatmo station has a monthly mean minimum temperature that is outside 5σ of the average Netatmo temperature, or SD outside 5σ of the reference data, then the Netatmo station is removed.
3. For each hour, the Pearson linear correlation between incoming solar radiation and the temperature difference between a Netatmo station and the mean reference observed temperature is calculated. If p < 0.01 and Pearson r > 0.5 (Wilks, 2011) between the Global Solar Irradiation Amount and t_Netatmo − t_Reference, then it is concluded that solar heating is directly affecting the sensor and it is therefore removed.
4. For each station, the mean and SD σ_obs of all available observations over the entire month are calculated. If any single observation deviates from the mean by more than 3σ_obs (for example, if a sensor was temporarily moved to have its batteries changed), then this observation is removed.
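Stages 3 and 4 above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the implementation of Meier et al. (2017): the function names are hypothetical, the inputs are assumed to contain no missing values for stage 3, and the population SD is assumed for stage 4.

```python
import numpy as np
from scipy.stats import pearsonr


def is_radiation_affected(radiation, temp_diff, p_max=0.01, r_min=0.5):
    """Stage 3: flag a station whose warm anomaly tracks incoming solar radiation.

    radiation : solar irradiation values
    temp_diff : t_Netatmo - t_Reference at the same times
    """
    r, p = pearsonr(radiation, temp_diff)
    return bool(p < p_max and r > r_min)


def mask_outliers(temps, n_sigma=3.0):
    """Stage 4: return True for observations within n_sigma SDs of the station mean."""
    temps = np.asarray(temps, dtype=float)
    mean = np.nanmean(temps)
    sd = np.nanstd(temps)  # assumption: population SD over the whole month
    return np.abs(temps - mean) <= n_sigma * sd


# Synthetic data: one station heated by the sun, one that is not.
radiation = np.linspace(0.0, 800.0, 50)
sunlit_diff = 0.005 * radiation + 0.1 * np.sin(np.arange(50))
shaded_diff = (radiation - radiation.mean()) ** 2 / 1.0e5  # symmetric, so r is ~0

# A month of identical readings with one spurious spike.
month = np.array([10.0] * 30 + [40.0])
keep = mask_outliers(month)
```

The thresholds are exposed as keyword arguments so that the sensitivity of the scheme to the choices of p, r and the 3σ cut-off could be explored.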

| Nipen scheme
The Nipen scheme utilizes the spatial distribution of stations and has three sections:
1. A 'buddy check' compares each observation with the neighbouring observations within 3 km and 30 m of altitude. If an observation has a deviation from the mean neighbourhood observation of greater than 2σ, then this observation is removed.
2. Stations without five neighbouring stations within 15 km and 200 m of altitude are removed to avoid an isolated site having an erroneous value that cannot be cross-referenced.
3. The majority of the computational expense is required for the spatial consistency test (outlined in Lussana et al., 2010): the United Kingdom is grouped into approximately 100 km by 100 km squares. For each region at each time, the mean temperature and the variance of the error compared with the mean are calculated. Any observation with a warm deviation that is more than four times the error variance will be removed, and any observation with a cold deviation that is more than eight times the error variance will be removed also. A less strict threshold is used for cold deviations because Nipen et al. (2020) explain that 'most error sources', for example, direct sunlight or proximity to walls, 'contribute to a warm bias'.
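The buddy check in section 1 can be sketched as below. This is a simplified illustration rather than the operational code of Nipen et al. (2020): the function name is hypothetical, coordinates are assumed to be planar and in kilometres, the neighbourhood statistics are assumed to exclude the observation being tested, and the `min_buddies` parameter (below which the check is skipped) is an assumption introduced for the example.

```python
import numpy as np


def buddy_check(x_km, y_km, alt_m, temp,
                radius_km=3.0, max_dalt_m=30.0, n_sigma=2.0, min_buddies=2):
    """Return a boolean array that is False for observations failing the check."""
    x_km, y_km, alt_m, temp = (
        np.asarray(a, dtype=float) for a in (x_km, y_km, alt_m, temp)
    )
    keep = np.ones(temp.size, dtype=bool)
    for i in range(temp.size):
        dist = np.hypot(x_km - x_km[i], y_km - y_km[i])
        near = (dist <= radius_km) & (np.abs(alt_m - alt_m[i]) <= max_dalt_m)
        near[i] = False  # assumption: an observation is not its own buddy
        if near.sum() < min_buddies:
            continue  # too few neighbours to judge; left for the isolation test
        mean = temp[near].mean()
        sd = temp[near].std()
        if abs(temp[i] - mean) > n_sigma * sd:
            keep[i] = False
    return keep


# Six clustered stations (one reading far too warm) and one isolated station.
x = [0.0, 0.5, 1.0, 0.2, 0.8, 0.5, 100.0]
y = [0.0, 0.5, 0.0, 1.0, 1.0, 0.2, 100.0]
alt = [50.0] * 7
t = [10.0, 10.2, 9.8, 10.1, 9.9, 15.0, 20.0]
kept = buddy_check(x, y, alt, t)
```

In this example only the 15.0°C reading is rejected. The isolated station passes the buddy check because it has no neighbours to be compared against, which is the situation that section 2 of the scheme is designed to catch.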

| Clark scheme
The final scheme used here is described in Clark et al. (2018). It adjusts or 'nudges' observations with anomalous measurements, based on the nearest Met Office site, in contrast to the other schemes outlined above, which remove anomalous observations. The amount to adjust each Netatmo observation is calculated as the difference between the 6-h mean temperature from the nearest Met Office site (T − 3 to T + 3) and the same 6-h mean temperature from the Netatmo station, using the observations made at the same times used for the Met Office observations. This correction varies from day to day. For example, for 12:00 UTC, the observations from 9:00, 10:00, 11:00, 12:00, 13:00, 14:00 and 15:00 UTC are used to calculate these averages. Originally, the Clark scheme was used to analyse the events during an evening, when biases due to solar heating are smaller. Hence, the Clark scheme may perform poorly during daylight hours when biases in Netatmo readings due to solar heating have more of an effect (Clark et al., 2018, p. 640).
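The nudging calculation described above can be sketched as follows. This is a minimal illustration of the arithmetic only, not the code of Clark et al. (2018): the function name is hypothetical, and both series are assumed to be hourly and aligned so that index `centre` corresponds to time T.

```python
import numpy as np


def clark_offset(netatmo_hourly, reference_hourly, centre, half_window=3):
    """Offset to add to the Netatmo observation at index `centre`.

    The offset is the reference mean minus the Netatmo mean over the hours
    centre - half_window ... centre + half_window, using only the hours at
    which both series have a valid (non-NaN) value.
    """
    lo, hi = centre - half_window, centre + half_window + 1
    if lo < 0 or hi > len(netatmo_hourly):
        raise ValueError("window extends beyond the available data")
    net = np.asarray(netatmo_hourly[lo:hi], dtype=float)
    ref = np.asarray(reference_hourly[lo:hi], dtype=float)
    valid = ~np.isnan(net) & ~np.isnan(ref)
    if not valid.any():
        raise ValueError("no overlapping observations in the window")
    return float(ref[valid].mean() - net[valid].mean())


# Synthetic day: the Netatmo station reads 1.5 degrees warmer than the reference.
reference = 5.0 + 0.5 * np.arange(24)
netatmo = reference + 1.5
offset = clark_offset(netatmo, reference, centre=12)  # uses hours 9 to 15
corrected = netatmo[12] + offset
```

For a constant warm bias the offset removes the bias exactly. A bias that changes within the window, such as one driven by solar heating, would only be partly removed.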

| RESULTS
In this section, the results from the temperature tests from the PTUCal climate chamber and the UK comparisons of Netatmo and Met Office observations are presented. In addition, the lag in Netatmo observations is studied by comparing a co-located Netatmo and Met Office station.

| PTUCal climate chamber experiment
As discussed in Section 3.1, seven Netatmo stations (UoL, UoB1-6) are tested using the PTUCal climate chamber in AMOF at NCAS in Leeds. The validation of the manufacturer-claimed performance in a climate chamber allows the quantification of biases within the sensors themselves, so uncertainties in temperature measurements can be separated into those as a result of the Netatmo manufacture/calibration process and those as a result of a user placing the sensors inadequately. Figure 3 shows a short section of the temperature experiment on the UoB1 outdoor sensor. The response by the Netatmo temperature sensor to changes in temperature is significantly lagged (quantified later in Figure 5).
However, after the Netatmo readings have adjusted to the new temperature, the Netatmo measurements stay within 0.3°C of the chamber thermocouple observations (and in most cases better than 0.3°C). The same behaviour is witnessed across all temperature ranges (−10 to 40°C) and all seven sensors used in the chamber experiments. Netatmo states on their website that the temperature measurements are accurate to ±0.3°C (Netatmo, 2020). Figure 4 shows that observations from Netatmo stations in the final hour of each test are well within 0.3°C of the chamber observations, and in most cases, the Netatmo measurements are within 0.2°C of the measurement from the chamber observations, across all temperatures −10 to 40°C and all seven sensors tested. Any fluctuations in the temperature of the climate chamber will be captured by the thermocouple, but the lagging of the Netatmo response means that some noise in the data may be due to this. The study by Meier et al. (2017) included an investigation of the accuracy of Netatmo temperature observations at seven distinct air temperatures in the range 0-30°C in a climate chamber. Our study investigated the accuracy of Netatmo observations at a greater range of temperatures, and used a reference sensor with a stated temperature accuracy of 0.1°C (see Table 3), in contrast to 0.28°C in the Meier study. The Meier study found all sensors were within the Netatmo accuracy of 0.3°C, except at 0°C, and some sensors showed a slight warm bias of around 0.5°C. In our study, all observations were within 0.3°C of the reference observations, and no station showed a warm bias of a magnitude close to 0.5°C.
FIGURE 3 Section of a temperature chamber test, showing the lagged response time in the Netatmo temperature sensor compared with the response by the thermocouple in the chamber. The dashed horizontal lines are plotted at y = 30.92 and y = 28.42, to represent a temperature decrease of 1.58°C from the initial chamber temperature (τ).
The Netatmo temperature sensors only make an observation every 5 min compared with the thermocouple in the chamber, which makes an observation every 2 s (Figure 3 includes thermocouple observations at 5 min intervals to match the gap between Netatmo observations). Figure 5 shows the time between the Netatmo sensor and chamber thermocouple recording a temperature decrease of 1.58°C (63% of 2.5°C: τ). The mean lag time was 12.7 min. At least two Netatmo observations had been made in the intervening time. There is no visible dependence between the temperature of the chamber and the length of time for the Netatmo measurement to have decreased by 1.58°C, which suggests that the lagged response is a systematic bias within the Netatmo sensor. Büchau (2018) also included an investigation of the lagged response to temperature changes, but over a greater step (in excess of 15°C) than studied here. Our aim here was to have a step change of temperature that was realistic in the United Kingdom. Büchau (2018) used a non-linear least square method to calculate a mean time constant of 22.46 min.
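The 63% threshold used above is the fraction 1 − 1/e ≈ 0.632 of a step change that a first-order sensor has registered after one time constant, so a 2.5°C step gives a threshold of about 1.58°C. The sketch below shows one way such a crossing time could be estimated from 5-min samples. It is illustrative only: the function name is hypothetical, and linear interpolation between consecutive samples is an assumption made for this example rather than the method used in this study.

```python
import math

import numpy as np

TAU_FRACTION = 1.0 - math.exp(-1.0)  # about 0.632


def crossing_time(times_min, temps, initial, final, fraction=TAU_FRACTION):
    """Time at which `fraction` of the step from `initial` to `final` is first reached.

    Linear interpolation is applied between the two samples that bracket the
    threshold. Returns None if the threshold is never reached.
    """
    times_min = np.asarray(times_min, dtype=float)
    temps = np.asarray(temps, dtype=float)
    progress = (temps - initial) / (final - initial)  # 0 at start, 1 at end
    reached = np.nonzero(progress >= fraction)[0]
    if reached.size == 0:
        return None
    i = int(reached[0])
    if i == 0:
        return float(times_min[0])
    weight = (fraction - progress[i - 1]) / (progress[i] - progress[i - 1])
    return float(times_min[i - 1] + weight * (times_min[i] - times_min[i - 1]))


# Synthetic first-order response to a 2.5 degree step, sampled every 5 min,
# with a time constant equal to the mean lag reported in the text.
tau_true = 12.7
t = np.arange(0.0, 65.0, 5.0)
response = 27.5 + 2.5 * np.exp(-t / tau_true)
tau_estimate = crossing_time(t, response, initial=30.0, final=27.5)
```

With 5-min sampling the interpolated estimate lands between the 10-min and 15-min samples, slightly above the true value, which illustrates the limit that the Netatmo reporting interval places on the precision of any such estimate.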

| National observations and quality control schemes
After quantifying the uncertainty characteristics of the sensors in a controlled environment, 1 month of air temperature data from the Netatmo stations within the United Kingdom are analysed, from April 2020. The data are collected using the Netatmo API, and the number of captured stations depends upon the status of the local batch cluster where jobs are run, and the Netatmo API (see Section 2.1). The number of stations varies between days, with a mean of 4667 over the month. The smallest number of stations captured was 1384, and the largest 5381. The Netatmo data are used to compare the performance of the Netatmo stations against Met Office stations, and to characterize any uncertainties exhibited in the Netatmo data.
When originally implemented around Berlin, the Meier scheme (Section 3.3.1) removed around 54% of observations. In this study, only 30% of observations are removed using this scheme. In Norway, the Nipen QC scheme (Section 3.3.2) removed approximately 21% of observations, while in this study, around 11% of UK observations are removed. Unlike the other two schemes, which removed anomalous observations, the Clark QC scheme (Section 3.3.3) adjusted temperatures if they were inaccurate. Exactly 26% of observations are adjusted by more than 2°C by the Clark scheme in this study. Figure 6 shows the comparison between the effects of each quality control scheme on the UK Netatmo temperature data at midday on 1 April 2020. During this day, the south of England was experiencing a ridge of high pressure, while a cold front moved slowly southwards through Scotland, before decaying (Wetter, 2020). In the unfiltered data plot, there are hot spots (with deviations from the mean in excess of 10°C in some cases). The hot spots indicate that there are many Netatmo temperature sensors recording unreasonably warm temperatures for outdoor sensors. This is most likely caused by these sensors either being located indoors, being affected by heat from buildings such as near an exhaust vent, or being heated from direct sunlight. A hot spot present during both daytime and nighttime would suggest that there is a station that is left indoors, while a hot spot only present during the day indicates a station has been left outdoors, but is affected by direct sunlight. Figure 6 demonstrates qualitatively the removal of daytime anomalous temperatures by each of the quality control schemes. Due to the interpolation used, there are some areas with missing data (such as Shetland and the Western Isles).
FIGURE 5 Time taken in minutes for the Netatmo temperature to record a decrease in temperature of 1.58°C, after the chamber had already recorded such a decrease. The mean lag was 12.7 min.
The Meier and Nipen schemes both perform similarly, by removing the most extreme hot spots from the data, while the Clark scheme has the smoothest texture of the three schemes.
Qualitatively, all three quality control schemes reduce most of the hot spots in the data. However, there are a few hot spots remaining in the Clark map for midday (Figure 6), which are removed in the Meier and Nipen plots. Figure 7 shows the interpolated Netatmo temperatures for the United Kingdom at midnight on 2 April 2020. Again, there are significant hot spots in the unfiltered data where Netatmo sensors are recording unrealistically warm temperatures. There are more hot spots in the unfiltered data for the midday temperatures compared with midnight, 12 h later. This indicates that the majority of stations with anomalous temperatures are being affected by incoming solar radiation, while others are likely to be located indoors. The Meier map in Figure 7 shows some warmer pools in the midnight data, which may be evidence of a UHI effect in cities, while the Nipen and Clark maps show some variation throughout the country. Figure 8 shows statistical metrics (mean, SD, skewness and kurtosis) for the differences between the gridded Netatmo and point Met Office data, by hour of the day, for April 2020 in southeastern England (as defined in Table 5). This restriction is to ensure that there are Netatmo stations within a reasonable distance to reference stations so that there are no large errors due to the interpolation over data-sparse regions such as the Highlands and Islands of Scotland, where there are large distances between some Met Office sites and the nearest Netatmo station (as shown by Figure 1).
FIGURE 6 Midday interpolated temperatures on 1 April 2020, with the unfiltered data and the quality controlled data plotted for each of the three schemes. The QC schemes remove or reduce the majority of hot spots in the unfiltered data. There are data from 4933 stations plotted.
The mean temperature plot shows a diurnal effect in the unfiltered data with the warmest bias during the afternoon, where the Netatmo stations are recording a warmer temperature than the Met Office: most likely because there are Netatmo stations positioned in direct sunlight or indoors. Indoor measurements are likely to have a greater impact on the temperature difference overnight. The stated accuracy of the Netatmo temperature measurements is ±0.3°C, and the mean temperature difference is in excess of 0.3°C for each hour of the day for the unfiltered data, the Meier and the Nipen schemes. In the unfiltered data and all the quality controlled data, there is a decrease in the warm bias during the morning, between 00:00 and 07:00 UTC. Only the Clark quality controlled data have mean biases less than 0.3°C, between 07:00 and 12:00 UTC. If the Netatmo data set were temporally shifted to account for any lag, this may make the warm bias more uniform throughout the day but would not remove the warm bias. However, the decrease in the warm bias during the morning is likely because of the lagged response to changes in temperature, as noted in Section 4.1. The change in SD between the quality control schemes is indicative of their effectiveness, but the actual SD value has a fundamental minimum due to the approach used here of gridding Netatmo stations onto Met Office sites. The interpolation does increase the spread of the data in Figure 8 because of the natural variability in the interpolation to a new point. If Netatmo stations were co-located with the Met Office sites, then the SD in Figure 8 would be smaller.
FIGURE 7 Midnight interpolated temperatures on 2 April 2020, with the unfiltered data and the quality controlled data plotted for each of the three schemes. Again, the QC schemes remove or reduce the majority of hot spots in the unfiltered data. There are data from 4931 stations plotted.
FIGURE 8 Hourly mean, SD, skewness and kurtosis of the temperature anomaly distribution between interpolated Netatmo data and Met Office data in southeastern England (as defined in Table 5). The unfiltered data and quality controlled data using the three schemes discussed are plotted.
The overall statistics for the temperature differences between the interpolated Netatmo data (unfiltered and the quality controlled data) and Met Office data are shown in Table 4. As shown in Figure 8, the SD of the temperature differences in the unfiltered data is at its maximum during the afternoon, at the same time as the peak mean difference in temperature. The QC schemes reduce the SD of the warm bias, most notably during daylight hours. This indicates that the schemes remove data from stations making observations in direct sunlight. The warm biases are positively skewed, with the unfiltered data notably skewed during daylight hours, indicating that solar heating is occurring. All three schemes reduce the skewness of the warm bias during daylight hours, although the biases remain positively skewed in all cases except for late evening in the Meier scheme. The kurtosis of the quality controlled data varies throughout the day, but does not exceed 3. This indicates that the temperature differences, while positively skewed, have fewer outliers than would be expected if the differences were normally distributed. The root mean square error of the unfiltered temperature differences is 2.66°C, which was reduced (but still in excess of 2°C) by the three QC schemes.
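The summary statistics discussed above can be computed as in the following sketch. It is an illustration under stated assumptions, not the analysis code of this study: the function name is hypothetical, the sample SD (ddof = 1) is assumed, and the Pearson definition of kurtosis (for which a normal distribution has a value of 3) is assumed because the text compares the kurtosis against 3.

```python
import numpy as np
from scipy.stats import kurtosis, skew


def difference_stats(netatmo, reference):
    """Mean, SD, skewness, kurtosis and RMSE of Netatmo minus reference temperatures."""
    diff = np.asarray(netatmo, dtype=float) - np.asarray(reference, dtype=float)
    diff = diff[~np.isnan(diff)]  # ignore times where either value is missing
    return {
        "mean": float(diff.mean()),
        "sd": float(diff.std(ddof=1)),  # assumption: sample SD
        "skewness": float(skew(diff)),
        # fisher=False gives the Pearson definition (normal distribution = 3).
        "kurtosis": float(kurtosis(diff, fisher=False)),
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
    }


# Tiny synthetic example: differences of 0, 1 and 2 degrees, plus one missing value.
stats = difference_stats([10.0, 12.0, 14.0, np.nan], [10.0, 11.0, 12.0, 9.0])
```

Applying the function separately to the differences from each hour of the day would give hourly series of each metric of the kind plotted in Figure 8.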

| Lag time
The mean temperature difference between the Netatmo and Met Office observations decreases during the morning. The warmup and cooldown experiments with and without the temperature sensor casing performed by Büchau (2018) demonstrated a marked decrease in the time constant when the Netatmo cover was removed, with the lag time decreasing from 26.89 (22.46) min to 15.40 (14.02) min in the warmup (cooldown) tests. This is consistent with the hypothesis that the lagged response to temperature changes by Netatmo sensors is because of the casing. The lagged response may explain the decrease in the mean temperature difference during the morning in Figure 8.
One hypothesis is that the Netatmo temperature sensor experiences a lag when the air temperature changes (as shown by Figure 3 and the tests in Section 4.1). As the ambient air temperature increases after sunrise, the Netatmo temperature sensor records an artificially low temperature because of this lag, yet still higher than the reference observations because of the general warm bias. The reduction in the warm bias varies between days, which is hypothesized to be due to the changing intensity of solar heating, although no evidence was found in this study to support this hypothesis. Future studies should examine the Netatmo biases with co-located radiance measurements to test this hypothesis.
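The size of this effect can be estimated with a simple first-order lag model, in which the sensor temperature relaxes towards the air temperature with time constant τ. This model is an assumption made for illustration. The sketch also applies the 12.7 min time constant, measured in the chamber for a cooling step, to a warming case, and the warming rate used is a hypothetical value. Under steady warming at rate r from equilibrium, the sensor reads low by r·τ·(1 − exp(−t/τ)), which approaches r·τ.

```python
import math

TAU_MIN = 12.7  # chamber time constant reported in this study (cooling step), minutes

def ramp_lag_error(rate_c_per_hour, minutes_since_start, tau_min=TAU_MIN):
    """Amount (degrees C) by which a first-order sensor reads below the air
    temperature after steady warming begins, starting from equilibrium.

    Solves d(error)/dt = rate - error / tau with error(0) = 0.
    """
    rate_per_min = rate_c_per_hour / 60.0
    return rate_per_min * tau_min * (1.0 - math.exp(-minutes_since_start / tau_min))

# Hypothetical morning warming rate of 1.5 degrees C per hour.
for minutes in (10, 30, 120):
    print(minutes, round(ramp_lag_error(1.5, minutes), 3))
```

For the hypothetical rate of 1.5°C per hour, the lag error tends towards about 0.3°C, which is the same order as the morning reduction in the mean temperature difference that the lag hypothesis is intended to explain.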
TABLE 4 Summary of the statistics for Netatmo and Met Office temperature differences during April 2020, for the unfiltered data and each QC scheme.
In Edinburgh, there is a Netatmo station located approximately 250 m away from the Met Office surface station. This Netatmo station is just one station out of the network, so may not be representative of the placement uncertainty observed across the whole crowdsourced network. Figure 9 shows the mean hourly air temperature during all hours in April 2020 for both stations. While similar air temperatures are recorded, the difference between the mean Met Office and Netatmo observations decreases in the morning, between 06:00 and 12:00 UTC. The decrease in the temperature difference may be explained by the rate of solar heating being maximum during the morning coupled with the lagged response of Netatmo observations. Since this warm bias was present throughout the entire day, and not shown during the PTUCal chamber tests (Section 4.1), this anomaly must be positional. Therefore, studies using Netatmo data for quantification of UHI effects (such as Chapman et al., 2017) must be aware of the fine line between a UHI effect and observations that are artificially warm due to their placement.

| DISCUSSION
In the United Kingdom, there is a relatively dense network of reference Met Office stations. In many countries, however, observation networks are less dense, so the potential benefits of crowdsourced observations are greater. For example, the Oklahoma Mesonet in the United States has an average station spacing of 29 km (Ziolkowska et al., 2017). The Met Office surface stations in the United Kingdom, with an average spacing of 17.6 km, are at a greater spatial resolution than the Oklahoma Mesonet. Therefore, Netatmo sensors will add the most value to locations with sparse surface observations, and little infrastructure to maintain costly surface stations. Table 5 shows the density of Netatmo and Met Office stations in the United Kingdom, as well as in southeastern England, and northern Scotland. The distance between Netatmo weather stations is short enough to theoretically enable observations of meso-γ phenomena that the Met Office surface sites may miss. However, the UK Netatmo temperature observations, even those that have been quality controlled, still exhibit mean warm biases in excess of 1°C. The mean distance between Netatmo stations in northern Scotland is almost 3.5 times the mean distance for stations in the southeast of England. The spread between Met Office stations in northern Scotland is only 1.3 times the spread in the southeast of England. Hence, Netatmo observations are more likely to add value to forecasts around urban areas. The 17.6 km spacing of Met Office stations means that there are large gaps that crowdsourced observations could fill. The spread of Netatmo stations around the United Kingdom is not even, and there are far more stations, densely packed, in the southeast of England than in northern Scotland, as demonstrated in Table 5. If an appropriate quality control scheme is applied, then Netatmo observations may prove effective at forecasting UHI effects in urban areas.
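One way to quantify station spacing of the kind compared above is the mean great-circle distance from each station to its nearest neighbour. The sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, a spherical Earth of radius 6371 km is assumed, and it is not confirmed that the distances in Table 5 were computed in exactly this way.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption

def mean_nearest_neighbour_km(lats_deg, lons_deg):
    """Mean haversine distance (km) from each station to its nearest neighbour."""
    lat = np.radians(np.asarray(lats_deg, dtype=float))
    lon = np.radians(np.asarray(lons_deg, dtype=float))
    # Pairwise differences via broadcasting: result has shape (n, n).
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2)
    dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    np.fill_diagonal(dist, np.inf)  # a station is not its own neighbour
    return dist.min(axis=1).mean()

# Hypothetical coordinates: three stations on the equator.
print(mean_nearest_neighbour_km([0.0, 0.0, 0.0], [0.0, 1.0, 3.0]))
```

The pairwise matrix grows with the square of the number of stations, so for a network of several thousand stations a spatial index would likely be a better design choice than this brute-force approach.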
TABLE 5 Comparison of the distances in km between the nearest Netatmo stations for different QC schemes and also the distances between the nearest Met Office stations in the United Kingdom.
The isolation test from the Nipen et al. (2020) quality control scheme removes stations that are outlying in very rural areas. The radiation correlation technique from the Meier et al. (2017) scheme is useful to remove stations influenced by solar heating, and the section to remove stations with identical latitude and longitude is important, since otherwise, the location of a Netatmo sensor may be completely incorrect, adversely affecting observations. All the schemes examined here reduced the warm bias present throughout the data, but a scheme that fully addresses the lagged response to temperature changes (discussed in Section 4.3) may prove more effective at removing some of the warm bias in the data. However, since the bias is present throughout the entire day (not just during daylight hours), a stronger approach to removing anomalous data from Netatmo stations is required in a quality control scheme, perhaps by combining features from all three schemes. The hot spots shown in the maps in Figures 6 and 7 are larger than any individual mesoscale effect that may be present (such as urban heat islands). Drawing conclusions from these data must take into account warm biases due to station placement. Although the three quality control schemes discussed in Section 3.3 did remove the hot spots, there still remained a warm bias in the data throughout the day. However, this warm bias was not present in the results of the PTUCal chamber tests, presented in Section 4.1. Hence, there must still be some results from incorrect placements in the data, despite the largest deviations being removed.
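The identical-coordinate check mentioned above can be sketched as follows. This is a simplified illustration and not the Meier et al. (2017) implementation: the station record layout (`id`, `lat`, `lon`) is hypothetical, and it is assumed here that every station sharing an exact coordinate pair with another station is discarded.

```python
from collections import Counter

def drop_shared_coordinates(stations):
    """Return only the stations whose (lat, lon) pair is unique.

    `stations` is a list of dicts with hypothetical keys 'id', 'lat', 'lon'.
    Stations that share exact coordinates are all removed, on the assumption
    that shared coordinates indicate a default or incorrectly set location.
    """
    counts = Counter((s["lat"], s["lon"]) for s in stations)
    return [s for s in stations if counts[(s["lat"], s["lon"])] == 1]

# Hypothetical station records for illustration.
stations = [
    {"id": "a", "lat": 51.5, "lon": -0.1},
    {"id": "b", "lat": 51.5, "lon": -0.1},  # same coordinates as "a"
    {"id": "c", "lat": 55.9, "lon": -3.2},
]
kept = drop_shared_coordinates(stations)
print([s["id"] for s in kept])
```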
FIGURE 10 Schematic describing various meteorological phenomena, adapted from Orlanski (1975). The blue box represents events that Met Office surface stations can currently observe, and the red box represents the additional features that Netatmo sensors may be able to resolve once anomalous stations are removed. Note that these classifications are not strict.
Quality control schemes, while effective at removing perceived hot spots in the data, could be stripping out genuine variations in temperature, especially where direct comparison to reference data is made by the scheme (such as in the Meier and Clark schemes). The Meier scheme was originally used for a single city and so using all stations to compute averages may not be entirely appropriate when considering an entire country, as in our case. The linear bias correction used in the Clark scheme assumes that the bias between the two instruments is also linear over time. Figure 9 shows that the bias is non-linear over the diurnal cycle, due to the significant lag time (and possibly also solar heating) that the Netatmo suffers from. The result is that the Clark linear bias correction does not fully compensate for the difference between the individual Netatmo measurements with a 6-hourly bias correction. A shorter time window, such as a 1- or 3-hourly bias correction, would take into account more of the non-linearity of the diurnal bias between the Netatmo and the Met Office stations but may then suffer locale biases if the two sites are not co-located. In addition, the lagged response to temperature changes by Netatmo sensors will result in any quick increase then decrease in temperature being severely smoothed in the Netatmo data. The chamber testing in Section 4.1 shows that the Netatmo stations are slow to react to a rapid change in temperature, which Clark et al. (2018) showed can occur in the United Kingdom with an example of a 3.2°C temperature change in 8 min. Note that the Netatmo sampling time of 5 min would be inadequate in that scenario. A lag time of 12.7 min to record a 63% decrease in temperature is too coarse to resolve the finest mesoscale (meso-γ) or any microscale phenomena. The significant lag time, and to a lesser extent the sample frequency of 5 min, reduces the potential impact of using Netatmo observations to capture events not currently observed by traditional meteorological sensor networks.
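The smoothing described above can be illustrated with a short simulation. The sketch assumes a first-order sensor response with the 12.7 min chamber time constant, and represents the event as a rectangular pulse in which the air temperature drops by 3.2°C for 8 min and then recovers. The pulse shape, the baseline temperature and the sampling phase are hypothetical choices and are not taken from Clark et al. (2018).

```python
import numpy as np

TAU_MIN = 12.7        # chamber time constant from this study, minutes
SAMPLE_MIN = 5        # Netatmo reporting interval, minutes
STEPS_PER_MIN = 10    # simulation resolution (0.1 min per step)

def first_order_sensor(air_temp, dt_min, tau_min=TAU_MIN):
    """Sensor trace for a first-order lag; the air temperature is assumed
    constant within each simulation step (exact for piecewise-constant input)."""
    alpha = 1.0 - np.exp(-dt_min / tau_min)
    sensed = np.empty(len(air_temp))
    sensed[0] = air_temp[0]  # start in equilibrium with the air
    for i in range(1, len(air_temp)):
        sensed[i] = sensed[i - 1] + alpha * (air_temp[i - 1] - sensed[i - 1])
    return sensed

# Hypothetical event: 3.2 degree C drop lasting 8 min, starting 10 min into the hour.
idx = np.arange(60 * STEPS_PER_MIN + 1)
air = np.full(idx.shape, 15.0)
air[(idx >= 10 * STEPS_PER_MIN) & (idx < 18 * STEPS_PER_MIN)] -= 3.2

sensed = first_order_sensor(air, 1.0 / STEPS_PER_MIN)
reported = sensed[:: SAMPLE_MIN * STEPS_PER_MIN]  # values seen at 5-min sampling

true_drop = air.max() - air.min()
sensed_drop = sensed[0] - sensed.min()
reported_drop = reported[0] - reported.min()
print(true_drop, sensed_drop, reported_drop)
```

Under these assumptions the sensor registers less than half of the true drop before the air temperature recovers, and the 5-min samples capture less still, with the exact fraction depending on where the samples fall relative to the event.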
From this work, a schematic describing scales of meteorological phenomena (Figure 10) has been constructed. It shows features that a properly calibrated and quality checked network of Netatmo weather stations is capable of resolving, compared with the Met Office observation network. While Met Office observations every minute are available directly through the MMS, Netatmo stations only make an observation every 5 min. However, the long lag time observed in Netatmo temperature readings, as shown in Section 3.1, means that Netatmo stations are not capable of resolving phenomena on a 5-min timescale, since the mean temperature lag time was 12.7 min in the chamber. Figure 10 shows that Netatmo stations make sufficient observations and are close enough together to be able to resolve urban meteorological effects, albeit on timescales longer than 15 min.
The observations studied here are from just 1 month, April 2020, which was the sunniest April on record in the United Kingdom (Met Office, 2020a). An analysis of the observations from Netatmo stations over a longer time period would prove useful to investigate how the warm bias differs throughout the year. With April 2020 being as sunny as it was, the Netatmo temperature sensors may have been influenced by solar heating more than during the rest of the year, potentially contributing to the warm bias observed in the data.

| CONCLUSION
The validation and evaluation of temperature observations from Netatmo weather stations in this study means that recommendations can be made on how to use data from Netatmo stations. Netatmo observations are more likely to add value in urban areas than rural areas, where anomalies are more obvious and meso-γ phenomena are more likely to be observed due to the higher density of observations. Netatmo stations observe more than temperature, so an investigation into how other variables compare to a trusted network of observations may prove more successful, and may not experience the lag time that Netatmo temperature measurements do. However, there remains uncertainty within the observations because of their placement. As noted above, the time period of this study was the sunniest April on record in the United Kingdom, and the resultant solar heating influences on Netatmo temperature observations can be seen in the data in Figure 8. The unfiltered data show a large SD and positively skewed data during daylight hours, which are mostly removed by the quality control schemes. This shows that the QC schemes do have a good effect on the data, but an overall warm bias remains in the quality controlled data throughout the day.
Additionally, the lagged response to changes in temperature discussed in Section 4.3 means that Netatmo data are unsuitable for observing temperature on a timescale of less than 15 min because any phenomenon that lasts for less than 15 min would be severely smoothed in the temperature observations. Netatmo stations are popular in the United Kingdom (Table 1 shows that 6098 UK Netatmo stations were releasing data publicly during some of April 2020), and any competing sensor would take time to achieve a similar number of stations. One option to help improve trust in Netatmo observations would be to provide more guidance to owners of Netatmo stations on how best to place their stations outside: currently, there is little information provided within the instruction booklet. Alternatively, other quality control schemes could be tested, which also account for the lagged response in observations. Lastly, other sensors or accessories may be produced to account for biases, such as a Stevenson screen for the temperature sensor to reduce solar heating effects. The lagged response to temperature changes by Netatmo sensors is unfortunate, as it becomes difficult to account for sensor biases. Technological trends change over time, and a new popular sensor may be invented that observes more frequently than at 5-min intervals, is less affected by incoming solar radiation and suffers little lag in its observations. The meteorological community will need to keep up with current trends within consumer weather stations when deciding how to get optimal data from crowdsourcing. Long-term drift is a concern for consumer weather stations that do not receive regular recalibration like national meteorological service stations do. To test this, a future study could either keep a station on a site for many years and then redo a calibration chamber test, or examine a historical data set of public Netatmo stations and compare the newer stations with the old.
Future evaluation of the QC schemes should refine the gridding and interpolation to only resolve a reference station if there are sufficient nearby Netatmo stations.
There is great potential offered by crowdsourcing meteorological observations to supplement existing surface observations, especially in parts of the world that have surface stations at greater distances than those used by the Met Office in the United Kingdom (17.6 km on average). As the resolution of numerical weather prediction (NWP) models increases, a denser network of observations will be required if processes on the scale of 1 km need to be resolved, and crowdsourced observations are an existing set of observations that could fill the gap. The potential offered by crowdsourcing observations, from home weather stations as discussed here, or vehicle-based observations (Z. Bell et al., 2022; Mahoney & O'Sullivan, 2013), means that there is a wealth of high-resolution surface data to help forecasting centres address the challenges of observing and modelling weather into the future.
Amateur meteorologists have kept valuable records from their personal weather stations for decades, yet it is the invention of 'smart home', internet-connected products that has allowed their observations to be shared in real time. The number of household objects, not just home weather stations, with the capability to automatically upload data to the Web is increasing. Automation of society is becoming more commonplace, and relies on local measured data for contextual awareness. It is unlikely that consumers will stop owning such 'smart' products. Sensors are getting smaller, cheaper and more connected, so it is likely that forecasting centres will continue to have a wealth of crowdsourced data available to them through the inclusion of sensors in consumer products: for example, many smartphones contain barometers (Hintz et al., 2019). Continued improvements to computing power will increase the ability of forecasting centres to synthesize large volumes of data through assimilation; the successful advent of satellite data assimilation being a prime beneficiary thus far. Finally, the resolution of NWP models is perpetually increasing, and will require high-resolution observation networks that are uneconomical for budget-constrained national meteorological services to deploy and maintain. Crowdsourced observation networks show some promise to be a scalable solution. Given the refinement of sensor technology and schemes to effectively assimilate crowdsourced data, these observations will increase in quality and therefore in usefulness and economic value.