Millions of digitized historical sea-level pressure observations rediscovered

Millions of sub-daily sea-level pressure observations taken between 1919 and 1960 over the British and Irish Isles were transcribed from paper records in the early 2000s, but were never published and were subsequently forgotten. A chance discussion led to the rediscovery of the transcribed data, and 5.47 million observations from 160 locations are now made available, although the data have not been fully quality-controlled. Much of the data are 3-hourly, allowing detailed examination of synoptic weather variations for this region and time period.


| INTRODUCTION
In 1887, John Venn used daily observations of atmospheric pressure from a single weather station in the United Kingdom to demonstrate that the natural world could exhibit non-Gaussian behaviour. Specifically, he highlighted that low pressures had a longer tail than high pressures (Venn, 1887). The significant impacts on society that can result from both extreme low and high pressure events emphasize the importance of better quantifying the variations of atmospheric pressure on timescales from hours to decades. Long records of atmospheric pressure will help monitor whether the properties of such events are changing as the world continues to warm (e.g. Allan et al. 2009; Woollings et al. 2018).
Measurements of surface or sea-level pressure also allow a picture to be constructed of the variations of the atmosphere from the surface upwards and, due to the relative simplicity of the measurement, this quantity has been observed for centuries. These pressure observations enable a detailed reconstruction of the variations in the weather as far back as 1836 (the 20th Century Reanalysis; Slivinski et al. 2021), in contrast to reanalyses of the modern era, which use a much wider range of observations but only extend back to about 1950 (e.g. ERA5; Hersbach et al., 2020). However, huge quantities of meteorological observations, including of atmospheric pressure, remain unavailable to science because they are still recorded only on paper in various archives or, if we are lucky, in scanned images of the original documents. Many recent projects have added to our databases of surface pressure observations (e.g. Freeman et al. 2016; Ashcroft et al. 2018), including through citizen science activities (e.g. Hawkins et al. 2019; Craig & Hawkins 2020), but gaps remain. It is essential that the data from any new transcription of observations into digital formats become widely available for use and scientific analysis.
This study briefly discusses a set of more than 5 million sea-level pressure observations which were manually transcribed from paper documents many years ago but were subsequently forgotten and remained unpublished until a fortuitous scientific discussion led to the rediscovery of the data. The necessary tasks to make the data available have now been completed.

| The Daily Weather Reports
The UK Met Office produced a Daily Weather Report (DWR) every day from September 1860 until 1980. These DWRs contain detailed weather observations, mainly taken around the British and Irish Isles but also including data from stations across Europe and beyond; these were transmitted to the Met Office each day, initially by telegraph. Scanned copies of the DWRs are now all online and freely available (Met Office Digital Library and Archive, 2022) and are a valuable source of historical weather data. For example, the temperature, rainfall and pressure observations from the 1900-1910 DWRs were recently digitized by citizen science volunteers (Craig & Hawkins, 2020). The format and information contained within the DWRs changed many times: from once-per-day observations taken at around 15 stations in the 1860s, to twice-per-day observations taken at around 50 stations in the early 1900s, and six- or eight-times-per-day observations taken at tens of stations from the 1920s onwards. Several stations are included for large fractions of this period (see Section 3.1).
Figure 1 shows an example DWR page from 5th April 1919, showing the stations from which eight sea-level pressure observations per day can be derived. Each station has a listing for 01Z, 07Z, 13Z and 18Z, with a pressure observation converted to sea level (given to a precision of 0.1 mb) and the change in pressure over the previous 3 hr. This allows the pressures for 22Z, 04Z, 10Z and 15Z to be calculated, but with a small additional uncertainty because the change is only given to a precision of 1 mb. Note that the rows are not always complete, indicating missing data, especially for 01Z (and therefore also for 22Z on the previous day).
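The derivation of the intermediate times can be illustrated with a short sketch. It assumes that the reported change is the pressure at the observation time minus the pressure 3 hr earlier (so a fall is negative), and that each station's reports are held as simple `(time, pressure, change)` tuples; the function names are hypothetical, and this is not the processing code used to build the dataset.

```python
from datetime import datetime, timedelta

def derive_earlier_pressure(p_obs_mb, change_mb):
    """Sea-level pressure 3 hr before an observation.

    Assumes change_mb = p(t) - p(t - 3 hr), so p(t - 3 hr) = p(t) - change_mb.
    If the change is rounded to the nearest 1 mb, the derived value carries
    up to about +/-0.5 mb of extra rounding error.
    """
    return round(p_obs_mb - change_mb, 1)

def expand_to_3_hourly(reports):
    """Combine observed and derived values for one station.

    reports: iterable of (time, pressure_mb, change_mb or None).
    Returns a time-sorted list of (time, pressure_mb, is_derived).
    Observed values always take precedence over derived ones at the same time.
    """
    values = {}
    for t, p, change in reports:
        values[t] = (round(p, 1), False)  # an observed value overwrites a derived one
        if change is not None:
            t_prev = t - timedelta(hours=3)
            # setdefault keeps an existing (observed) value if there is one
            values.setdefault(t_prev, (derive_earlier_pressure(p, change), True))
    return sorted((t, p, derived) for t, (p, derived) in values.items())

# Birmingham, 16th November 1928: 18Z value of 975 mb with a reported change of -16 mb
for row in expand_to_3_hourly([(datetime(1928, 11, 16, 18), 975.0, -16.0)]):
    print(row)
```

Run as shown, this yields a derived 15Z value of 991.0 mb. With the sign of the change flipped to +16 mb, `derive_earlier_pressure(975.0, 16.0)` gives 959.0 mb, which is the alternative value discussed for the Birmingham case later in the paper.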
Figure 2 shows another DWR page, with observations from 28th December 1960, listing many stations with complex codes describing the various weather variables recorded. These variables include pressure observations every 6 hr and a pressure tendency over the previous 3 hr, allowing 3-hourly data to be produced in the same way. After 1945, the observations were made on a regular 3-hourly schedule (00Z, 03Z, 06Z, 09Z, 12Z, 15Z, 18Z and 21Z).
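As an illustration of how the coded pressure column might be decoded, the sketch below assumes the usual synoptic-code convention for PPP, in which the pressure is given in tenths of a millibar with the leading '9' or '10' omitted (so `132` would mean 1013.2 mb and `985` would mean 998.5 mb). Because a group such as `460` could mean either 946.0 or 1046.0 mb, the hypothetical helper picks whichever candidate lies closer to a reference pressure, for example a neighbouring or previous observation. Decoding of the tendency group is not shown.

```python
def decode_ppp(ppp, reference_mb=1000.0):
    """Decode a three-digit PPP group to sea-level pressure in mb.

    Assumes PPP holds the last three digits of the pressure in tenths of a
    millibar. The dropped leading digits are restored by choosing between the
    900s and the 1000s, whichever is closer to reference_mb.
    """
    if len(ppp) != 3 or not ppp.isdigit():
        raise ValueError(f"expected three digits, got {ppp!r}")
    tail_mb = int(ppp) / 10.0  # e.g. "132" -> 13.2
    candidates = (900.0 + tail_mb, 1000.0 + tail_mb)
    best = min(candidates, key=lambda p: abs(p - reference_mb))
    return round(best, 1)

print(decode_ppp("132"))                      # 1013.2
print(decode_ppp("985"))                      # 998.5
print(decode_ppp("460", reference_mb=960.0))  # 946.0 during a deep low
```

With the default reference of 1000 mb, `460` would decode to 1046.0 mb, which is why a reference taken from nearby observations matters during deep lows such as those discussed for 1928.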

| Transcription and dataset rediscovery
In the early 2000s, the Met Office funded a commercial company to transcribe, from the paper records, the 3-hourly pressure observations taken at the British and Irish stations contained within the 1919-1960 DWRs. This transcription project did not include other types of weather observation or any data from the many 'foreign' stations also listed in the DWRs; those data, consisting of many millions of individual weather observations, remain largely unrescued. In addition, for 1919-1921, there is a separate section listing a group of around 25 UK and Ireland stations with observations four times per day, but these (around 70,000 observations) were not transcribed. After transcription, the resulting dataset was partly processed, but was not published or included in any national or international databases.
One of the main reasons the dataset production process was not completed in the early 2000s was that the 'additions and corrections' component of the DWRs, published once per month until the 1930s, was not transcribed. The original paper copies of the DWRs had not been scanned at that time and could not be sent out for digitizing, and the duplicate copies that were sent did not include the 'additions and corrections' section (all the DWRs have since been scanned and are now online; Met Office Digital Library and Archive, 2022). As a result, any observations that were initially missing (but arrived after the daily publication of the DWR) could not be included in the dataset without considerable additional transcription. Similarly, any observation found to have been transmitted or written on the DWR incorrectly could not be corrected in the dataset. Completing these quality-control tasks is a significant undertaking, and the resources were not available at the time. The dataset was then largely forgotten.
FIGURE 1 An example page from the DWRs for 5th April 1919 showing the locations with observations four times per day, which also include the change in pressure over the previous 3 hr (columns 1 and 2 in each of the four sections). Note that the names of the locations are repeated for the top and bottom halves of the table.
In 2019, author EH gave a seminar describing the recovery of the observations contained in the 1900-1910 DWRs using citizen science (Craig & Hawkins, 2020). Author LVA was in the audience and recalled leading the project at the Met Office many years earlier to digitize pressure data contained in the DWRs. Subsequent discussions led to the fortunate rediscovery of the 1919-1960 DWR pressure data on an old laptop owned by author LVA. The data have now been reprocessed into Station Exchange Format (SEF) files so that they can finally be included in updates to the International Surface Pressure Databank (ISPD; Compo et al. 2019) and in developing databases such as C3S and GHCN-h.
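A minimal sketch of writing one station's sub-daily pressures to a tab-separated file in the spirit of SEF is shown below. The header keys and column names follow the SEF version 1.0.0 layout as we understand it (a block of `key<TAB>value` metadata lines followed by `Year Month Day Hour Minute Period Value Meta` columns), but these, the variable code `mslp`, the units and the station ID are assumptions that should be checked against the official SEF specification. Unknown coordinates are left blank rather than guessed.

```python
import csv
import io
from datetime import datetime

def to_sef_text(station_id, name, lat, lon, alt, records, units="hPa"):
    """Return SEF-style text for one station's sea-level pressure series.

    records: iterable of (time, pressure, meta_string).
    None metadata values (e.g. unknown coordinates) are written as blanks.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    header = [
        ("SEF", "1.0.0"), ("ID", station_id), ("Name", name),
        ("Lat", lat), ("Lon", lon), ("Alt", alt),
        ("Source", ""), ("Link", ""), ("Vbl", "mslp"),
        ("Stat", "point"), ("Units", units), ("Meta", ""),
    ]
    writer.writerows(header)
    writer.writerow(["Year", "Month", "Day", "Hour", "Minute", "Period", "Value", "Meta"])
    for t, p, meta in records:
        # Period 0 is used here to mark an instantaneous (point-in-time) value
        writer.writerow([t.year, t.month, t.day, t.hour, t.minute, 0, f"{p:.1f}", meta])
    return buf.getvalue()

# Hypothetical station ID; coordinates left blank rather than guessed
text = to_sef_text("DWR_Birmingham", "Birmingham", None, None, None,
                   [(datetime(1928, 11, 16, 18), 975.0, "")])
print(text)
```

Keeping one variable and one station per file, with metadata in the header, is what makes such files straightforward to merge into collections like ISPD.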

| Quality control
The quality-control issues remain: the 'additions and corrections' have not been applied. In the authors' view, this issue is less relevant now than when the dataset was initially transcribed, mainly because of the subsequent development of centennial reanalyses such as the 20th Century Reanalysis (20CRv3; Slivinski et al. 2021). Such reanalyses will be a major user of the data, especially as 20CRv3 assimilates only historical pressure observations to produce dynamical reconstructions of past weather variations. These scientific developments mean that pressure data are more valuable to climate science than previously, making it important to ensure that the data are available. In addition, the reanalysis assimilation process naturally down-weights or rejects observations that are likely to be erroneous.
The issues that will exist with small fractions of the dataset include:
1. Missing data which were never taken, or not transmitted to the Met Office
2. Late-arriving data which are in the additions pages but not added here
3. Errors which are listed in the corrections pages but not corrected here
4. Measurement or writing errors which were not identified at the time
5. Errors made during the modern transcription from the hand-written sheets
6. Systematic biases
FIGURE 2 An example page from the DWRs showing observations for 28th December 1960 for the locations with pressure observations (columns PPP) four times per day, which also include a barometric tendency (columns app) for the change in pressure over the previous 3 hr. These observations are shortened using specific codes described in the DWRs.
Some observations within (1) could be found in other sources, such as the original station logbooks, if they can be located. In principle, issues (2) and (3) could be addressed with a significant time investment using the 'additions and corrections' pages; targeted efforts to address such issues for certain significant weather events may be worthwhile in future. Even if this process were undertaken, errors of types (4) and (5) would still exist, and these are harder to find and potentially more numerous. Examining individual time series for 'jumps' in pressure (inhomogeneities) may allow some of these errors to be identified through manual checking of the original DWRs; for example, this time-consuming process was performed by Alexander & Power (2009) for a station in Australia. Examples of probable type (4) errors are shown later, along with examples of type (2) missing observations. Systematic biases (type 6) could exist due to, for example, an incorrect station elevation being used for the correction to sea level, or an incorrect calibration.
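One simple way to screen a single station's series for isolated jumps is to flag values that depart sharply from both neighbours in the same direction, since a genuine storm usually produces a sustained fall and rise rather than a one-off spike. The sketch below is a generic spike test, not the method of Alexander & Power (2009); the 10 mb threshold and 3 hr gap are hypothetical choices, and any flagged value would still need checking against the DWR sheets.

```python
from datetime import datetime, timedelta

def flag_spikes(series, max_step_mb=10.0, max_gap=timedelta(hours=3)):
    """Return times of values that jump away from both neighbours.

    series: list of (time, pressure_mb) sorted by time. A value is flagged
    when it differs from the previous and the next value by more than
    max_step_mb in the same direction, provided both neighbours are no more
    than max_gap away (so gaps in the record are not mistaken for jumps).
    """
    flagged = []
    for (t0, p0), (t1, p1), (t2, p2) in zip(series, series[1:], series[2:]):
        if t1 - t0 > max_gap or t2 - t1 > max_gap:
            continue
        diff_prev, diff_next = p1 - p0, p1 - p2
        # Both differences large and positive (spike up) or large and negative (spike down)
        if min(diff_prev, diff_next) > max_step_mb or max(diff_prev, diff_next) < -max_step_mb:
            flagged.append(t1)
    return flagged

# Synthetic 3-hourly series with one isolated spike at 06Z
start = datetime(1930, 1, 1, 0)
values = [1000.0, 1001.0, 1015.0, 1002.0, 1003.0]
series = [(start + timedelta(hours=3 * i), p) for i, p in enumerate(values)]
print(flag_spikes(series))
```

Here only the 06Z value is flagged, whereas a steady 10 mb per 3 hr fall through a deepening low is not, because its differences to the two neighbours have opposite signs.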
The use of these data within a reanalysis framework could pick up many of the errors by flagging individual observations that are rejected by the reanalysis assimilation process, but this is only possible after all the observations have been added to databases such as ISPD. An example of this identification for a type (5) issue from a different dataset is discussed in Craig & Hawkins (2020). Systematic biases are also estimated and removed when pressure observations are assimilated in 20CRv3, which accounts for type (6) issues.
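How an assimilation system can reject a suspect observation is often illustrated with a first-guess (background) check, in which the difference between the observation and the model's prior estimate is compared with the combined expected errors. The sketch below is a generic version of that idea, not the specific quality control used in 20CRv3; the error standard deviations, the threshold `k` and the background values in the example are all hypothetical.

```python
import math

def first_guess_check(obs_mb, background_mb, sigma_obs_mb, sigma_bg_mb, k=4.0):
    """Accept or reject an observation by comparing it with a background value.

    The innovation (obs - background) is normalized by the combined standard
    deviation of observation and background errors; observations whose
    normalized innovation exceeds k in magnitude are rejected.
    Returns (accepted, normalized_innovation).
    """
    innovation = obs_mb - background_mb
    combined_sd = math.sqrt(sigma_obs_mb ** 2 + sigma_bg_mb ** 2)
    z = innovation / combined_sd
    return abs(z) <= k, z

# Hypothetical background of 960 mb with a 5 mb spread and 2 mb observation error
print(first_guess_check(991.0, 960.0, 2.0, 5.0))  # rejected: |z| is about 5.8
print(first_guess_check(962.0, 960.0, 2.0, 5.0))  # accepted: |z| is about 0.4
```

Because the tolerance grows with the background spread, the same observation error is more likely to slip through where the prior is very uncertain, which is one reason why denser observations help the check as well as the analysis.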
For these reasons, the authors believe it is far better to produce a pressure dataset that is (say) 97% correct, 2% missing and 1% erroneous than no dataset at all, even though we cannot be certain about the percentages that are missing or erroneous. We note that some users of the data may want to undertake their own quality-control procedures depending on the application.

| Observations and locations
Figure 4 shows the locations where sub-daily pressure data are available in this dataset during example years between 1919 and 1960, and Figure 5 labels the locations mentioned in the text. We have not identified precise coordinates for every station because of the lack of metadata available on the DWR sheets. However, given that pressure is relatively insensitive to the site details, we do not consider this to be a serious issue.
FIGURE 3 Number of stations for which observations have been recovered, and the average number of available observations per day in each year, for both ISPDv4 (existing; orange) and this dataset (blue). There is considerable overlap between stations in the 1929-1939 period, although higher-frequency observations are available in this new dataset.
Figure 3 also shows the number of stations and observations available for the United Kingdom and Ireland in the same period from ISPDv4.This dataset will increase the total number of observations available considerably.For the 1929-1939 period, there is overlap between the two datasets, but this dataset has higher frequency data from the stations that are already in ISPDv4.
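Where a station appears in both ISPDv4 and this dataset, the two digitizations can be matched on observation time and summarized with a few statistics: the number of overlapping observations, how many differ, how many differ by more than a threshold, and the standard deviation of the differences. The sketch below assumes each series is a dictionary keyed by observation time; the function name, the 0.05 mb tolerance used to treat values reported to 0.1 mb as equal, and the example values are hypothetical.

```python
import statistics
from datetime import datetime

def compare_digitizations(a, b, large_diff_mb=5.0, tol_mb=0.05):
    """Compare two digitizations of the same station's pressures.

    a, b: dicts mapping observation time -> pressure (mb).
    Returns (n_overlap, n_different, n_large, sd_of_differences).
    """
    common = sorted(set(a) & set(b))
    diffs = [a[t] - b[t] for t in common]
    n_different = sum(1 for d in diffs if abs(d) >= tol_mb)
    n_large = sum(1 for d in diffs if abs(d) > large_diff_mb)
    # The sample standard deviation needs at least two overlapping values
    sd = statistics.stdev(diffs) if len(diffs) >= 2 else float("nan")
    return len(common), n_different, n_large, sd

t = [datetime(1930, 1, 1, h) for h in (0, 3, 6, 9)]
ours = {t[0]: 1000.0, t[1]: 1001.0, t[2]: 990.0}
other = {t[0]: 1000.0, t[1]: 1001.2, t[2]: 996.0, t[3]: 1005.0}
print(compare_digitizations(ours, other))
```

Only times present in both series are compared, so extra higher-frequency observations in one source do not affect the statistics.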

| Example of observations available in 1928
To highlight the ability of this dataset to provide information about synoptic variations, Figure 6 shows all the observations during October and November 1928. This was a stormy period, with at least three low pressure systems (<960 mb) moving over the British and Irish Isles. The top row of Figure 7 maps the individual observations at the peak of those storms, with darker blue colours used to denote lower sea-level pressures.
Note that there are missing data for some of the times shown in Figure 7; that is, some stations report for some events but not others. This is frequently seen around historical severe storms, which caused delays to the transmission of the data. Some of these missing observations will be in the monthly 'additions' pages of the DWRs, and some appear written in red ink on the online copies of the DWRs, but these were not transcribed because the duplicate copy of the DWRs used in the earlier stage of the digitization process did not include them (see Section 2.2). For example, the missing observation at Eskdalemuir in southern Scotland at 15Z on 23rd November is 946 mb, and there are other missing observations in Ireland, from Malin Head at 968 mb and Blacksod Point at 982 mb. Recovering such individual missing observations may be worthwhile when analysing case studies of particular severe storms.
Note one almost certainly erroneous observation in the middle panel of the top row of Figure 7. The 991 mb observation for Birmingham (south-east of the lowest pressure values) at 15Z on 16th November 1928 has no correction listed in the DWRs and is correctly transcribed from the original DWR sheets. The 18Z observation is 975 mb, and this is indicated to be minus 16 mb from 3 hr earlier, resulting in a 991 mb observation for 15Z (Figure 8). It seems highly likely that the handwritten '−16' should be '+16', and that the 15Z observation was actually 959 mb rather than 991 mb; this would fit the other available observations of the synoptic situation. There will be other examples such as this in the dataset, but they would likely be rejected in a reanalysis assimilation. This is an example of issue (4) listed above, and suggests that the data at times derived from both a transcribed observation and a change in pressure will contain more errors. The 958 mb at Inchkeith at 15Z on 23rd November also looks too high, but is similarly transcribed correctly with no correction reported.
FIGURE 8 The 18Z pressure observation for Birmingham on 16th November 1928. Note that this is the resolution of the online scanned copies of the DWRs; higher-resolution images exist but are not online.
The bottom row in Figure 7 shows isobars for the same times from 20CRv3 (Slivinski et al. 2021), which assimilates only previously digitized pressure observations contained within ISPDv4. Just 16 observations from 11 locations (black dots) were available during each day for this time period (around half at 06Z, and none at 03Z or 15Z). These are enough data to place the low pressure centres in roughly the correct positions, but the severity of the storms is not well represented. The ensemble spread (red shades) highlights that the uncertainty can reach ~7 mb for some parts of the domain shown. With ~250 observations per day from ~40 locations now available for the same spatial region, subsequent reanalyses of these events will be much improved, with much reduced uncertainty. The value of spatially diverse, high-frequency data is clear, especially for storms which cross the country at night, when traditional once- or twice-per-day climatological observations are not made.

| Observation comparisons
Comparing the locations of observations in the top and bottom rows of Figure 7 highlights that some of the newly digitized data come from places where observations already exist in ISPDv4 for at least some of the period. This allows the two different digitizations of the same data to be compared to check accuracy. Figures 9 and 10 show comparisons for Eskdalemuir and Valentia, comparing ISPDv4 (grey) and this dataset (blue).
Overall, there is high agreement between the different digitizations. For Eskdalemuir, there are just 35 differences out of 10,790 overlapping observations covering 1929-1939, with an overall standard deviation of the differences between the two series of 0.2 mb. For Valentia, there are numerous small differences during 1919-1960, with an overall standard deviation of 0.7 mb between the time series. The standard deviation of the differences is larger during 1922-1929, at 1.3 mb. It is unclear why, but we speculate that this could be due to variations in the conversion from station pressure to sea-level pressure in different sources. There could also be some errors in the ISPDv4 version of the data. For Valentia, 362 of the 94,392 overlapping observations differ by more than 5 mb. Note that some of the Valentia data are already hourly within ISPDv4, so this dataset will not add much new information for this site.

| SUMMARY
More than 5 million sub-daily sea-level pressure observations are made available, from 160 locations around the British and Irish Isles between 1919 and 1960. When using these data, it is important to recognize that some quality-control procedures may need to be applied, depending on the specific use planned.
These data will be submitted to global datasets and will therefore be available for use in projects such as future reanalyses of the historical period. Experiments are planned to include these data within dedicated simulations with both the ERA5 (Hersbach et al. 2020) and 20CRv3 (Slivinski et al. 2021) reanalysis systems, to demonstrate the value of such data for reconstructing the atmospheric circulation during particular extreme weather events. These additional data also have the potential to augment time series of pressure for distinct locations in the United Kingdom, and to help understand storminess via gridded datasets or 'pressure

FIGURE 4 Locations of the rescued data from example years between 1919 and 1960. Each red point represents a location with at least 200 observations during the year.
FIGURE 5 All locations appearing in the dataset (dots), with those mentioned in the text shown in blue and labelled.
FIGURE 6 Sea-level pressure observations for October and November 1928, highlighting a series of low pressure events over the British and Irish Isles.

FIGURE 7 (top) Map of locations and sea-level pressure observations from this new dataset for three low pressure events during October and November 1928. (bottom) Maps showing the ensemble spread (red shades) and the ensemble mean of sea-level pressure (contours) from 20CRv3 for the same times. The black dots indicate stations where some data exist for the same day in ISPDv4 and so were used to produce 20CRv3.

Figure 3 summarizes the number of stations and observations included in this dataset. A total of 5.47 million observations are made available from 160 locations covering different time periods. The first data available are in April 1919 and the last data are in December 1960. From April 1921 onwards, more than 40 stations have 3-hourly pressure data available, although this is often actually six times per day, with 22Z and 01Z missing. From December 1943 to December 1948, around 70-80 stations are available, before this number drops to around 60. Three locations (Eskdalemuir, Valentia and Aberdeen-Dyce) have largely complete data for the whole time period, with several other stations largely complete from April 1921 onwards.
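Summaries of the kind shown in Figure 3, together with the 200-observation threshold used for the maps of example years, can be computed from a flat list of station reports. The sketch below assumes records are `(station, time)` pairs and averages observations per day over the whole calendar year (so a partial first year such as 1919 gets a lower average); the function name and data layout are hypothetical.

```python
from calendar import isleap
from collections import Counter, defaultdict
from datetime import datetime, timedelta

def yearly_summary(records, min_obs=200):
    """Per-year counts from (station, time) records.

    Returns {year: (n_stations, n_stations_with_min_obs, mean_obs_per_day)},
    where the mean is taken over every day of the calendar year.
    """
    per_year = defaultdict(Counter)
    for station, t in records:
        per_year[t.year][station] += 1
    summary = {}
    for year in sorted(per_year):
        counts = per_year[year]
        days = 366 if isleap(year) else 365
        summary[year] = (
            len(counts),                                      # stations with any data
            sum(1 for n in counts.values() if n >= min_obs),  # stations above threshold
            sum(counts.values()) / days,                      # mean observations per day
        )
    return summary

# Hypothetical example: one station with 3-hourly data for 30 days, another with 10 reports
start = datetime(1930, 1, 1)
records = [("A", start + timedelta(hours=3 * i)) for i in range(8 * 30)]
records += [("B", start + timedelta(days=i)) for i in range(10)]
print(yearly_summary(records))
```

In this example both stations count towards the yearly station total, but only station A passes the 200-observation threshold.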