ANCHORS: A multi‐decadal tide gauge dataset to monitor Australian relative sea level changes

Here, we describe a new national tide gauge‐based sea level dataset for monitoring sea level changes around Australia, and the novel homogenization methodology used to develop it. Homogenization is a two‐step process that involves the detection of jumps or steps in the data (inhomogeneities) followed by a correction applied to remove the change. This new dataset is called the Australian National Collection of Homogenized Observations of Relative Sea Level (ANCHORS). ANCHORS provides coastal sea levels at hourly resolution with homogenization performed on annual means. ANCHORS is intended to be a national tide gauge‐based sea level dataset to monitor changes in mean sea level and resultant coastal flood frequency changes around Australia. Whilst sea levels are rising around Australia in response to global mean sea level changes, coastal water levels represent a notable gap in the suite of in situ high‐quality datasets available for researchers, governments and industry. Tide gauge datasets are well suited to impact‐based assessments of changes in flood frequency, duration and risk. In this regard, ANCHORS will sit alongside existing satellite‐ and numerical modelling‐based sea level datasets to provide enhanced insights into the physical impacts of sea level rise on Australian coastlines. We have adapted established homogenization techniques to develop a national collection of homogeneous long tide gauge records. The dataset comprises 38 tide gauge records from around Australia, varying in length from 34 to 123 years, with an average of 57 years. Finally, we discuss limitations and potential applications for ANCHORS and how these have informed the design of this coastal sea level dataset.


| INTRODUCTION
Global mean sea level is rising due to the melting of land ice, thermal expansion of sea water and changes in the amount of water stored on land. Hundreds of millions of people are expected to be impacted this century, with economic costs estimated to be greater than US$10 trillion (Kirezci et al., 2020; Kulp & Strauss, 2019). Understanding how sea level has changed over time is an important component of managing future flood risk, and in testing and calibrating projections to inform future decisions (e.g. Idier et al., 2020). Whilst updated estimates of geocentric sea level rise in the Australian region have been published recently (Karimi & Deng, 2020; Watson, 2020), the most recent estimates of Australian coastal sea level change, measured with reference to local land levels, are from White et al. (2014). It is this so-called relative sea level that is of primary importance for the study of coastal impacts as it includes non-oceanographic factors that can mitigate or exacerbate coastal flood risk (Karegar et al., 2017; Nicholls et al., 2021).
The aim of this study is to develop a national collection of tide gauge records to monitor sea level changes and their impacts in key population centres. Primarily, this means that the dataset must comprise records with enough years to robustly detect mean sea level trends, have sufficient spatial coverage to assess changes nationally and be free of inhomogeneities. In this context, inhomogeneities are sources of error due to factors other than the geophysical conditions experienced at the tide gauge. This aim is not fulfilled by existing sea level datasets and global repositories such as the Global Sea Level Observing System (GLOSS, Bradshaw et al. 2015), Permanent Service for Mean Sea Level (PSMSL, Holgate et al. 2013) and the Global Extreme Sea Level Analysis (GESLA, Woodworth et al., 2017), which have not had homogenization techniques systematically applied. Other datasets like the Australian Baseline Sea Level Monitoring Program (ABSLMP, Bureau of Meteorology 2021) and the University of Hawaii Sea Level Centre datasets (UHSLC, Caldwell et al., 2015) are also of insufficient temporal and/or spatial coverage.
This dataset aims to quantify relative sea level changes due to all possible geophysical drivers of variability (e.g. as described by Woodworth et al., 2019) on annual timescales and length scales of 10s to 100s of kilometres. Tide gauge observations are more suitable than satellite altimeter data for this purpose, as satellite instruments cannot detect the effects of all drivers of coastal sea level change. However, tide gauge records are prone to many sources of inhomogeneity. These primarily occur when two or more shorter discrete tide gauge records are concatenated to create a longer record and the constituent records have differing instrumentation, geographical location (i.e. different hydrodynamic environments), definitions of tide gauge zero (i.e. datum shifts), or data collection and processing procedures (e.g. measurement precision). Additionally, quality control procedures vary from site to site and over different time periods at a particular site due to changes and differences in technology and how this was adopted by different agencies. To ensure a dataset is homogeneous (free of inhomogeneities), we must identify and remove these non-geophysical changes in tide gauge records. This is of paramount importance for applications in climate change monitoring and coastal planning.
Homogenization is typically a two-step process that involves the detection of jumps or steps in the data (i.e. inhomogeneities) followed by a correction applied to remove the change. Whilst there have been various international attempts to improve tide gauge datasets by employing methods beyond standard quality control (e.g. Hannah & Bell, 2020; Hogarth et al., 2020; Marcos et al., 2021; Talke et al., 2018), there are not presently any homogeneous collections of Australian tide gauge records for coastal sea level monitoring. Here, we adopt a 'buddy checking' approach (Hogarth et al., 2020; Marcos et al., 2021) on a national collection of tide gauge observations. A key component of homogenization is utilizing metadata to identify potential sources of inhomogeneity that have occurred in the past. In this context, we define metadata as consisting of tide gauge information documenting calibrations, station summaries, station histories and geodetic levelling reports, as well as communication between operators and data custodians. From these, we can infer information on potential sources of inhomogeneities, such as equipment upgrades and site moves. It is likely that this lengthy process has historically been a barrier to developing large homogeneous sea level datasets, despite the abundance of otherwise suitable relative sea level data available in national and international repositories.
Finally, we discuss limitations and potential applications for ANCHORS and how these have informed the design of this coastal sea level dataset.

KEYWORDS
ANCHORS, coastal flood risk assessment, homogenization, sea level rise, tide gauge data

Here, we describe the data and methodology behind the Australian National Collection of Homogenized Observations of Relative Sea Level (ANCHORS), a homogenized set of tide gauge records with hourly still water level observations. The dataset comprises 38 long tide gauge records from around Australia, varying in length from 34 to 123 years, with an average of 57 years. Whilst numerous individual locations have earlier data, for most applications, ANCHORS can be considered to have a start date of 1966. Importantly, the homogenization is performed at the annual timescale, by comparing annual means at ANCHORS gauges to annual means at other well-correlated gauges. This reduces all observations to present-day tide gauge zero (TGZ). This is analogous but not identical to the Revised Local Reference process used by PSMSL (Holgate et al. 2013). As all observations are related to the present-day TGZ, the established connections between this level and specific survey benchmarks, tidal planes, ellipsoidal heights or legal datums can be applied in the same way as they would be for unhomogenized data (including subsets made available in global repositories) without additional reprocessing. Such datum information is available at the PSMSL or in various tide gauge history documents referred to in Section 2.2.2. Therefore, identifying any additional datum connections is beyond the scope of the present study as it has been shown by previous studies (i.e. those that used unhomogenized data) that TGZ is a suitable reference level from which coastal flooding can be studied (Hague et al., 2019; Sweet et al., 2018).

| METHODS
There are many tide gauge records of varying length held by the Bureau of Meteorology ('the Bureau'). The Bureau does not operate all these tide gauges and the collection includes data supplied by agencies including port authorities and other government entities. Additional records are held in the repositories of these external agencies. Whilst most of these tide gauge observations were collected primarily for the purposes of generating tidal predictions at ports, many also represent long-term records of relative sea levels at their respective locations. As such, subsets are made available through GLOSS (Bradshaw et al. 2015), PSMSL (Holgate et al. 2013), GESLA (Menendez & Woodworth, 2010; Woodworth et al., 2017) and UHSLC (Caldwell et al., 2015) for research use. These datasets are used widely in global sea level research, including in the extreme sea levels section of IPCC reports (Oppenheimer et al. 2019). Through the homogenization of tide gauge observations, we seek to develop an improved collection of long-term sea level records from which research on Australian sea level rise, and resulting coastal flooding changes, can be conducted.
Selecting the tide gauge sites to be included in ANCHORS was achieved through semi-objective criteria that optimize spatial and temporal coverage, maintain a manageable density and maximize data completeness. A starting point for ANCHORS was the 31 Australian tide gauges with at least 40 station-years of observations. However, this resulted in an uneven spatial distribution, with many sites in the states of South Australia and Western Australia and fewer sites elsewhere. Therefore, as we wanted the dataset to be national and representative of annual sea level changes and variability nationally, we needed to include additional sites to increase spatial density of the network. A further motivation was to investigate the human impacts of this sea level rise through changes in coastal flood frequencies (e.g. Hague et al., 2019). Hence, we wanted to include population centres that are known to experience coastal floods. This is a limitation of the ABSLMP as it does not include tide gauges from the cities of Sydney, Melbourne, Brisbane, Gold Coast or Adelaide.
To address these requirements, seven further tide gauge records were included in ANCHORS. These sites were Eden, Portland, Booby Island, Onslow, Gold Coast, Ballina and Lakes Entrance, with the latter two concatenated especially for ANCHORS (see Appendix for further details). As part of the principle of even spatial distribution, we decided against including both Port Adelaide Inner and Outer Harbor gauges, opting for Outer Harbor as the more complete record. The spatial coverage of ANCHORS is shown in Figure 1, with further gauge information provided in Table 1. Data up to the end of 2019 were included as these were what had been quality-controlled and filtered to hourly values at the time of dataset development. ANCHORS data are hourly with sea levels reported to the nearest centimetre. It is anticipated that the number of sites included in ANCHORS under these criteria will increase as record lengths increase.
Whilst existing data have undergone basic quality control, their homogeneity has never been assessed in a systematic fashion. Without this homogeneity assessment, extreme care must be taken when conducting trend or extremes analysis using any Australian tide gauge data (White et al., 2014). The broad approach adopted here was to compare tide gauge observations from a location of interest ('base' series, which collectively comprise ANCHORS) to observations from multiple other locations that exhibited strong correlations with the location of interest ('reference' series). This procedure is commonly called 'buddy checking' and is a common feature of advanced quality control used in developing reliable long tide gauge records (e.g. Hannah & Bell, 2020; Hogarth et al., 2020; Marcos et al., 2021; Talke et al., 2018).
Firstly, some short periods of data were removed using a backward completeness algorithm with a criterion of 70% completeness (after WMO, 2017). This involved finding the earliest possible date such that the data completeness from that date forward in time to the end of the data (in this case, 31 December 2019) was at least 70%. The subsequent homogenization process involved five key steps, which are described further below, in turn:
- Identifying suitable reference series for each base series (Section 2.1)
- Identifying potential sources of inhomogeneity and their effects on the data (Section 2.2)
- Detection of inhomogeneities in the data (Section 2.3)
- Adjustment of detected inhomogeneities (Section 2.3)
- Assigning levels of confidence to how likely sections of data are to be homogeneous, based on the number of available reference series (Section 3.2)

Figure 2 shows the temporal scope of the data after this process, in particular the removal of short periods of early data and the assignment of confidence levels. Whilst some sites (e.g. Fremantle, Port Adelaide, Port Pirie, Sydney) have much longer records, the start period for higher quality data at most locations is after 1966. Much pre-1966 data remain undigitized, as systematic collection of sea level data commenced in 1966. This marked the beginning of the development and ongoing monitoring of Australia's national vertical reference frame, the Australian Height Datum (AHD). For reasons discussed further in Section 3, 1966 represents the practical start of this dataset for purposes such as monitoring of mean sea level and coastal flood trends.
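The backward completeness criterion described at the start of this section can be sketched as follows. This is an illustrative Python sketch, not the ANCHORS processing code; the function name and the granularity of the completeness flags (one boolean per observation period) are our own assumptions.

```python
def backward_completeness_start(present, threshold=0.7):
    """Return the earliest index such that the fraction of valid
    observations from that index to the end of the record is at
    least `threshold`; return None if no such index exists.

    `present` is a sequence of booleans (True = observation exists),
    e.g. one flag per hour, day or month.
    """
    n = len(present)
    valid_after = 0  # valid observations from index i to the end
    best = None
    # Scan backwards so each suffix completeness is updated in O(1).
    for i in range(n - 1, -1, -1):
        valid_after += bool(present[i])
        if valid_after / (n - i) >= threshold:
            best = i  # earliest index seen so far meeting the criterion
    return best
```

Scanning backwards means each suffix completeness is obtained in constant time, so the earliest qualifying start date is found in a single pass over the record.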

| Identifying suitable reference series
To assess homogeneity, we required independent reference series to compare to the base series. The first step in identifying suitable reference series was to collate data series with at least 20 years of hourly observations from the various tide gauge operators throughout Australia. Primarily these were from the Bureau of Meteorology database but also from Manly Hydraulics Laboratory (Ocean Tide network), the Western Australian Department of Transport and Maritime Safety Queensland. Throughout this process, we removed sites that were known to be complete or partial duplicates (e.g. a site that was included in both the Bureau of Meteorology database and one of the state-based networks).
The process used to select reference series for each base (ANCHORS) series, using these identified candidate reference series, is detailed next. To ensure an internally consistent dataset, we prioritized using ANCHORS sites as reference series for other ANCHORS sites. The first step was to compute pairwise correlations between the annual means of the base site and all other ANCHORS sites that act as candidate reference series, with annual means calculated only for years with at least 70% data completeness. All sites that produced correlations greater than 0.7 were selected as references, with computed correlations shown in Figure 3. If this yielded fewer than five reference series, a second step was taken to identify additional reference series. This second step is effectively a repeat of the first step but with the broader set of Australian tide gauges as candidate reference series rather than just other ANCHORS series. To be considered candidate sites, there needed to be at least 20 years of overlapping annual means with the base series. If this second process yielded correlations greater than 0.7, then the most correlated sites were selected as reference series up to a maximum of five reference series combined between the first and second processes. The reference series chosen for each base series are documented in Appendix (Table S2).
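The ranking step of this selection can be sketched as below. This is an illustration only: the function names and the dict-of-annual-means data layout are ours, not part of the ANCHORS code, and it implements a single ranking pass rather than the full two-tier procedure.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select_references(base, candidates, r_min=0.7, min_overlap=20,
                      max_refs=5):
    """Rank candidate reference series for one base series.

    `base` and each candidate are dicts mapping year -> annual mean
    (years failing the 70% completeness criterion are simply absent).
    Returns up to `max_refs` names, most correlated first.
    """
    scored = []
    for name, series in candidates.items():
        years = sorted(set(base) & set(series))
        if len(years) < min_overlap:
            continue  # insufficient overlapping annual means
        r = pearson([base[y] for y in years], [series[y] for y in years])
        if r > r_min:
            scored.append((r, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:max_refs]]
```

In the full procedure, this ranking would be applied first with other ANCHORS sites as candidates and then, if fewer than five references result, repeated over the broader set of Australian gauges.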
No additional pre-processing is applied to data beyond the routine quality control applied in the curation of hourly observations (Section 2.2.1). No detrending or adjustments for the inverse barometer, climate drivers such as the El Niño Southern Oscillation or vertical land motion are applied as changes in these can result in real changes in relative sea levels (White et al., 2014). This means that correlations derived between reference series could be related to processes with long spatial and temporal scales. This can be seen in Appendix (Table S2) with reference series often separated by large distances. For example, Darwin is a reference series for many locations across southern Australia because there is a strong relationship between atmospheric pressure anomalies there and sea level anomalies across much of the rest of the Australian region (Aubrey & Emery, 1986).

| Potential sources of inhomogeneity and their effects
Homogenization is a process that involves the detection and correction of errors in time series caused by factors other than changes in the geophysical conditions experienced at the tide gauge. Many inhomogeneities result from multiple different tide gauges being concatenated together and archived as a single record, often without sufficient information to identify where data sources or processing methods change. Site moves are where the location of a tide gauge that is used as the data source for the sea level record changes to a new location that differs hydrodynamically from a previous location. Datum shifts are where the vertical reference of tide gauge measurements changes as the location of the gauge, or its instrumentation, changes. Figure 4 shows the effect that datum changes can have on the homogeneity of tide gauge data.
Changes to equipment, which often coincide with site moves or datum shifts, can also lead to inhomogeneities. Tidal staffs, stilling-well gauges, pressure (or 'bubbler') gauges, acoustic reflection gauges and more recently, RADAR gauges have all been used to measure sea levels around coastal Australia (see Pugh & Woodworth, 2014; Douglas, 2003; and Hunter, 2003 for further information). As with all in situ measurements, these gauges are variously affected by environmental, technological or systematic problems. For example, Hunter (2003) described small errors in acoustic gauges associated with temperature dependence that affected the speed of sound and hence the sea level observed. Easton and Radok (1968) and Hamon (1987) provide details on common problems that affected stilling-well gauges and the digitization of their observations. We do not aim to identify or correct for systematic biases in equipment unless changes in equipment coincide with statistically detectable inhomogeneities in the data.
TABLE 1 Information on each gauge in ANCHORS: Australian National Tide Tables number (ANTT #), start year of record (LC denotes 'lower confidence', HC denotes 'higher confidence'), latitude and longitude, the current gauge operator (Op.), custodian of raw data used in this study, whether unadjusted data are available in the University of Hawaii (U) or GESLA version 2 (G) databases

| Data processing
Changes to data processing are another potential source of inhomogeneities. However, using the threshold of detectability (Section 2.3.2), we find that the effects of this processing on homogeneity are subordinate to those relating to concatenation of multiple discrete sea level records. Nevertheless, it is instructive to consider some of the potential issues identified as these may impact the utility of the dataset in some applications (e.g. analysing storm surges).
The precision of tide gauge data in ANCHORS varies from the nearest millimetre to the nearest 10 centimetres, due to both newer gauges having higher precision, and changes in digitization practices over time. The paper charts produced by the stilling-well tide gauges required digitization to produce tide predictions and calculate trends. The process by which this was done is not well documented and varies within and between records (Hamon, 1987). For example, there is a period in the mid- to late-1970s where the precision of this digitization was reduced to 10 cm at many locations, rather than the 1 cm of the earlier and later period. This can lead to misestimation of the frequency of coastal flooding, or other threshold exceedances in some circumstances.

FIGURE 2 Data availability and assigned levels of confidence for ANCHORS stations. Black shading denotes a month with at least 70% data completeness and data assigned as higher confidence. Grey shading denotes a month with at least 70% data completeness and data assigned as lower confidence. Where a month has both higher and lower confidence data, whichever comprises the majority is taken to be the confidence of the month for the purposes of this plot
We compared analyses using 10-cm precision data (the ANCHORS record) with analyses using 1-cm precision data (discovered and digitized as part of the metadata search process) for Sydney, over the years of 1973 and 1974. This period includes some severe storms (Callaghan & Power, 2014). The average absolute error was 3 cm and the errors were unbiased (mean error less than 0.1 mm). The effects on extremes were more pronounced, with the digitized record underestimating the frequency of inundation (exceeding 1.96 m, based on Hague et al., 2020) by 7.6% (3 hours less) in 1973 and 6.8% (6 hours less) in 1974. A potential explanation for this is that observations of 1.95 m would have been rounded to 2.00 m in the official record (and hence counted as exceeding the threshold) but not in the 1-cm precision record, resulting in more frequent exceedances in the official record than the digitized record.

FIGURE 3 Pearson's R correlations between the annual mean sea levels at ANCHORS stations
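The rounding mechanism proposed above can be illustrated numerically. The sketch below is not part of the ANCHORS analysis and uses hypothetical water levels rather than the Sydney observations; it works in integer millimetres to avoid floating-point rounding surprises.

```python
def exceedance_hours(levels_mm, threshold_mm, precision_mm):
    """Count observations at or above a threshold after rounding each
    level to the recording precision (all heights in millimetres)."""
    def round_to(v, p):
        # round-half-up to the nearest multiple of p
        return ((v + p // 2) // p) * p
    return sum(round_to(h, precision_mm) >= threshold_mm for h in levels_mm)

# Hypothetical hourly levels (mm) around a 1960 mm flood threshold.
levels = [1930, 1950, 1970, 2020, 1900]
coarse = exceedance_hours(levels, 1960, 100)  # 10-cm chart digitization -> 3
fine = exceedance_hours(levels, 1960, 10)     # 1-cm precision -> 2
```

With these five hypothetical levels, the 10-cm record registers three exceedances but the 1-cm record only two: the 1950-mm observation rounds up to 2000 mm only at the coarser precision, mirroring the 1.95 m to 2.00 m example above.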
Periods of missing data can occur when gauges break down or technical issues with data transmission occur. Whilst the present procedure is to leave a gap where data are missing, in the past quality controllers preferred to fill the gaps, either by interpolating residuals and adding these onto the predicted tide or by using data from a nearby gauge with some sort of correction. A related procedure where a secondary or tertiary sensor (colocated with the primary sensor) is used for gap-filling remains in use today. Typically, such adjustments are made for sections of data no longer than a month, so they are not anticipated to affect annual means, and hence not to be detectable inhomogeneities.
Changes in technology have led to changes in how data are collected, transmitted, stored and processed. Older observations, typically pre-1990s, were made on paper charts, then transcribed by hand when digitized, reading values off the chart at a specified interval, typically hourly, or applying some manual averaging (Hamon, 1987). In contrast, modern measurements are made and transmitted electronically and then quality controlled and filtered from 1-, 3- or 6-min frequency to hourly resolution, to conform to the historical measurements. By comparing these modern higher frequency measurements (which are archived) to the official filtered records, we find that the effect of filtering is less than the precision (1 cm) across mean, median and 95th and 99th percentiles. A key implication of this finding is that (top-of-the-hour) real-time observations can be used for threshold setting as they are not statistically different from the filtered hourly values. This means that monitoring and contextualization of coastal inundation events can happen with a reasonable accuracy in real time without needing to wait for the official filtered data to be made available, provided some basic level of quality control and attention to the reporting datum is applied.
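A comparison of summary statistics like the one described (mean, median, 95th and 99th percentiles agreeing to within the 1-cm reporting precision) can be sketched as follows. The function names and the nearest-rank percentile definition are our own illustrative choices, not those used in the ANCHORS processing.

```python
import statistics

def summary_stats(levels_m):
    """Mean, median and upper-percentile summaries of a sea level
    series (nearest-rank percentiles on the sorted values)."""
    s = sorted(levels_m)
    def pct(q):
        return s[min(len(s) - 1, int(q * len(s)))]
    return {"mean": statistics.fmean(s), "median": pct(0.50),
            "p95": pct(0.95), "p99": pct(0.99)}

def agrees_within_precision(a, b, precision_m=0.01):
    """True when every summary statistic of the two series differs by
    less than the reporting precision (1 cm by default)."""
    sa, sb = summary_stats(a), summary_stats(b)
    return all(abs(sa[k] - sb[k]) < precision_m for k in sa)
```

Any systematic offset larger than the precision (e.g. a datum shift) would fail this check, whereas sub-centimetre filtering differences pass.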

| Use of metadata in identifying inhomogeneities
Whilst the homogenization process can occur based purely on statistics (as discussed by Trewin et al., 2020), it is improved if there is supporting documentary evidence available to test whether changes are likely to be real or an artefact of the statistical tests conducted. This documentary evidence is subsequently referred to as metadata and includes gauge history reports, datum and benchmark information, calibration reports and communication between gauge operators and data custodians. These metadata were sourced from the Bureau's paper-based and digital records and additionally provided by gauge operators on request for this project. The new metadata uncovered here extend existing metadata and gauge history compilations by Hogarth (2014), Modra (2013), Tan (2004), Hamon (1987), Blume (1975), Easton (1968), Easton and Radok (1968) and Hamon (1963). For gauges included in ANCHORS, this information has been summarized as tide gauge histories in the Appendix.

FIGURE 4 Comparison between base series from the Gold Coast, and reference series before (a, upper) and after (b, lower) adjustments to correct inhomogeneities resulting from site moves in 1991 and 1999

| Detection and adjustment of statistically significant inhomogeneities
To detect and adjust inhomogeneities, we use independent reference series for each location and use these as comparisons, often termed 'buddy checking' (Figure 4). As discussed in Section 2.1, the base series (i.e. the site for which the inhomogeneity detection is conducted) and the reference series used for comparison are often both ANCHORS sites. This necessitated an approach where detection and adjustment are integrated and performed one inhomogeneity at a time (Figure 5, Section 2.3.4). Figure 4 shows an example of how this adjustment process improves the homogeneity of a sea level record at the Gold Coast. Further details on the adjustments performed for this and other locations are provided in the Appendix (Table S1).

| Detection process
The detection of statistically significant inhomogeneities was conducted for each ANCHORS site. We utilize the RHTests algorithm implemented in R (Wang, 2008; Wang et al., 2007), as part of a larger process as described below. The algorithm runs pairwise penalized t-tests (Wang, 2008; Wang et al., 2007) between the mean sea levels of two observational records, accounting for correlation. In this instance, we run RHTests version 4 with the default options, including a 95% confidence level for detection, as described in Wang and Feng (2013).

FIGURE 5 Flow chart of the combined iterative detection and adjustment processes (Section 2.3.4) employed in the development of ANCHORS. The detection process, as discussed in Section 2.3.1, is shaded in green; the adjustment process is blue and discussed in Section 2.3.3
For each ANCHORS record:
1. Use RHTests to run pairwise penalized t-tests between the ANCHORS site and each of its reference series (identified as per Section 2.1) using annual means.
2. Collate the number of statistically significant t-test results within every 5-year moving window from (1), and calculate the mean estimated break size of each of the statistically significant breaks identified in (1).
3. Conduct field significance testing using the binomial distribution (Livezey & Chen, 1983), using the fraction of tests that identify statistically significant breaks calculated in (2) and the total number of neighbours with at least 1 year of data in the relevant 5-year moving window. We use a 5% significance level for these tests.
The field significance testing was used to combat the multiple comparisons problem (e.g. Benjamini & Braun, 2002), where false positives become likely when multiple statistical tests are conducted simultaneously. It also allowed situations where both positive and negative changes were detected by the algorithm to be resolved, selecting the most significant based on the field significance test results. The use of metadata provides an additional source of validation that an inhomogeneity detected in (1) is not a false positive, and of 33 inhomogeneities detected, 21 were supported by metadata. Finally, a 5-year moving window was chosen so changes that occur mid-year (and hence the effect on annual means is spread across years) are properly considered in the analysis. The detection process is shown in green in Figure 5.
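The binomial field significance test of Livezey and Chen (1983), as applied here, can be sketched as follows. The function name is ours, and we assume a per-test false-positive rate of 0.05 matching the 95% confidence level used for detection.

```python
from math import comb

def field_significance_p(k_significant, n_tests, alpha=0.05):
    """Probability of observing at least `k_significant` locally
    significant results among `n_tests` independent tests purely by
    chance, under a binomial null with per-test rate `alpha`
    (after Livezey and Chen, 1983)."""
    return sum(
        comb(n_tests, k) * alpha**k * (1 - alpha) ** (n_tests - k)
        for k in range(k_significant, n_tests + 1)
    )

# With five reference series, how many local detections are needed
# for field significance at the 5% level?
p_one_of_five = field_significance_p(1, 5)  # ~0.226 -> not significant
p_two_of_five = field_significance_p(2, 5)  # ~0.023 -> field significant
```

For a site with five reference series, a single locally significant t-test is unremarkable, whereas two or more locally significant results are field significant at the 5% level.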

| A minimum threshold of detectability
In some circumstances, supporting metadata are not available for a suspected inhomogeneity. In these cases, we determine a minimum threshold of detectability related to the sensitivity (ability to detect a true positive) of the detection algorithm. The interpretation of this threshold of detectability is that if a mean step size is less than the threshold, then it is possible that the inhomogeneity being detected is an artefact of the statistical analysis (Type I error). The sensitivity can be assessed by imposing artificial inhomogeneities and assessing the ability of the algorithm to correctly identify these artificial step changes. To do this, we increased or decreased 10 years of annual means by varying step sizes (0.01-0.20 m in increments of 0.01 m) and then assessed the frequency at which these artificial inhomogeneities were identified by the detection algorithm (Section 2.3). We used the 10 most complete ANCHORS locations and their five most correlated neighbours, selecting only 10-year periods where an annual mean was defined (i.e. at least 70% complete) in all 10 years. This resulted in 6142 trials across all step sizes.
The results of these simulations and the resultant probabilities of detection are shown in Figure 6. These are the proportion of trials for which the detection algorithm detected an artificial inhomogeneity within a 5-year window centred on the date of the artificial inhomogeneity, for each step size. We found that a 0.05-m artificial inhomogeneity was associated with a probability of detection of 0.625. This means that there is a 62.5% chance that a 0.05-m inhomogeneity will be detected in the presence of other inhomogeneities. Practically, we interpret this to mean that any inhomogeneity smaller than 0.05 m cannot be reliably detected by the algorithm. This level, 0.05 m, is used as the threshold of detectability and the minimum step size that will be adjusted based on the algorithm alone, without a requirement for metadata supporting the existence of an inhomogeneity. If there are metadata supporting the existence of an inhomogeneity, then it is unlikely that the detection algorithm has falsely detected an inhomogeneity and this threshold need not be considered.
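The shape of this simulation can be sketched as below. The toy changepoint detector is a crude stand-in, not the RHTests penalized t-test, and the trial count, noise level and series length are illustrative assumptions only; the sketch shows how imposing artificial steps and counting hits within a 5-year window yields an empirical probability of detection.

```python
import random

def toy_detect(series, min_seg=3):
    """Crude stand-in for a changepoint search: return the split index
    that maximizes the absolute difference in segment means."""
    best_i, best_gap = None, -1.0
    for i in range(min_seg, len(series) - min_seg + 1):
        a, b = series[:i], series[i:]
        gap = abs(sum(b) / len(b) - sum(a) / len(a))
        if gap > best_gap:
            best_i, best_gap = i, gap
    return best_i

def detection_rate(detect, step, n_trials=200, noise_sd=0.03,
                   n_years=20, seed=1):
    """Empirical probability of detecting an imposed step change.
    Each trial adds `step` metres to the last 10 of `n_years` annual
    means of Gaussian noise; a hit is a detected break within the
    5-year window centred on the true break."""
    rng = random.Random(seed)
    true_break = n_years - 10
    hits = 0
    for _ in range(n_trials):
        series = [rng.gauss(0.0, noise_sd) for _ in range(n_years)]
        for i in range(true_break, n_years):
            series[i] += step
        found = detect(series)
        if found is not None and abs(found - true_break) <= 2:
            hits += 1
    return hits / n_trials

p_large = detection_rate(toy_detect, step=0.20)  # near-certain detection
p_small = detection_rate(toy_detect, step=0.01)  # step well below noise
```

As expected, large steps are detected near-certainly while steps well below the noise level are not, which is the behaviour summarized by the detectability curve in Figure 6.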

| Adjustment process
A detected inhomogeneity is corrected by adding the mean step size (across all individual statistically significant t-tests) onto data preceding the inhomogeneity. This ensures the data are relative to the present-day datum. This approach was taken for all adjustments. As raw data are not archived, it is impossible to determine if, or how, previously documented step changes have been handled. Considering the result that 0.05 m represents a threshold of detectability (Section 2.3.2), we apply the following criteria. An adjustment is made if:
1. the field significance testing described in Section 2.3.1 delivers a statistically significant (at the 5% level) result and:
2. the mean step size, across all individual statistically significant t-tests, is 0.05 m or more; and/or
3. an event that could be suspected to cause an inhomogeneity is documented in metadata as having occurred within a 3-year moving window of the date of the detected inhomogeneity.

FIGURE 6 Proportion of artificial inhomogeneities detected within 5 years of an artificial inhomogeneity, as a function of step size in metres for a step change of 10 years length
In general, if a specific date of a gauge relocation or other metadata-supported inhomogeneity is provided, that date is used; otherwise, adjustments are made on 1 January of the year in which the inhomogeneity is detected. All adjustments are made such that all observations are with respect to the present-day datum.
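The adjustment criteria and the correction step can be expressed compactly. This is a sketch of the decision rule as stated in the text; the function names and argument shapes are our own, not part of ANCHORS.

```python
def should_adjust(field_significant, mean_step_size, inhomog_year,
                  metadata_years, min_step=0.05, metadata_window=3):
    """Decision rule from the text: adjust only if the field significance
    test is significant (criterion 1) AND either the mean step size meets
    the 0.05-m detectability threshold (criterion 2) or metadata document
    a plausible cause within a 3-year window of the detected date
    (criterion 3)."""
    if not field_significant:
        return False
    metadata_support = any(abs(y - inhomog_year) <= metadata_window
                           for y in metadata_years)
    return mean_step_size >= min_step or metadata_support

def adjust_series(annual_means, adjust_index, mean_step_size):
    """Add the mean step size onto all data preceding the inhomogeneity,
    so the whole record is relative to the present-day datum."""
    return [m + mean_step_size if i < adjust_index else m
            for i, m in enumerate(annual_means)]
```

Note that the correction is applied to the *earlier* portion of the record: the data after the step are taken to define the present-day datum and are left untouched.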

| Combining both detection and adjustment algorithms
In homogenizing a base series, comparisons using pairwise t-tests are made with multiple reference series, which are often also ANCHORS sites. A statistically significant test result is therefore counted in both the base and the reference series, even though the most likely outcome is that only one of the two sites has an inhomogeneity. To avoid this 'double counting', it is necessary to determine which of the two sites is inhomogeneous and which is not (in a single pairwise test, either the reference or the base series could be the source). This is achieved by performing a 'first pass' field significance test that considers the results of all pairwise t-tests (i.e. including 'double counting'), then re-conducting the field significance testing after each adjustment, excluding from all subsequent iterations the t-test results used in that adjustment. The criteria are then reassessed for the next most significant detected inhomogeneity.
Adjustments are first made to the most statistically significant detected inhomogeneity (based on the first pass field significance test) where metadata indicate a temporally aligned site move or gauge change (termed 'metadata support'). This is shown as the first row in the blue section of Figure 5. After all the detected inhomogeneities with metadata support have been considered, the remaining detected inhomogeneities are considered as above, in order of decreasing statistical significance (as determined by the first pass). This is shown as the second row in the blue section of Figure 5. This reflects a prioritization of metadata and consensus (as determined by the strength of the field significance test result) in the adjustments made.
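The ordering described above, metadata-supported detections first, then the remainder, each group in order of decreasing first-pass statistical significance, can be sketched as follows. The dictionary keys are illustrative; the iterative re-running of the field significance test after each adjustment is described in the text and not reproduced here.

```python
def homogenization_order(detections):
    """Order in which detected inhomogeneities are considered, per the
    prioritization in the text: metadata-supported detections first, then
    the rest, each sorted by decreasing first-pass statistical
    significance (smaller p-value = stronger result). Each detection is a
    dict with at least 'p_value' and 'has_metadata' keys (our naming)."""
    supported = sorted((d for d in detections if d['has_metadata']),
                       key=lambda d: d['p_value'])
    unsupported = sorted((d for d in detections if not d['has_metadata']),
                         key=lambda d: d['p_value'])
    return supported + unsupported
```

This makes the prioritization explicit: a weakly significant but metadata-supported detection is still handled before a strongly significant detection with no documentary support.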

| Detection and adjustment of inhomogeneities in annual means
The process of robustly detecting and correcting inhomogeneities means that ANCHORS represents a suitable Australian tide gauge dataset for monitoring long-term sea level changes. However, it is important to note that homogenization was applied to annual means only. Therefore, any potential sources of error that may produce inhomogeneity in variance, or in daily or monthly means (not accounted for in the annual mean), may not have been corrected by the homogenization process. This is important when considering sea level extremes, and especially non-tidal extremes. For example, timing errors and changes to tidal patterns have not been corrected beyond the quality control procedures used for all Bureau of Meteorology hourly tide gauge records. Whilst the current quality control procedure (applied to the unadjusted data before homogenization) focusses on correcting timing errors, it is unclear whether this procedure has been applied consistently since the commencement of the datasets. Therefore, caution must be taken when using ANCHORS for purposes where these potential errors could impact results.
Future work on all available Australian sea level data (including but not limited to ANCHORS) could further quantify potential sources of error in the underlying (i.e. unhomogenized) hourly values, for example, slowly shifting data resulting from unstable mounting infrastructure. A future improvement of ANCHORS could be to use percentile-matching methods to test the homogeneity of variance (e.g. Trewin et al., 2020), and to consider methods for correcting errors that affect statistics only on subannual timescales.
Despite these limitations, we have provided hourly data as an alternative to the (unhomogenized) hourly data already available in global repositories and utilized by the international research community. This is because the detection and adjustment of inhomogeneities manifested in annual means represent an important, incremental improvement in these hourly data.

| Quality assessment of the homogenization process
We have not sought to assess the quality of the data, only the quality of the homogenization process. As the data are sourced from multiple third parties and only archived as filtered hourly values without the originally transmitted data, any formal quality assessment is very difficult. Therefore, the key focus for our assessment is on whether the homogenization process has resulted in the original data being degraded by false adjustments, or falsely claimed to be homogeneous via missed detections. These are situations where an apparent inhomogeneity is incorrectly identified and subsequently adjusted, or where a true inhomogeneity has not been detected or adjusted when it should have been. Whilst false adjustment has been addressed by the minimum threshold of detectability (Section 2.3.2), missed detections have not yet been considered.
We quantify the likelihood of missed detections by assuming it decreases with the number of reference series a particular ANCHORS site has at a specific point in time. The logic is that the more pairwise t-tests that are completed, the more likely an inhomogeneity in a record is to be identified. The smallest number of reference series that collectively produced a statistically significant result in field significance testing is three (Port Adelaide in 1950). Therefore, we cannot be confident in the ability to detect inhomogeneities in a base series that has fewer than three reference series, and cannot rule out the possibility of inhomogeneities remaining undetected in these situations. We therefore assess data as lower confidence if there are fewer than three reference series with a defined annual mean within a 3-year window of any given year, and higher confidence if there are three or more reference series that meet this condition. At Hobart and Onslow, lower confidence was also assigned due to very clear and large data errors that we identified from metadata, but which affected subannual periods and hence could not be detected by the homogenization algorithm (refer to the Appendix for further details). Since homogenization was conducted using annual means with a 5-year moving window, it is also possible that a hypothetical inhomogeneity in the last 5 years is yet to be detectable.
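The confidence classification above reduces to a simple counting rule. The sketch below is our own rendering of that rule; in particular, we have assumed "within a 3-year window" means a tolerance of ±3 years around the target year, which should be checked against the intended definition.

```python
def confidence_flag(year, reference_series, min_refs=3, window=3):
    """Return 'higher' if at least `min_refs` reference series have a
    defined annual mean within `window` years of `year`, else 'lower'.
    `reference_series` maps each reference site name to the set of years
    for which it has a defined annual mean (our data shape, for
    illustration)."""
    n_available = sum(
        1 for years in reference_series.values()
        if any(abs(y - year) <= window for y in years)
    )
    return 'higher' if n_available >= min_refs else 'lower'
```

Applied year by year, this yields the per-hour 'lower'/'higher' quality flag carried in the published files (with site-specific overrides such as Hobart and Onslow applied separately).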
Because limits on data completeness and the number of reference series were imposed, the higher confidence dataset has lower spatial and temporal resolution than the full complement of Australian tide gauges identified in Section 2.1. The effects are particularly acute prior to 1966, which was also identified as the earliest suitable start year for national analysis by White et al. (2014). Many different metadata sources, especially Hamon (1963), show that earlier tide gauge data did once exist, and sourcing and digitizing such records (e.g. Marcos et al., 2021) is likely the only way to extend ANCHORS prior to 1966. Such data rescue efforts will also improve the confidence we have in the earlier digitized records that do not presently have enough reference series to be considered as 'higher confidence' records pre-1966. This will also improve the detection of changes in the rate of sea level rise around Australia (Haigh, Wahl, et al., 2014; Watson, 2011, 2020), without needing to wait for future observations.

| Limitations in metadata availability
Whilst the sum of the metadata sources collated for this project represents the largest such collection published so far in the development of Australian sea level records, the dataset is still limited by what is practically available. Whilst metadata availability varies from location to location, the metadata and gauge history typically available are less complete since 1990 than between 1966 and 1990. These data may exist in digital forms (e.g. email archives) but have not yet been traced, and some may no longer be available. Whilst data processing and gauge technology may have improved, the increasingly automated nature of recording sea level means that metadata are often not provided routinely with the data, as they once were when observations and gauge information were mailed or faxed every year. This highlights the need for accurate and careful sourcing and archiving of metadata, especially when, in many cases, the gauge operators and data custodians are from different agencies.

| Potential use cases
As the first and only homogenized tide gauge dataset for Australia, ANCHORS is intended by the authors to be the primary reference for monitoring the impacts of coastal sea level changes on infrastructure and communities. The lower confidence data (i.e. sites and time periods with fewer reference series) have fewer applications, likely limited to generating a climatology. By contrast, higher confidence data are of greater use for a wider variety of research, including, but not limited to:
- monitoring changes in coastal sea levels (e.g. White et al., 2014)
- definition, and analysis of exceedances, of coastal flood thresholds (e.g. Hague et al., 2019, 2020)
- extreme coastal still water level analyses (e.g. Pattiaratchi et al., 2018)
- climate change attribution studies
- investigating modes of climate variability and how these influence coastal sea levels (e.g. Holbrook et al., 2011)
- model training and verification

| DATASET ACCESS
Data are available from Australia's National Computational Infrastructure (NCI), under a Creative Commons Attribution 4.0 International License. Data can be downloaded from the following DOI: http://dx.doi.org/10.25914/6142dff37250b. This location contains a folder for the homogenized files (*_adj.csv), as well as unadjusted files (*_unadj.csv) and the licence.
The files include the columns specified below:
- UTC_DT: the UTC timestamp associated with the hourly observation, in the format YYYY-MM-DD hh:mm:ss
- sea_lvl: the (adjusted or unadjusted) hourly filtered sea level observation, in metres, rounded to the nearest centimetre
- qf: a quality flag taking the strings 'lower', 'higher' and 'missing', with the first two based on the level-of-confidence assessment in Section 3.2 and the latter used where no observational data are available for that time. This field is not included in the unadjusted data files.
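A file in this layout can be parsed with standard CSV tooling. The sketch below uses only the column names documented above; the sample values and the convention of an empty sea_lvl field for missing observations are assumptions for illustration and should be checked against the published files.

```python
import csv
import io

# A few rows in the documented column layout (illustrative values only).
sample = """UTC_DT,sea_lvl,qf
1990-01-01 00:00:00,1.23,higher
1990-01-01 01:00:00,1.31,higher
1990-01-01 02:00:00,,missing
"""

def read_anchors(fileobj):
    """Parse an ANCHORS adjusted file into (timestamp, level, flag)
    tuples, returning missing observations as None."""
    rows = []
    for rec in csv.DictReader(fileobj):
        level = float(rec['sea_lvl']) if rec['sea_lvl'] else None
        rows.append((rec['UTC_DT'], level, rec['qf']))
    return rows

obs = read_anchors(io.StringIO(sample))
```

Filtering on the qf column then lets a user restrict an analysis to the higher confidence subset.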
It is envisaged that these data will be updated operationally. However, as all adjustments have been made such that the present interpretation of tide gauge zero is maintained historically, users could append newer data to the existing adjusted data, provided the datum is not changed in future. This also means that pre-existing datum information, such as that provided by PSMSL, can be used to recast all data to different reference levels or datums.
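Because all observations share the present-day tide gauge zero, recasting to another reference level is a constant offset. This is a minimal sketch; the offset value would come from published datum information (e.g. PSMSL benchmark relationships) and is a placeholder here.

```python
def recast(levels, gauge_zero_to_datum):
    """Shift sea levels (metres above tide gauge zero) to another
    reference level, given the offset from gauge zero to that datum
    (metres, positive upward); None marks missing observations."""
    return [lvl - gauge_zero_to_datum if lvl is not None else None
            for lvl in levels]
```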

| CONCLUSIONS
In this study, we have presented the Australian National Collection of Homogenized Observations of Relative Sea Level (ANCHORS). The dataset comprises 38 tide gauge records from around Australia, varying in length from 34 to 123 years, with an average of 57 years. We have extended existing homogenization techniques to detect and adjust errors in the data caused by non-geophysical changes such as gauge relocations and equipment upgrades. We have also discussed potential sources of inhomogeneities and their likely effects on the data. This dataset addresses a significant gap in understanding how sea level is changing in the Australian region. Despite some limitations, ANCHORS represents a significant improvement in the availability and utility of Australian coastal sea level data. The continued collation of gauge history information would improve the ability to confidently detect and adjust inhomogeneities, and further digitization of observations could extend the dataset's start date to before 1966. ANCHORS represents an incremental improvement on existing Australian sea level data and is intended for use in monitoring changes in the exceedance of coastal flood thresholds around Australia due to increasing mean sea level.