Effects of outliers on remote sensing‐assisted forest biomass estimation: A case study from the United States national forest inventory

Large‐scale ecological sampling networks, such as national forest inventories (NFIs), collect in situ data to support biodiversity monitoring, forest management and planning, and greenhouse gas reporting. Data harmonization aims to link auxiliary remotely sensed data to field‐collected data to expand beyond field sampling plots, but outliers that arise in data harmonization—questionable observations because their values differ substantially from the rest—are rarely addressed. In this paper, we review the sources of commonly occurring outliers, including random chance (statistical outliers), definitions and protocols set by sampling networks, and temporal and spatial mismatch between field‐collected and remotely sensed data. We illustrate different types of outliers and the effects they have on estimates of above‐ground biomass population parameters using a case study of 292 NFI plots paired with airborne laser scanning (ALS) and Sentinel‐2 data from Sawyer County, Wisconsin, United States. Depending on the criteria used to identify outliers (sampling year, plot location error, nonresponse, presence of zeros and model residuals), as many as 53 of the 292 Forest Inventory and Analysis plot observations (18%) were identified as potential outliers using a single criterion and 111 plot observations (38%) if all criteria were used. Inclusion or removal of potential outliers led to substantial differences in estimates of mean and standard error of the estimate of biomass per unit area. The simple expansion estimator, which does not rely on ALS or other auxiliary data, was more sensitive to outliers than model‐assisted approaches that incorporated ALS and Sentinel‐2 data. Including Sentinel‐2 predictors showed minimal increases to the precision of our estimates relative to models with ALS predictors alone. Outliers arise from many causes and can be pervasive in data harmonization workflows. Our review and case study serve as a note of caution to researchers and practitioners that the inclusion or removal of potential outliers can have unintended consequences on population parameter estimates. When used to inform large‐scale biomass mapping, carbon markets, greenhouse gas reporting and environmental policy, it is necessary to ensure the proper use of NFI and remotely sensed data in geospatial data harmonization.

In addition to characterizing observed ecological patterns and processes, NFI data have also supported the parameterization of ecological models including spatially explicit landscape models (Purves et al., 2007;Suárez-Muñoz et al., 2021;Wang et al., 2013), tree growth models (Lichstein et al., 2010;Heilman et al., 2022) and species distribution models (Crookston et al., 2010;Iverson et al., 2019;Prasad et al., 2020), which aim to predict forest responses to changing climate and disturbances.
NFIs collect data on forest attributes at regional to continental scales, and the sub-populations (land area) of interest included in NFIs range from forests impacted by specific activities (e.g. management and harvest), forests of specific ownership classes (e.g. only public land), to only forestland or all land (Tomppo et al., 2010). To capture this range, many NFIs use probability-based sampling designs: inventory plots are established across the landscape to collect a representative sample of the population of interest (Bechtold & Patterson, 2005;Tomppo et al., 2010). These strategic inventories (many sites collecting a relatively small quantity of data) provide a contrast to traditional, place-based ecological research networks (few sampling sites collecting a relatively large quantity of data).
NFIs require careful planning to ensure estimation of parameters are consistent and generalizable across the population of interest (i.e. forestlands). Ongoing efforts to coordinate among NFIs internationally (Kovac et al., 2014;McRoberts, Tomppo, et al., 2012;Prasad et al., 2020;  The recent proliferation of remote sensing technologies has facilitated the estimation of forest resources by integrating NFI data with remotely sensed data. This process, hereafter 'data harmonization', combines geospatial data and/or data types from multiple sources (Global Forest Observations Initiative, 2020;Lister et al., 2020;White et al., 2013). Remote sensing provides the opportunity for NFIs to estimate forest resources beyond the small proportion of land inventoried by on-the-ground sampling, filling the gaps between NFI plots (Wilson et al., 2013;Yu et al., 2022). Remote sensing may also improve cost efficiency of NFIs, for example, stratifying by forest/non-forest to reduce sample size needed to meet precision standards, providing low-cost information about forest attributes if NFIs are given a small budget, or reducing travel expenses to remote areas where access may be difficult (Lister et al., 2020).
However, NFI definitions and protocols were often formulated and established decades before the emergence and proliferation of remote sensing (Lister et al., 2020;Tomppo et al., 2010), so geospatial data harmonization using NFI data can lead to outliers-defined broadly as potential mismatches among data from different sources that lead to values that are biologically untenable or statistically different from the rest (Kendall et al., 1982). For example, NFIs might define forest/non-forest based on canopy cover and/or land use, but remote sensing might detect forest-like land cover in areas that are considered non-forest by NFIs (e.g. trees outside forests in orchards, parks and residential areas; Thomas et al., 2021). NFIs must consider whether the definitions they adhere to and the attributes these definitions describe can be characterized through the harmonization of remotely sensed and field-collected data.
Here, we provide an overview of potential outliers that arise from geospatial data harmonization of remotely sensed and fieldcollected NFI data. Our study had two objectives: (1) to provide a 4. Outliers arise from many causes and can be pervasive in data harmonization workflows. Our review and case study serve as a note of caution to researchers and practitioners that the inclusion or removal of potential outliers can have unintended consequences on population parameter estimates. When used to inform large-scale biomass mapping, carbon markets, greenhouse gas reporting and environmental policy, it is necessary to ensure the proper use of NFI and remotely sensed data in geospatial data harmonization.

K E Y W O R D S
airborne laser scanning, data harmonization, Forest Inventory and Analysis (FIA), lidar, Sentinel-2 literature review on common sources of outliers between remotely sensed and NFI data and (2) to illustrate the effects of retaining or removing outliers on estimates of above-ground live and standing dead biomass per unit area (hereafter 'biomass per unit area') using a case study in Sawyer County, Wisconsin, United States (US). We conclude with discussion on how outliers may affect other sampling efforts beyond NFIs and how the uncertainty that arises from geospatial data harmonization can have substantial effects on resource estimation that informs management, carbon markets and greenhouse gas reporting.

| A PRIMER ON REMOTE S EN S ING -A SS IS TED E S TIMATI ON
A general data harmonization workflow for estimating forest parameters (e.g. biomass per unit area) for a population (study area) from field-collected and remotely sensed data includes the following steps. First, forest inventory data are collected at plots within the population of interest to estimate tree-and plot-level biomass.
Individual tree measurements, usually diameter and height, are converted to tree-level biomass predictions using allometric models.
The United States Department of Agriculture Forest Service, Forest Inventory and Analysis (FIA) program uses a component ratio method which first predicts the volume of different components of the tree and converts volume to biomass and carbon predictions using scaling factors based on wood densities and carbon fractions (Domke et al., 2012;Doraisami et al., 2022;Miles & Smith, 2009;Woodall et al., 2011). Other allometric models can be used with NFI data (including Jenkins et al., 2003;Chave et al., 2005;Chave et al., 2014;Colmanetti et al., 2018;Gonzalez-Akre et al., 2022). These predictions can be standardized to per unit area (e.g. megagrams of biomass per hectare) using expansion factors based on NFI sampling design that relate plot area and sample size to the population. Some NFIs make publicly available their inventory data and associated volume/biomass/carbon predictions (such as the FIA DataMart, https://apps.fs.usda.gov/fia/datam art/datam art.html) while others only release estimates of population parameters such as country-or region-level estimates of total forest area (Tomppo et al., 2010).
Next, remotely sensed data are compiled for the same population ( Figure 1). Passive sensors rely on the reflectance of the sun from surfaces, whereas active sensors use their own light or other energy source (Lu et al., 2016). Generally, passive sensors provide information about the reflectance of the forest canopy, whereas active sensors have the potential to penetrate the canopy and provide additional information (Gonzalez et al., 2010;Vaglio Laurin et al., 2014).
Remotely sensed data may be delivered as raster data with a fixed resolution (hereafter 'pixel based') or, in the case of lidar, as point clouds for which users are not constrained by a pre-determined grid and can 'clip' data to features (such as an NFI plot footprint) or calculate metrics and create summaries at user-defined resolution.
Concurrent forest inventory and remote sensing is possible when sample size is small and equipment is available (Stovall & Shugart, 2018). This is common practice for traditional ecological research networks that can commission their own remote sensing.
However, concurrent field sampling and remote sensing for NFIs (generally a large sample size with many field crews) usually become economically or operationally unfeasible (Oh et al., 2022, but see Babcock et al., 2018). Instead, publicly available, large-scale aerial (such as uncrewed aerial systems or airplane-based platforms) or satellite (such as Landsat or Sentinel) remote sensing are typically used.
Temporal resolution (revisit time) and data quality (due to inconsistent lighting condition or cloud cover) often necessitate the use of a time window around the field inventory date to get complete coverage of the population of interest. Once an appropriate sensor is chosen, either measurements from the sensor or metrics are calculated for NFI plots and the rest of the population units for the study area.
After compiling NFI plot data and remotely sensed data, one of several estimators used to estimate forest biophysical population parameters (e.g. biomass per unit area) is selected. Two estimators were the focus of this study. The simple expansion estimator (hereafter 'expansion estimator') relies solely on the probability-based NFI sample to generate population estimates. The model-assisted estimator uses remotely sensed data to reduce the variance (or increase the precision) of the estimate. To do this, the model-assisted estimator requires a model to be constructed relating NFI plot biomass to remotely sensed metrics. Then, this model is used to predict biomass for each population unit. Population estimates are based on the predicted values for the population units but are adjusted based on the NFI plot observations. More details on these estimators can be found in the case study below.

| OUTLIER S AND NONRE S P ON S E
The term 'outlier' describes observations that are questionable because their values are biologically untenable or statistically different from the rest, which brings into question whether they are mistakenly sampled from a different population or that the sampling procedures or model used to predict the attribute have failed to correctly characterize them (Kendall et al., 1982). This definition allows some observations to be characterized as outliers based on random effects-by chance they are outside of the range of the rest-while others may have known causes due to features of the sampling procedure or measurement protocol. Outliers due to random effects are determined by a threshold for a variable, for example, 1.5 times the interquartile range below the first quartile or above the third quartile. In regression, a threshold can be set for either observed values or their residuals (difference between observed and predicted values), for example, observations with the largest absolute values of residuals and are many standard deviations away from the mean (Draper & Smith, 1981). There are many other types of outliers where values may not reach a statistical threshold, but rather the values are abnormal given the inventory procedure or sensor specification. These procedural outliers can arise from a variety of sources: human error; accuracy and precision of equipment (such as Global Positioning System [GPS], receivers that are influenced by connectivity, weather, topography or canopy-related interference); sampling design and protocols (such as definitions of forest/non-forest and minimum size classes for measurement); and data availability (such as return window for satellite-based sensors); among others.
Nonresponse-the inability for NFIs to collect data at a given location or time-can lead to missing data. Nonresponse occurs for a variety of reasons for NFIs, including denied access on private lands or hazardous conditions (Domke et al., 2014;Patterson et al., 2012;. Nonresponse can also arise from plots that cannot be located during remeasurements by field crews, plots located in tidal zones and plots outside of geopolitical boundaries (Burrill et al., 2021). These types of missing data can be missing at random, missing completely at random or missing not at random (Little & Rubin, 2002;Rubin, 1987).
Missing at random and missing completely at random occur when missingness is not linked to predictor or response variables. If the probability of nonresponse is equally likely and is not related to a particular cause in an NFI-that is nonresponse is missing completely at random-there are ways to compensate for missing data through the probability design of the NFI by adjusting expansion factors (the amount of land each plot represents; Domke et al., 2014). If nonresponse is missing at random (missing and observed data have systematic differences but these differences can be fully explained by other characteristics of the NFI), strata can be constructed to account for these differences (Domke et al., 2014). However, if data are missing not at random-that is they have a specific cause or are related to a specific stratum or category within the NFI-the probability sample can be compromised (Domke et al., 2014). For example, if nonresponse arises primarily from forests with characteristics unlike the rest of the population (e.g. ownership and management), the conclusions drawn from this subset of data are no longer applicable to all forestland.

| Procedural outliers due to NFI protocols and definitions
Entities that conduct NFIs identify key populations of interest when designing and conducting field data collection. Setting thresholds and protocols are important for ensuring the population of interest is F I G U R E 1 Passive and active remote sensing of forest inventory plots. (a, c) Passive optical sensors such as Sentinel-2 report metrics in rasterized (pixel) form, and plot values are typically based on either the average of overlaid pixels or the pixel that coincides with the plot centroid. (b, d) Active sensors such as lidar (right) generate data in point clouds that can be clipped to the inventory plot footprint and allows for the detection of features at a finer resolution. Red circles represent an example Forest Inventory and Analysis (FIA) central macroplot with radius of 18 m. Normalized Difference Vegetation Index (NDVI) (a, c) can range from −1 to 1, but the colour gradient was scaled to NDVI >0 for visualization purposes. Background imagery (b, d) was from the 2010 collection of the National Agriculture Imagery Program (NAIP). measured consistently (IPCC, 2019;Tomppo et al., 2010). Thresholds and protocols serve as assumptions in models, provide context to the interpretation of results and guide precision standards for NFIs (Bechtold & Patterson, 2005). Some examples include forest/nonforest definitions; minimum size thresholds for tree measurement; and a nested macro/sub/microplot design to reduce the effort and cost to measure and/or tally smaller trees and seedlings, among others ( Figure 2). Thresholds and protocols are informed by entity objectives including cost considerations and precision standards, among many other factors (Tomppo et al., 2010).
NFIs pay close attention to where and when data are collected. In the FIA program, base-intensity field data are collected for approximately one plot per 2400 hectares of forestland. The FIA program collected data intermittently only on productive forestland for the first ~80 years of the program's existence ('periodic' inventory), but beginning in the late 1990s/early 2000s, the FIA program switched from periodic to annual inventories, for which a portion of the plots (e.g. 1/10 to 1/5) on all forestland within a state are measured each year, and each plot is remeasured on a 5-to 10-year interval (generally, 5-7 years in the eastern US and 10 years in the western US; Bechtold & Patterson, 2005). The annual inventory spreads the workload of periodic inventories over multiple years, while gaining better temporal resolution to detect change by revisiting plots every 5-10 years. Some areas intensify sampling above baseline sampling intensity by increasing the number of plots and/or reducing remeasurement period.
NFI thresholds limit what is measured on field plots, but remote sensors are often agnostic to these thresholds and have specifications of their own. The presence of understorey vegetation not measured in the NFI protocol may be predicted by remote sensing, but presence of other non-vegetation objects such as humanmade objects or wildlife may also be predicted ( Figure 2). NFIs may also set limits on the type of vegetation data collected, for example, whether the NFI includes protocols for down dead wood or herbaceous understorey vegetation (Bechtold & Patterson, 2005). Trees that are near the edge, but just outside of, the NFI plot boundary are not measured by field crews, but remote sensing could detect canopy features of those trees that extend into the plot; vice versa, trees that are just inside the NFI plot boundary will be measured by the field crew even though a substantial portion of the tree may fall outside of the plot (and is therefore clipped out of the lidar point cloud). For pixel-based products, unmeasured vegetation in and around the plot contributes to the value of the pixel.

| Spatial mismatch of NFI plots between field measurement and remote sensing
Co-registration error of NFI and remotely sensed data can occur due to inaccuracy of plot coordinate measurements made in the field, often obtained through satellite navigation (Dorigo et al., 2010;McRoberts et al., 2018;Pascual et al., 2021). Here, we refer to GPS because it is used by the FIA program but note that other NFIs may use different satellite navigation technologies (i.e. Global Navigation Satellite Systems) that have similar location accuracy concerns.
Horizontal root mean square errors (RMSE) of consumer-grade GPS F I G U R E 2 Detection of forest attributes by Forest Inventory and Analysis (FIA) and remote sensing. Red cylinders indicate FIA subplot (solid) and microplot (dashed) footprints that are clipped out of a point cloud. Features in black are measured by FIA and included in the point cloud; features in blue may be included in the point cloud but are not measured by FIA; features in red are measured by FIA but not included in the clipped point cloud; features in grey are not measured by FIA and are clipped out of the lidar point cloud but may contribute to pixel-based metrics of lower resolution satellite imagery such as Normalized Difference Vegetation Index (NDVI) computed from Sentinel-2 red and near-infrared bands.
receivers are typically 5-10 m under forest canopy and is influenced by weather conditions, timing of satellites, canopy cover and topography that interrupt satellite connections (Kaartinen et al., 2015).
Inaccuracy of coordinate measurements may cause a substantial number of actual plot centres to be located near the measured plot boundary or outside the plot area altogether. The spatial mismatch and resultant inaccuracy of biophysical variables can be partially alleviated by using survey-grade differential GPS receivers (dGPS) (Andersen et al., 2022;Pascual et al., 2021). RMSE of dGPS receivers are less than consumer-grade GPS receivers, with RMSE typically <1 m under closed forest canopy (Hasegawa & Yoshimura, 2003). Similarly, differences in spatial resolution of remotely sensed data and plot size can lead to spatial mismatch. When plot centroids are used to extract a single pixel from rasterized data, the actual measured area may contain additional pixels or vegetation outside of the measured area that may contribute to the pixel value, especially when plot size differs from pixel size (Figure 2; Hyyppa & Hyyppa, 2001;Muukkonen & Heiskanen, 2007). Plots may also be assigned to an incorrect pixel even when the pixel size is equal to plot size if spatial errors are present (McRoberts, 2010). In addition, mismatches between plot (often circular) and pixel (square) shape may occur (Figure 1c), although the effects of these mismatches are minimal (Packalen et al., 2022).

| Temporal mismatch
Remotely sensed data are rarely temporally coincident with fieldcollected data (Oh et al., 2022). Instead, most remote sensing occurs sporadically for aerial-based systems (such as county-or stateacquired lidar) or on daily to weekly time-scales for satellite-based systems (such as a 16-day return period for Landsat 8). In addition, data from satellite-based sensors may need to be compiled across a wider period to capture days without cloud cover. Because of temporal mismatches, forests can change between field measurement and remote sensing. Changes vary in magnitude and cause (biotic, abiotic or anthropogenic), from small, natural changes associated with stand development and aging, to moderate disturbances due to pests, wildlife or silviculture, to large disturbances that remove the forest completely such as stand-replacing disturbance (e.g. wildfire, flooding and windthrow), clear-cut harvesting and land conversion (Fitts et al., 2022). Depending on the sequence of events (remote sensing followed by disturbance followed by field sampling, or vice versa), forest parameters may be under-or over-estimated when harmonizing remotely sensed and NFI data. The selection of an appropriate time window or revisit time should consider the magnitude of change in the population parameter of interest. For example, biomass is accumulated slowly so a multiple-year window may be appropriate, but remote sensing to map wildfires may need a restricted window and/or use a sensor with a higher frequency revisit time.

| Statistical outliers
Some outliers create obvious data anomalies, where algorithms based on remotely sensed data predict the presence of trees and/ or other vegetation even though field crews did not collect data on these plots (e.g. non-forest plots with trees in the FIA program).
However, others are more subtle and may only be identified when considering a statistical model relating remotely sensed metrics to field-collected data. In such cases, plots have abnormally large model residuals which contribute to loss of accuracy and precision for resource estimates (Draper & Smith, 1981). included to obtain full coverage of the study area. Each plot consisted of a central 168.1 m 2 (7.31 m radius) circular subplot with three surrounding subplots with centroids at 0-, 120-and 240-degree azimuths (Bechtold & Patterson, 2005). Because observations on subplots are spatially autocorrelated with each other, we focused solely on the central subplot.
We compiled biomass estimates for our study from the FIA database (Burrill et al., 2021). The FIA program uses a component ratio method to predict biomass for individual trees using diameter at breast height, tree height and other stand characteristics (Woodall et al., 2011). Predictions were calculated at the individual tree level for live and standing dead trees and expanded to per-unit-area (me-

| Identifying outliers
We used multiple criteria to identify potential outliers and constructed subsets of the data with and without these potential outliers (Table 1) We then evaluated which observed residuals fell outside of the 95% confidence interval: CI = 0 ± 1.96 √ Var pred .

| Estimating biomass per unit area using simple expansion and model-assisted estimators
We used two approaches for estimating county-wide mean aboveground biomass per unit area, the simple expansion estimator and the model-assisted estimator (Cochran, 1991;McRoberts, Gobakken, et al., 2012a;McRoberts et al., 2013;Särndal et al., 2003). The expansion estimator relies on the assumption of an equal probability NFI sample and does not require auxiliary data (except for the identification of outliers). Using the expansion estimator, mean biomass per unit area (̂ EXP ) can be expressed as: and variance of the estimate of the mean (Vâr ̂ EXP ) can be estimated as: where y i is the total biomass at each of the n FIA plots (McRoberts, Gobakken, et al., 2012). The expansion estimator serves as a baseline to compare the effectiveness of incorporating remotely sensed data in the model-assisted estimator.
For the model-assisted estimator, we compared two models that related biomass to remotely sensed data. First, we used a nonlinear regression model of the form: where 1 and 2 are parameters to be estimated and x i is the remotely sensed metric of interest, and i is the residual error. This model was fit where N is the population size (pixels in the county), and ŷ i is the predicted value of biomass from Equations 3, 4, or 5. The variance of the biomass estimate can be expressed as: To show how the inclusion or removal of outliers affects variance estimates relative to a null baseline, we calculated a variance ratio (VR): where Vâr ̂ NULL is the variance of the expansion or model-assisted estimator of the full sample and Vâr ̂ EST is the variance obtained on an outlier-removed subset using Equations 2 and 7 for expansion and model-assisted estimators, respectively. The VR was also used to compare model-assisted estimators to the null expansion estimator using all data (i.e. no outliers removed); this special case is relative efficiency and can be interpreted as the factor by which the sample size would have to be increased for the expansion estimator to produce the same variance as the model-assisted estimator.
We used two data sources for predictors in model-assisted estimators, ALS and Sentinel-2. We calculated a correlation coefficient between biomass and a suite of ALS and Sentinel-2 metrics and chose the predictors with the largest correlations to include in our models for the model-assisted estimators ( Figure S2). Our models included a nonlinear model with mean vegetation return height (Z mean ) from ALS and a linear model with autumn tasselled cap band 2 (greenness, TC2) from Sentinel-2. We also included the linear model with the lowest corrected Akaike information criterion that combined ALS with Sentinel-2, which used ALS Z mean and Sentinel-2 spring NDVI.

| Case study results and discussion
In total, we flagged 111 of 292 FIA plots (38.0%) as potential outliers. Most of these outliers (99, 33.9% of the total FIA plots) had zero biomass recorded by FIA, and 50 plots (17.1%) also had zero positive vegetation returns (Table 1, Figure 3). However, the remaining 61 plots (20.8%) had potential issues. For example, plots with no biomass but positive ALS returns were abundant (49 plots, 16.8%). Of these, 31 plots (10.6%) were measured before, and 9 plots (3.1%) were measured after 2017 (the year ALS data were collected). In addition, we found 16 plots (5.5%) with potential GPS error. Of these plots, 4 plots (1.4%) had no recorded biomass, but 12 plots (4.1%) had some reported biomass. Nonresponse from denied access accounted for 14 plots (4.8%). Finally, our analysis of model residuals indicated that 12 plots (4.1%) were statistical outliers. Although the proportions of these different types of outliers might be context dependent, extrapolating to the full NFI of approximately 130,000 plots, the removal of the most abundant outlier class (zero biomass but positive ALS returns) could lead to upwards of 22,000 plots removed from national-scale biomass estimation.
We produced estimates of county-wide biomass per unit area when retaining and removing outliers from the analysis using expansion and model-assisted estimators. We compared mean and variance of biomass estimates, and whether model-assisted versus expansion estimators were more sensitive to the inclusion of outliers (Figure 4, Tables 2 and 3). We found that mean biomass estimates were smaller using the nonlinear model-assisted estimator (Equation 3) versus the expansion estimator (all points below the 1:1 line in Figure 4) but were similar when comparing the nonlinear model-assisted estimator using Z mean versus the linear model-assisted (with no intercept) estimator using Z mean + NDVI as predictors (Equation 5, Figure S3).
The model using only the Sentinel-2 TC2 as a predictor was less accurate than models that included ALS Z mean and was removed from the rest of the analysis. Relative efficiency was >1.88 (approximately two-to threefold reduction in variance, Table 3) when using the linear and nonlinear model-assisted versus expansion estimator, regardless of the removal of any class of outlier, indicating a successful harmonization of ALS, Sentinel-2 and FIA plot data. VR comparing the model-assisted estimators on the full sample versus outlierremoved subsets showed that removing statistically significant   Table 3).

Removal of outliers had larger effects on the expansion estima-
tor than on the model-assisted estimator. Our model-assisted estimator used a nonlinear model that is forced through the intercept which led to plots with biomass = 0 having zero residuals. Removal of any type of zero biomass outliers had larger effects on estimates of mean biomass per unit area using the expansion estimator, where removing zero-biomass outliers decreased denominator in Equation 1 (Figure 4). Because GPS errors happen sporadically (i.e. approximately equal number of positive and negative residuals, Figure 3), we found that the removal of data for plots with potential erroneous GPS coordinates had the smallest differences for estimates of mean biomass per unit area ( Table 2). In total, when all outliers were removed, estimates of mean and variance of biomass per unit area increased by 48% and 44%, respectively, in the expansion estimator, versus 15% increase in estimated mean and 28% reduction in estimated variance of biomass in the nonlinear modelassisted estimator.

| WHAT TO DO WITH OUTLIER S?
There are many proposed methods for dealing with outliers.
Often, outliers that arise from protocol are considered in the QA/ QC procedure of data analysis and are removed from the sample . These data are often removed from the analysis because issues can be easily traced to their source within the NFI protocol/definitions or data quality issues. However, the choice to retain or remove these data can have consequences for the estimation of forest population parameters (e.g. mean biomass per unit area), especially when using an estimator that relies solely on the field-collected data (i.e. the expansion estimator). In our case study, 4.1%-16.8% of FIA plots may be outliers if defined by a single criterion or 38% if all criteria are used. The removal from the analysis greatly reduces sample size and brings into question whether the sample can still be characterized as a probability sample of the NFI. The contribution of each sample unit to population estimates using design-based estimators decreases as sample size increases (Equations 1 and 6), so the removal of outliers in small area estimation (when sample size is already limited) can lead to unstable population estimates (Breidenbach & Astrup, 2012;Goerndt et al., 2011).
Another option is to use methods that are more robust to outliers. Using zero-inflated models that account for the presence of structural zeros (i.e. those that arise from protocol definitions) in addition to random zeros (i.e. 'true' zeros that arise from variability in the attribute of interest) can help reduce the effects of zeros on the relationship between predictor and response, but methods to do so are rarely applied to forest biomass estimation (Ancelet et al., 2010). Rubin (1987) suggests that instead of assuming these values are zeros or removing them from the dataset, multiple imputation (i.e. imputing the value multiple times based on a distribution rather than a single time based on the mean) can help to retain the sample size while accounting for uncertainty in the 'true' value of the missing data. Models such as k-nearest neighbours, random forests and other machine learning algorithms are potentially robust to outliers, but interpretability of relationships and sensitivity to other characteristics of the data (such as sample size) may hinder their applicability Yang et al., 2022).
There is the option of doing nothing with the outliers, and the effects may vary based on the estimator. For the expansion estima-  (McRoberts et al., 2018), although in our case study, GPS errors did not strongly influence the accuracy or precision of mean biomass estimates ( Figure 3). Other protocol definitions could be adjusted to better ensure that remote sensing-based models are predicting the same attributes that are being measured in the field. For example, sampling of smaller trees and other understorey vegetation could be expanded to the full subplot area (Figure 2). In our case study, we filtered out ALS vegetation returns <2 m to mimic the lack of understorey sampling in the FIA protocol (Bechtold & Patterson, 2005).
Methods for accounting for nonresponse, such as imputing estimates based on previous measurements and/or other NFI plots in the population and improving the process to request access, could reduce the likelihood of nonresponse-related outliers and their impacts on estimation (Domke et al., 2014;Westfall & Wilson, 2022). Mismatches in observed and predicted biomass resulting from temporal mismatches between field sampling and remote sensing may be due to harvest or other disturbance, so incorporating more information about drivers of change may help better diagnose temporal mismatches (Edgar & Westfall, 2022;Fitts et al., 2022).

| E X TEND ING B E YOND THE NFI
Our review focused on NFIs, but discussion on outliers from geospatial data harmonization could be applied to other survey data.
The quantity of ecological research networks has increased over recent years and serve a variety of purposes, including measuring gas fluxes using eddy covariance ( and/or compiling other sources of remotely sensed data in a similar manner to the workflow described here (Davies et al., 2021;Friend et al., 2007;Kamoske et al., 2022;Musinsky et al., 2022). For example, NEON data have been used in data harmonization workflows for studying patterns of biodiversity (Kamoske et al., 2022), but the relationship between remotely sensed and on-the-ground measurements may be less reliable than expected (Pau et al., 2022).
ForestGEO sites collect field inventory data over the course of many years, but remote sensing (commissioned at ForestGEO sites or compilation of other remote sensing) may not be temporally coincident (Davies et al., 2021). As these ecological research networks continue to grow and expand their sampling and analyses, they should continue to address questions about the validity of their data harmonization workflows.

Methods for harmonizing Global Ecosystems Dynamics
Investigation (GEDI) satellite-based lidar with other sensors and field inventory data are of recent interest for global biomass mapping Silva et al., 2021). The GEDI L4B gridded biomass product was built using a suite of biomass models , and Dubayah et al. (2022) showed high correlation between GEDI biomass and an aggregated FIA biomass product (64,000 ha hexagons containing an average of 20 FIA plots; Menlove & Healey, 2020). Due to the location errors of FIA plots and GEDI footprints (~10 m each on average), it is uncertain whether spatial co-registration of these two data sources can be achieved at the plot/footprint level (Roy et al., 2021). Additionally, GEDI is near the end of its life span, and it is uncertain if GEDI or a similar programme will exist in the future. Practitioners incorporating GEDI or any other remotely sensed data in geospatial data harmonization workflows should consider whether the longevity of a particular remote sensing mission will preclude its use in future applications.
NFI data are being used to assess carbon baselines and additionality-the relative increases in carbon stored on the landscape due to forest management practices such as delayed harvest, reforestation and prevention of land conversion-as potential natural climate solutions (Badgley et al., 2022;Giebink, Domke, et al., 2022).
These baselines rely on the assumption that field-collected data and remote sensing are properly linking remotely sensed metrics to carbon stocks or biomass. But, as shown in our case study, the precision and accuracy of these baseline estimates may be sensitive to potential outliers. Furthermore, if local calibration and validation of model estimates and/or data products are conducted and sample plots are removed to compensate for outliers identified as part of this process, that may jeopardize the underlying probability sample used to calculate design-based estimates of the population parameter of interest (Anderson-Teixeira & Belair, 2022;Badgley et al., 2022).
In summary, our review and case study serve as a note of caution to researchers and practitioners aiming to harmonize remotely sensed data with field-collected data to ensure that what is being measured in the field is being predicted by models based on their remote sensing technologies of choice. The inclusion or removal of outliers may have large effects on the estimation of forest resources, and therefore careful attention should be paid to the workflow used and whether the treatment of outliers will impact the conclusions drawn from geospatial data harmonization. Giebink created conceptual figures (Figures 1 and 2); and all authors contributed to writing and editing.

ACK N O WLE D G E M ENTS
Funding for this project was through the USDA Forest Service, Northern Research Station, Forest Inventory and Analysis Program.
The findings and conclusions in this publication are solely of the authors and should not be construed to represent any official USDA or US Government determination or policy.

CO N FLI C T O F I NTE R E S T S TATE M E NT
The authors have no conflicts of interest to disclose.

PE E R R E V I E W
The peer review history for this article is available at https://publons.com/publon/10.1111/2041-210X.14084.

DATA AVA I L A B I L I T Y S TAT E M E N T
NFI data were obtained from the USDA Forest Service, FIA Program.
Publicly available FIA data are available through the FIA DataMart

S U PP O RTI N G I N FO R M ATI O N
Additional supporting information can be found online in the Supporting Information section at the end of this article.