Dr Karin Reinke is a part-time Research Fellow in the School of Mathematical and Geospatial Sciences at RMIT University (GPO Box 2476V, Melbourne, Vic. 3001, Australia. Tel. +61 39925 9726. Fax +61 39663 2517. Email: firstname.lastname@example.org). Dr Simon Jones is an Associate Professor in the School of Mathematical and Geospatial Sciences at RMIT University (GPO Box 2476V, Melbourne, Vic. 3001, Australia. Email: email@example.com). This paper arises from an Australian Research Council funded project ‘Remote Sensing and Spatial Analysis of Native Vegetation Condition’ (LP0455316). It was also supported in part by the NSW Government's Environmental Trust.
Summary
This paper explores data compatibility issues arising from the assessment of remnant native vegetation condition using satellite remote sensing and field-based data. Space-borne passive remote sensing is increasingly used to provide a total sample and synoptic overview of the spectral and spatial characteristics of native vegetation canopies at a regional scale. However, integrating field-collected data that were not designed for use with remotely sensed data can lead to data compatibility issues. Integrating such unsuited datasets can increase data uncertainty and result in inconclusive findings. These problems, and potential solutions, form the basis of this paper: how can field surveys be designed to support and improve compatibility with remotely sensed total surveys? Key criteria were identified for consideration when designing field-based surveys of native vegetation condition (and other similar applications) that are intended to incorporate remotely sensed data. The criteria include recommendations for the siting of plots within a study area, the need for location reference points, the number of sample sites, and plot size and distribution. The difficulties associated with integrating these data are illustrated using real examples from a study of the vegetation in the Little River Catchment, New South Wales, Australia.
Field-based mapping of native vegetation condition has received much attention over recent years. However, most field-based assessments are unable to effectively map native vegetation condition in a regional or statewide context, and are impractical for quick reporting or monitoring condition change at the broader landscape, bioregional and statewide level. Remote sensing is able to assist in broad-scale mapping and monitoring of native vegetation because it can offer rapid assessments across a range of spatial scales (Coops et al. 1998; Lillesand et al. 2004).
Given that the management of native vegetation must operate across a wide range of spatial scales, from paddock level to statewide level, the ability of the sensors to effectively map and monitor condition change at each of these scales must be considered. Today, satellite remote sensing can record information at high spatial resolutions (i.e. < 1 m pixel size) and high spectral resolutions (i.e. > 30 spectral bands spread across a wide portion of the electromagnetic spectrum). However, it is often the case that there are trade-offs between the different resolutions and the selection of a satellite sensor can be a compromise between the ideal revisit frequency, spatial resolution, spectral resolution, spatial coverage and purchase cost (Campbell 2002). For definitions of key remote sensing terms, see Box 1.
Box 1. Definitions of key remote sensing terms
Multispectral: A remote sensing sensor that can detect electromagnetic radiation in several discrete wavelength bands.
Nadir: The point on the Earth's surface directly beneath the satellite sensor.
Panchromatic: A remote sensing instrument sensitive in only one discrete band of the electromagnetic spectrum.
Pixel: A complete image is made up of an array of pixels or grid cells. A pixel represents a (usually) square spatial unit on the ground for which spectral measurements are recorded.
Spatial resolution: A measure of the spatial detail that can be detected in an image, or the area on the ground represented by each pixel.
Spectral resolution: A measure of the sensitivity of a system to record information across the electromagnetic spectrum. It comprises the number of bands, the distribution of the bands along the electromagnetic spectrum and the individual bandwidths the sensor records.
Synoptic: The ability to record data simultaneously over a wide area to obtain a comprehensive and nearly instantaneous picture of the state of a given area of the Earth's surface. A total survey.
Synoptic sensors range from 25 m to 1.1 km in spatial resolution (in multispectral mode) and can produce spatial data coverages at scales between 1 : 500 000 and 1 : 1 000 000. High spatial resolution satellite imagery provides an opportunity for finer scale (e.g. paddock level) mapping. However, because of its small swath width (i.e. image size), high spatial resolution imagery provides detail but not broad-scale context. A multiplatform, multisensor solution can be employed to achieve outcomes across multiple scales.
Synoptic satellite Earth observation is a compromise between spatial resolution, spectral resolution, spatial coverage and cost, with the resolutions traded off to provide a generic, robust dataset. Landsat is the most widely used Earth-observing satellite, and with good reason: in terms of length of archive, coverage, cost and accessibility of data, it is second to none. For this reason, synoptic data must form part of a native vegetation sampling strategy. Furthermore, because Landsat data are available from the late 1980s onwards, it would be possible to investigate change in native vegetation condition over as much as the past 20 years. Questions remain, however, about the continuity of the Landsat mission given the partial failure of Landsat7. At present, Landsat 5 data are being used to fill the synoptic role, but a longer-term solution is needed.
In comparison, the potential of high spatial resolution sensors to record valuable information on the extent and condition of native vegetation is even more promising. High spatial resolution remote sensing systems are better able to link to ground samples and plots of native vegetation condition (e.g. Coops et al. 1997, 1998). Such technology allows individual trees and other features to be readily identified and investigated. However, because of cost, archiving and coverage restrictions, only partial samples of the landscape can be acquired. Moreover, high-resolution imagery is only sporadically archived for Australia, so retrospective studies of changes in native vegetation condition are, and will continue to be, difficult.
The aim of this paper is to describe some of the difficulties associated with analysing field-based point or plot data with landscape scale data obtained via remote sensing. In particular, issues of spatial compatibility and data quality are considered. Recommendations are presented to improve the utility of site or plot data for studies where additional datasets may be used to complement field-based surveys. Results from a case study are used to illustrate several of these issues.
The purpose of the case study presented here is to illustrate some of the problems encountered when integrating field-collected data with remotely sensed information. The original aim of the study was to investigate the utility of passive remote sensing for estimating native vegetation condition, both directly and through surrogate variables. The case study is summarized here to provide context for the examples that accompany the key recommendations made later in this paper.
The study area consisted of the Little River catchment located in central western New South Wales and covered approximately 260 000 ha. Three data sources were used in the study: field data, modelled vegetation condition data and remotely sensed data. Two hundred and forty-four field plots (50 m × 20 m) were collected using stratified, representative and random sampling during spring 2001 and autumn 2002 and were located using hand-held Global Positioning System (GPS) units (Seddon et al. 2002). Different field variables were recorded at each plot and used to develop discrete vegetation condition classes.
Remotely sensed data were obtained from the SPOT4 High Resolution Visible and Infrared (HRV-IR), Landsat7 Enhanced Thematic Mapper Plus (ETM+) and IKONOS satellite sensors. These commercial space-borne platforms were chosen for their widespread use, availability, spatial coverage, and spectral and spatial resolution. They are also well-established platforms with considerable archives (allowing retrospective analysis) and likely future continuation. The remotely sensed data were processed, integrated and compared with field-based assessments of native vegetation to determine relationships between spectral responses and vegetation condition parameters. Standard image processing techniques were also applied to the remote sensing data, and the results were explored for relationships with field-based variables as well as with modelled variables of native vegetation condition.
Considerations for data collection of field-based assessments
Many data compatibility issues surfaced during the case study. The most critical were temporal mismatch between some datasets and the uncertainty associated with the relative positional accuracy between datasets. In addition, many of the field-based variables of native vegetation condition either were not distinguishable using passive remote sensing alone or were not sufficiently represented within the dataset. Spatial misalignment between datasets, together with the variation within and between land cover types, hindered analysis at several spatial and/or spectral resolutions.
Ensuring a legacy from field information
Data quality information and metadata (i.e. data about data) should be provided with all datasets. This is accepted practice in the spatial data community, with national bodies such as Standards Australia and the Australian and New Zealand Land Information Council (ANZLIC) establishing industry standards (ANZLIC 2001). The standards describe the characteristics of a spatial dataset in sufficient detail to allow users to determine whether the data are of a suitable standard for an intended application. The elements defined in the standard for spatial data quality are attribute accuracy, positional accuracy, temporal accuracy (i.e. currency), logical consistency, completeness and lineage; these are described in Table 1.
Table 1. The six spatial data quality elements as defined by ANZLIC (2001), used to describe the content of spatial data and enable the user to assess fitness-for-use
Data quality element: description and examples
Positional accuracy: Refers to the accuracy and precision of the data's geo-positioning. This may be a relative or absolute measure of the map distance and direction to an Earth location or to a secondary data source such as a satellite image. Example: Is the defined vegetation boundary in the same location as identified on other datasets with which it is being analysed?
Temporal accuracy (currency): Concerns the currency of the data for the application. Have the data been collected at the appropriate time, or are they out of date? An alternative definition requires that the date on which a feature's position or attribute was validated as correct be the same as the date on which the feature's attribute was recorded. Example: Is there a difference between the date of the field-based assessment and the date the remote sensing imagery was acquired that prevents direct comparison between the datasets?
Attribute accuracy: Concerns how well the thematic definitions and measurements describe the phenomenon being observed, and how correctly attributes have been assigned to locations on the ground. Example: Do the vegetation communities recorded in field assessments agree with a dataset considered to have higher attribute and positional accuracy?
Logical consistency: Assumes that consistent measurements will yield similar values when repeated many times. It concerns the reliability of the data collection method and the internal fidelity of the data. Reporting logical consistency involves testing not only the topological accuracy of the data but also the logic of relationships between data points; in other words, ensuring that relationships that could not occur in the real world do not reside in the database. Example: An area identified as disturbed by heavy logging has also been identified as an area of low human impact.
Completeness: Indicates missing values or omission, and in this respect is also an implicit statement of data resolution. It can also apply to the inclusion of data that do not exist in reality (commission), and to the assumption that mapping rules have been applied equally to all data. Example: Are there sufficient field plots in Fuzzy Box communities for statistical comparison with remote sensing information? Are there sufficient field plots with a high level of dieback to develop spectral signatures?
Lineage: Refers to the origin of the data: the source materials, sampling design, data collection, pre-processing and transformation methods applied. Consider the purpose for which the dataset was produced and the appropriateness of ‘recycling’ the data. Example: What sampling design was used to locate field sites? What spatial data were used to locate each site: a GIS layer or a hardcopy map?
Comparing spatial datasets is a complex task. Any one or more of these six elements may lead to erroneous conclusions, and it is at this point that the data user must decide whether the differences are acceptable. Consider, for example, a survey of native vegetation condition undertaken in the mid-1970s and repeated today. Issues of logical consistency would arise because each individual or team will have its own methodologies, even idiosyncrasies. Can the true location of the 1970s sites be identified? Over this 30-year gap much has changed in the technology available to assist accurate data collection, yet without appropriate metadata future users will be unable to assess the dataset's fitness for use. An understanding of why, how and what was captured in the field will always be necessary, and becomes increasingly important as more data are gathered and integrated.
These standards have been implemented in the Australian National Vegetation Information System (NVIS) (ESCAVI 2003) and should be similarly adopted for any field data with potential to be used in spatial analysis. In the USA, the Federal Geographic Data Committee (FGDC) has published national documents describing the type of metadata relevant to field-based assessments of vegetation (Federal Geographic Data Committee and National Spatial Data Infrastructure 1997). The US Vegetation Mapping Program (The Nature Conservancy 1994) provides extensive guidelines on vegetation classification standards and the field collection methods that should be used when acquiring vegetation data in the field. Gillison (2002) reports a software package, VegClass, that facilitates data entry, summary and analysis of metadata in a user-friendly and rapid way while ensuring metadata consistency and completeness.
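Where plot data are captured digitally, the quality elements in Table 1 can be stored with each record. The following is a minimal sketch in Python, not a published schema: the `PlotRecord` class, its field names and the example coordinate reference code are hypothetical, and the single consistency rule simply encodes the logging example from Table 1.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PlotRecord:
    """Hypothetical field-plot record carrying basic data-quality metadata."""
    plot_id: str
    easting: float            # positional accuracy: projected coordinates (m)
    northing: float
    crs: str                  # coordinate reference system, e.g. "EPSG:28355"
    gps_accuracy_m: float     # receiver-reported horizontal accuracy (m)
    survey_date: date         # temporal accuracy (currency)
    sampling_design: str      # lineage: how the site was chosen
    heavily_logged: bool = False
    human_impact: str = "unknown"   # e.g. "low", "moderate", "high"


def consistency_problems(rec: PlotRecord) -> list[str]:
    """Flag attribute combinations that should not co-occur (logical consistency)."""
    problems = []
    if rec.heavily_logged and rec.human_impact == "low":
        problems.append("heavily logged but recorded as low human impact")
    if rec.gps_accuracy_m < 0:
        problems.append("GPS accuracy cannot be negative")
    return problems


# Hypothetical record that reproduces the Table 1 logical-consistency example.
rec = PlotRecord("P001", 600000.0, 6400000.0, "EPSG:28355", 5.0,
                 date(2001, 10, 15), "stratified random",
                 heavily_logged=True, human_impact="low")
print(consistency_problems(rec))
```

Such a record supplements, rather than replaces, a formal metadata statement for the dataset as a whole.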
For future data collection exercises, where it is anticipated remotely sensed products would be used to support vegetation condition monitoring and analysis, several key recommendations can be made. It is difficult to prescribe absolute guidelines as the sampling design will always be a trade-off between:
1. Design requirements for obtaining a representative sample of the phenomena of interest.
2. Suitability of sample data to support intended statistical tests.
3. Practical constraints of the project (e.g. site accessibility, cost, etc.).
4. Considerations for the inclusion of remote sensing data.
The intention of this discussion is not to replace existing field sampling design theories and practices but rather to improve their ability to accommodate the needs of remote sensing applications. These recommendations are intended to assist with image interpretation, analysis and the integration of field-based data only. They are made in addition to the metadata requirements and aim to facilitate integration with remotely sensed data.
Recommendations for plot locations
Plots should be located within a homogeneous or consistently mixed land cover unit, and at least a given distance from any vegetation (or other land cover) boundary, to (i) reduce the effects of relative positioning errors between datasets and (ii) minimize the occurrence of spectral mixing.
The required distance from the boundary (such as the ‘edge of patch’) will depend largely upon the size of remnant vegetation patches and the spatial resolution of the sensor being used. Some vegetation mapping standards (e.g. the NBS/NPS Vegetation Mapping Program) specify a distance of at least 30 m from boundaries. However, it is recommended that a minimum distance of one pixel width from the boundary be used (e.g. Landsat7 ETM+ has a pixel size of 30 m × 30 m in multispectral mode). While this overlooks potential edge effects in native vegetation patches, it provides a starting point from which to understand the relationships between field-based variables and remotely sensed information before more complex, heterogeneous plots are assessed.
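The one-pixel buffer rule can be checked before a plot is sited. The sketch below is a hypothetical, dependency-free Python helper that assumes planar coordinates in metres (e.g. a projected grid) and a simple polygon outline for the remnant patch. Rather than modelling plot orientation, it tests whether the plot centre lies at least one pixel width plus the plot's half-diagonal from the patch boundary, which is conservative for any orientation.

```python
import math

Point = tuple[float, float]


def _dist_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to segment a-b (planar metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def _inside(p: Point, ring: list[Point]) -> bool:
    """Ray-casting point-in-polygon test for a simple polygon ring."""
    x, y = p
    inside = False
    for i in range(len(ring)):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def plot_clear_of_edge(centre: Point, plot_w: float, plot_h: float,
                       patch: list[Point], pixel_size: float) -> bool:
    """True if a plot_w x plot_h plot centred at `centre` lies wholly inside
    `patch` and every part of it is at least one pixel from the boundary.
    Uses the plot's half-diagonal, so it is conservative for any orientation."""
    if not _inside(centre, patch):
        return False
    half_diag = math.hypot(plot_w, plot_h) / 2.0
    n = len(patch)
    edge = min(_dist_to_segment(centre, patch[i], patch[(i + 1) % n]) for i in range(n))
    return edge >= pixel_size + half_diag


# Hypothetical 500 m x 300 m remnant and a 50 m x 20 m plot, Landsat 30 m pixels.
patch = [(0.0, 0.0), (500.0, 0.0), (500.0, 300.0), (0.0, 300.0)]
print(plot_clear_of_edge((250.0, 150.0), 50, 20, patch, 30.0))
print(plot_clear_of_edge((250.0, 40.0), 50, 20, patch, 30.0))
```

In a GIS the same test is usually done with a negative (inward) buffer of the patch polygon; the sketch only makes the geometry explicit.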
When utilizing data from multiple sources, spatial misalignment between datasets is a common problem. While the absolute positional accuracy of field-based data is improved by the use of the Global Positioning System (GPS), it is the relative positional accuracy with respect to the remotely sensed data that is of concern. A difference in positioning between the two datasets can invalidate assessment and analysis results. In the case study, an offset of a few metres was sufficient to potentially place a field plot in an entirely different environmental setting. For example, Figure 1 shows the degree of homogeneity surrounding plot sites and the potential for some plots to shift from being located completely in remnant vegetation to a position that includes cleared agricultural land and other land cover types.
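The sensitivity described above can be screened for with a classified image. This hypothetical Python helper assumes the land cover raster is already co-registered with the plot coordinates and that relative positional error can be expressed in whole pixels; it asks whether the plot footprint would remain in a single land cover class under any displacement up to that error.

```python
def plot_is_robust(landcover, row, col, plot_px, shift_px):
    """True if a square plot footprint stays in a single land-cover class
    when displaced by up to `shift_px` whole pixels in any direction.

    landcover: 2-D list of class labels, one per pixel (rows, then columns)
    row, col:  upper-left pixel of the plot footprint
    plot_px:   footprint width in pixels
    shift_px:  assumed relative positional error, in whole pixels
    """
    r0, r1 = row - shift_px, row + plot_px + shift_px
    c0, c1 = col - shift_px, col + plot_px + shift_px
    if r0 < 0 or c0 < 0 or r1 > len(landcover) or c1 > len(landcover[0]):
        return False  # some displaced footprint would fall off the image
    # Union of all displaced footprints = the footprint grown by shift_px.
    classes = {landcover[r][c] for r in range(r0, r1) for c in range(c0, c1)}
    return len(classes) == 1


# Hypothetical 6 x 6 classified image: woodland (W) with cleared land (C) to the east.
landcover = [["W"] * 4 + ["C"] * 2 for _ in range(6)]
print(plot_is_robust(landcover, 2, 1, 1, 1))
print(plot_is_robust(landcover, 2, 3, 1, 1))
```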
Recommendations for location control points
Valid comparisons between datasets cannot be made unless the same spatial relationship exists between each of the datasets. Location control points help to minimize and understand positional misalignments between datasets. To improve the relative positional accuracy between datasets, it is recommended that location reference points be captured as part of any field survey. The reference points should be easily identifiable features that are unlikely to shift over time (e.g. road intersections) and should be distributed across the area of interest in the image. A minimum of four points is needed to fulfil a basic transformation. Commonly, a post-transformation root mean square error (RMSE) of less than half the pixel size of the image being georegistered is considered acceptable. Most GIS and image processing software packages provide georegistration functionality, with documentation that describes georegistration requirements and processes in sufficient detail.
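The RMSE criterion can be illustrated with a first-order (affine) polynomial fitted to control points by least squares, which is one common georegistration model; the choice of model is an assumption here, and the function names and coordinates are hypothetical. An affine fit has three coefficients per axis, so three points determine it exactly and the residual RMSE only becomes informative with additional points. RMSE computed from the fitting points themselves also tends to be optimistic; independent check points give a fairer estimate.

```python
import math


def _solve3(m, v):
    """Solve the 3 x 3 system m x = v (Gaussian elimination, partial pivoting)."""
    a = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= f * a[col][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x


def affine_rmse(image_xy, ref_xy):
    """Fit ref = affine(image) by least squares; return (coefficients, RMSE).

    image_xy: control-point positions in image coordinates (column, row)
    ref_xy:   the same points in map coordinates (easting, northing), metres
    """
    if len(image_xy) < 3 or len(image_xy) != len(ref_xy):
        raise ValueError("need >= 3 matched control points")
    design = [(1.0, x, y) for x, y in image_xy]
    # Normal equations (A^T A) c = A^T b, solved once per output axis.
    ata = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    coeffs = []
    for axis in (0, 1):  # 0 = easting, 1 = northing
        atb = [sum(r[i] * t[axis] for r, t in zip(design, ref_xy)) for i in range(3)]
        coeffs.append(_solve3(ata, atb))
    sq_sum = 0.0
    for (_, x, y), (e, n) in zip(design, ref_xy):
        e_hat = coeffs[0][0] + coeffs[0][1] * x + coeffs[0][2] * y
        n_hat = coeffs[1][0] + coeffs[1][1] * x + coeffs[1][2] * y
        sq_sum += (e_hat - e) ** 2 + (n_hat - n) ** 2
    return coeffs, math.sqrt(sq_sum / len(design))


# Hypothetical control points on a 30 m north-up grid, one carrying a 10 m error.
img = [(0, 0), (100, 0), (0, 100), (100, 100), (50, 50)]
ref = [(500000 + 30 * c, 6400000 - 30 * r) for c, r in img]
ref[4] = (ref[4][0] + 10, ref[4][1])
_, rmse = affine_rmse(img, ref)
pixel_size = 30.0
print(f"RMSE = {rmse:.2f} m; acceptable: {rmse < 0.5 * pixel_size}")
```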
An initial investigation into the positional accuracy of the SPOT4 HRV-IR image found that some features were in error by up to 200 m compared with other reference datasets. In other words, there was no certainty that a location within the SPOT4 HRV-IR image corresponded to the same location on an alternative dataset, making comparisons using this dataset invalid. In addition, it was difficult to identify suitable control points on the SPOT4 HRV-IR image because the study area consisted mostly of native vegetation and agricultural land. In this study, the centre points of water bodies (e.g. farm dams) were therefore used as control points.
Recommendations for plot size
Each pixel on a remotely sensed image represents an integration of information within its spatial boundary. Plot size should be sufficient to sample the feature(s) of interest adequately at a scale consistent with the spatial resolution of the sensor that will be used, but it should also suit the spatial distribution or size of the attributes to be measured. For example, measuring mature tree density in a 1-m plot (i.e. the pixel resolution of IKONOS panchromatic) is not appropriate; similarly, measuring litter volume in a 30-m plot (the pixel resolution of Landsat7 ETM+) is not feasible. Further, some field-based features are not measured in plots at all but via points, transects, etc. Plot size therefore requires two issues to be considered: the minimum sampling unit suitable for the imagery and the minimum sampling unit suitable for the attribute. Given the scale dependency of vegetation patterns and processes, and the capabilities and limitations of remote sensing, a field sampling design may benefit from a multiscale survey.
From a remote sensing perspective, the minimum size for a field plot should be equivalent to at least one pixel for the given sensor. For example, a minimum plot size of 20 m × 20 m would be used for SPOT4 (HRV-IR) imagery. This is considered the minimum plot size (MPS), as features smaller than this cannot be reliably located within a pixel. The MPS may also be calculated from the ground diameter of a pixel and the geometric accuracy of the image (Justice & Townshend 1981), and can be expressed as:
$$\mathrm{MPS} = P\,(1 + 2L)$$
where P is the ground dimension of a pixel (m) and L is the geometric accuracy of the image, in pixels.
For example, Landsat7 ETM+ multispectral imagery has a pixel size of approximately 30 m × 30 m. If the geometric accuracy of the image of interest is two pixels, the MPS would be 150 m × 150 m; a geometric accuracy of 0.5 pixels would give 60 m × 60 m. IKONOS panchromatic imagery has a pixel size of approximately 1 m × 1 m. A geometric accuracy of two pixels would give an MPS of 5 m × 5 m, and a geometric accuracy of 0.5 pixels would give 2 m × 2 m.
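The worked values above can be checked with a short calculation. This Python sketch assumes the Justice and Townshend (1981) expression takes the form MPS = P(1 + 2L), where P is the pixel size in metres and L is the geometric accuracy in pixels; under that form it reproduces the Landsat7 ETM+ figures (150 m and 60 m) and gives 5 m and 2 m for IKONOS panchromatic imagery. The function name is illustrative.

```python
def minimum_plot_size(pixel_size_m: float, geometric_accuracy_px: float) -> float:
    """Side length (m) of the minimum plot, MPS = P * (1 + 2L)."""
    return pixel_size_m * (1 + 2 * geometric_accuracy_px)


for sensor, pixel in [("Landsat7 ETM+ multispectral", 30.0), ("IKONOS panchromatic", 1.0)]:
    for acc in (2.0, 0.5):
        side = minimum_plot_size(pixel, acc)
        print(f"{sensor}, L = {acc} px: {side:g} m x {side:g} m")
```

Because the geometric error term is doubled (one pixel of uncertainty on each side of the plot), improving registration from two pixels to half a pixel shrinks the required plot side by a factor of 2.5.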
The spatial resolution of SPOT4 HRV-IR and Landsat7 ETM+ was often too coarse even to identify a remnant native vegetation patch, let alone detect a particular attribute. This limits the extraction of vegetation condition information wherever the image resolution is coarse relative to remnant size. In the study area, approximately 19% of the land is covered by woody vegetation. Of this, only 10% of the woodland remnants are greater than 1 ha, with approximately 10 000 remnant patches being smaller than 1 ha (Seddon et al. 2002).
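A simple geometric calculation shows why patches of around 1 ha are hard to resolve at 20 to 30 m resolution. The sketch assumes an idealized square patch whose corner coincides with the pixel grid, which is the best case; real remnants are irregular and rarely grid-aligned, so actual counts of pure pixels would usually be lower. The optional buffer applies the one-pixel edge rule recommended for plot siting, and the helper name is hypothetical.

```python
import math


def whole_pixels(patch_side_m: float, pixel_m: float, edge_buffer_px: int = 0) -> int:
    """Best-case number of whole pixels inside a square patch whose corner
    coincides with the pixel grid, after discarding an edge buffer of
    `edge_buffer_px` pixels on every side."""
    usable = patch_side_m - 2 * edge_buffer_px * pixel_m
    if usable <= 0:
        return 0
    return math.floor(usable / pixel_m) ** 2


side = math.sqrt(10_000)  # a 1 ha square patch is 100 m on a side
for sensor, pixel in [("SPOT4 HRV-IR", 20.0), ("Landsat7 ETM+", 30.0)]:
    print(sensor, whole_pixels(side, pixel), whole_pixels(side, pixel, edge_buffer_px=1))
```

Even in this best case, a 1 ha remnant leaves a single pure Landsat7 ETM+ pixel once a one-pixel edge buffer is applied.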
Recommendations for sample size and distribution
The number and distribution of field plots should be adequate to represent the diversity and spatial variability of each vegetation community and condition state, and to support the intended statistical analysis and accuracy assessments. It is common practice to collect a 1–5% sample (Lillesand et al. 2004). It is acknowledged that many good studies do not achieve this sampling density, and these recommendations serve as a general guide only. Sample size will depend foremost upon the amount of variance in the population and the level of precision that is acceptable to answer the research question.
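The dependence of sample size on variance and precision can be made concrete with the textbook formula for estimating a mean, n = (z·s/E)², where s is the standard deviation of the attribute, E the acceptable margin of error and z the normal critical value for the chosen confidence level. The sketch assumes simple random sampling, a normal approximation and no finite-population correction; the standard deviation in the example is hypothetical, and a stratified design would need per-stratum calculations.

```python
import math
from statistics import NormalDist


def plots_needed(sd: float, margin: float, confidence: float = 0.95) -> int:
    """Plots needed to estimate a population mean to within +/- margin.

    Assumes simple random sampling, a normal approximation and a large
    population (no finite-population correction); sd and margin share units.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return math.ceil((z * sd / margin) ** 2)


# Hypothetical: canopy cover with an assumed standard deviation of 20 percentage points.
for margin in (10, 5):
    print(f"+/-{margin} points: {plots_needed(20, margin)} plots")
```

Halving the acceptable margin roughly quadruples the number of plots, which is why precision targets should be set before fieldwork begins.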
For the purpose of remote sensing classification, Curran and Williamson (1985) indicated that using too few sample sites is a major source of error and that at least 30 samples should be collected to reduce sampling error. Brogaard and Ólafsdóttir (1997) recommend a minimum sample size of 50 for each category of interest. In such applications, the sample size depends upon the number of pixels in the training data and the number of classes. Readers are referred to Brogaard and Ólafsdóttir (1997) for further details on the sample size requirements for accuracy assessment of classifications.
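The minimum-count guidelines above (at least 30 samples, or 50 per category) can be checked against the plot labels before analysis begins, which also addresses the completeness questions in Table 1. The helper and all class labels other than Fuzzy Box are hypothetical; listing the expected classes ensures that communities with no plots at all are reported rather than silently omitted.

```python
from collections import Counter


def undersampled_classes(plot_classes, expected_classes=(), minimum=50):
    """Return {class: plot count} for every class with fewer than `minimum` plots.

    Classes listed in `expected_classes` but absent from the field data are
    reported with a count of zero.
    """
    counts = Counter(plot_classes)
    for cls in expected_classes:
        counts.setdefault(cls, 0)
    return {cls: n for cls, n in sorted(counts.items()) if n < minimum}


# Hypothetical plot labels (class names other than Fuzzy Box are illustrative).
labels = ["Box woodland"] * 62 + ["Fuzzy Box"] * 12 + ["Cleared"] * 41
print(undersampled_classes(labels, minimum=50))
print(undersampled_classes(labels, expected_classes=["Grassy White Box"], minimum=30))
```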
The high purchase cost of IKONOS imagery restricted its use to a single image (∼140 km²) within the study area. The lack of field plots within the acquired image limited the analysis to descriptive statistics and subjective interpretation.
Recommendations for timing of collection
The acquisition date of the scene should ideally coincide with the date the field observations were recorded, to assist in image calibration and analysis. However, there may be times, particularly in vegetation assessment applications, when the closest temporal match is not available or is not the most appropriate. For example, it may be more suitable to select a scene with a poorer temporal match that falls within the same season as the field capture dates than a scene that is closer in date but falls within a different season. It is also important to take into consideration extreme climatic (e.g. drought) or disturbance (e.g. fire) events that may unusually modify vegetation characteristics.
Where ongoing monitoring and change analysis of native vegetation condition is to occur, multiple field data collections and image acquisitions are required. Field visits should be timed to coincide with the anniversary dates of previous field visits and should ideally be synchronous with the satellite overpass. However, for practical reasons, field observations are commonly collected between 10:00 and 14:00 h. Determining a revisit strategy depends not only upon the natural seasonal variation and succession of the vegetation but also upon the revisit frequencies (both nadir and off-nadir) of the satellite sensors being employed.
Field data were captured during spring 2001 and autumn 2002, and image scenes were acquired retrospectively. The revisit frequencies of Landsat7 ETM+, SPOT4 HRV-IR and IKONOS (i.e. less than 30 days) did not present any limitations for this study (although an existing Landsat7 ETM+ image, acquired two years before the field data collection, was also used). Candidate scenes had to meet the prerequisite criteria of covering the study area and being free from problems such as cloud cover. From this set, the final scenes were selected on the basis of the closest temporal match to the field-based observations.
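The scene-selection procedure (filter on coverage and cloud, then choose the closest date), combined with the earlier preference for same-season imagery, can be written as a simple ranking rule. In this Python sketch, the dictionary keys, the zero-cloud default and the southern-hemisphere meteorological season boundaries are assumptions; a real workflow would query an image catalogue instead.

```python
from datetime import date

# Southern-hemisphere meteorological seasons (an assumed convention).
SEASON = {12: "summer", 1: "summer", 2: "summer",
          3: "autumn", 4: "autumn", 5: "autumn",
          6: "winter", 7: "winter", 8: "winter",
          9: "spring", 10: "spring", 11: "spring"}


def select_scene(scenes, field_date, max_cloud=0.0):
    """Return the id of the scene to pair with a field visit, or None.

    Each scene is a dict with the hypothetical keys 'id', 'date' (datetime.date),
    'covers_study_area' (bool) and 'cloud' (cloud fraction, 0-1).
    """
    usable = [s for s in scenes
              if s["covers_study_area"] and s["cloud"] <= max_cloud]
    if not usable:
        return None
    field_season = SEASON[field_date.month]
    # Rank same-season scenes first, then by the absolute gap in days.
    best = min(usable, key=lambda s: (SEASON[s["date"].month] != field_season,
                                      abs((s["date"] - field_date).days)))
    return best["id"]


scenes = [
    {"id": "A", "date": date(2001, 12, 3), "covers_study_area": True, "cloud": 0.0},
    {"id": "B", "date": date(2001, 11, 20), "covers_study_area": True, "cloud": 0.4},
    {"id": "C", "date": date(2001, 10, 20), "covers_study_area": True, "cloud": 0.0},
    {"id": "D", "date": date(2001, 11, 24), "covers_study_area": False, "cloud": 0.0},
]
print(select_scene(scenes, date(2001, 11, 25)))
```

Here the summer scene A is only 8 days from the field visit, but the rule prefers the spring scene C; dropping the season term from the sort key would reproduce a purely closest-date selection.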
Recommendations for attributes collected
It is recommended that the raw measures of any quantitative attribute be recorded and stored where practical. For example, canopy density is best recorded as an actual percentage rather than simply being assigned a class. Raw measures can later be generalized according to the user's requirements. This avoids imposing a classification scheme upon the user and enhances the usability of the data.
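Storing raw values keeps classification a downstream choice. In this sketch, the same hypothetical canopy-cover percentages are classified under two hypothetical schemes using Python's standard `bisect` module; neither scheme is taken from the case study.

```python
from bisect import bisect_right


def to_class(value, breaks, labels):
    """Assign a raw measurement to a class given ascending break points.

    len(labels) must be len(breaks) + 1; a value equal to a break point
    falls into the upper class.
    """
    if len(labels) != len(breaks) + 1:
        raise ValueError("need exactly one more label than break points")
    return labels[bisect_right(breaks, value)]


raw_canopy_pct = [4.0, 18.5, 35.0, 72.0]   # hypothetical raw field values (%)

# Two hypothetical classification schemes applied to the same raw data.
scheme_a = ([10, 30, 70], ["sparse", "open", "woodland", "closed"])
scheme_b = ([20, 50], ["low", "medium", "high"])
print([to_class(v, *scheme_a) for v in raw_canopy_pct])
print([to_class(v, *scheme_b) for v in raw_canopy_pct])
```

Had only the scheme A classes been recorded in the field, the scheme B classes could not be recovered, because the "open" class (10 to 30%) straddles the 20% break.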
Traditionally, the vegetation attributes measured by passive remote sensing include generalizations (the degree of which depends upon the spatial and spectral resolution of the sensor) of leaf area, changes in photosynthetic pigment and fragmentation (at various spatial scales). Derived vegetation indices are easy to implement and are commonly used as indicators of the relative abundance and photosynthetic activity of vegetation. The ability to measure an attribute of interest remotely depends upon the process scale of that attribute and the spatial resolution of the sensor being used. The full range of vegetation field variables suitable for capture via passive remote sensing is still not well understood.
The case study investigated the relationships between several vegetation indices and both the field-based data and the modelled vegetation condition. The indices ranged from commonly used indices such as the Normalized Difference Vegetation Index (NDVI) (Rouse et al. 1974) and soil-adjusted indices (Huete 1988) to more complex indices, such as those produced from tasselled cap calculations or tailored to specific satellites (Pinty & Verstraete 1992). However, spatial misalignment between datasets and the potential mixing of spectral values within pixels meant that results were often unreliable.
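For reference, NDVI is (NIR - Red)/(NIR + Red), and the soil-adjusted vegetation index (SAVI) of Huete (1988) is (1 + L)(NIR - Red)/(NIR + Red + L), with L = 0.5 commonly used. The sketch below computes both for a single pixel; the reflectance values are hypothetical, and inputs are assumed to be surface reflectances rather than raw digital numbers.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index (Rouse et al. 1974)."""
    total = nir + red
    return float("nan") if total == 0 else (nir - red) / total


def savi(nir: float, red: float, soil_l: float = 0.5) -> float:
    """Soil Adjusted Vegetation Index (Huete 1988); soil_l is the soil factor L."""
    return (1 + soil_l) * (nir - red) / (nir + red + soil_l)


# Hypothetical surface reflectances (0-1) for one pixel; for Landsat7 ETM+
# these would come from band 3 (red) and band 4 (near-infrared).
nir, red = 0.40, 0.08
print(f"NDVI = {ndvi(nir, red):.3f}, SAVI = {savi(nir, red):.3f}")
```

Applied per pixel, either index averages whatever lies within the pixel footprint, so a mixed woodland and cleared pixel yields an intermediate value that corresponds to neither cover type.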
The mapping and monitoring of native vegetation condition is a rapidly developing research area that is presently receiving much attention at state and catchment level in Australia. Remote sensing plays a vital role in any monitoring regime, as it alone affords a total sample of the landscape. However, the utility of remotely sensed data for native vegetation condition assessment is highly dependent on the sampling scheme used for the collection of ground data. Poor compatibility between field-collected data and remotely sensed data can present significant problems for joint analysis of the datasets.
One solution is to develop a consistent set of guidelines by which field-based data are collected and documented. In doing so, current and future users will be in a better position to understand the potential for integrating field data with other data sources such as remotely sensed data. This would improve the overall capacity for field data to be re-used with other datasets and sources. This may prove cost-effective in the longer term.
The sudden growth of inexpensive or freely available satellite imagery presents environmental scientists and land managers with an opportunity to participate in multiscale native vegetation mapping and monitoring. It is therefore timely that data integration issues are identified and resolved to ensure a legacy of field-based data for use with these emerging sources of information. This paper encourages field scientists to engage in discussion to evolve the recommendations made here into working standards of practice relating to the field collection of vegetation and other biophysical data.
The authors would like to acknowledge the following people and institution, without whom this research would not have been possible: Phil Gibbons (NSW Department of Environment and Conservation), Andre Zerger (CSIRO Sustainable Ecosystems) and the Australian Research Council (Linkage Grant Number LP0455316). The authors would also like to thank the two anonymous referees whose comments improved earlier versions of this manuscript.