Considering spatiotemporal processes in big data analysis: Insights from remote sensing of land cover and land use

Data are increasingly spatio-temporal: they are collected some-where and at some-time. The role of proximity in spatial process is well understood, but its value is much more uncertain for many temporal processes. Using the domain of land cover/land use (LCLU), this article asserts that analyses of big data should be grounded in an understanding of the underlying process. Processes exhibit behaviors over both space and time. Observations and measurements may or may not coincide with the process of interest. Identifying the presence or absence of a given process, for instance disentangling vegetation phenology from stress, requires data analysis to be informed by knowledge of the process characteristics and, critically, how these manifest themselves over the spatio-temporal unit of analysis. Drawing from LCLU, we emphasize the need to identify process and consider process phase to quantify important signals associated with that process. The aim should be to link the seriality of the spatio-temporal data to the phase of the process being considered. We elucidate these points and identify opportunities for insights and leadership from the geographic community.

Big data are growing collections of observations (Laney, 2001) that are increasingly spatially and temporally referenced (Kitchin, 2013) and are changing the nature of data analysis (Brunsdon, 2016). We are moving from a time when all data are spatial to one when they are spatio-temporal (Kitchin, 2013): collected at some-time as well as some-where.
Space-time relationships are at the core of GIS research (Yuan, 2017). While the role of proximity in spatial process is well known (Tobler, 1970, 2004), its value is much more uncertain for temporal processes. Temporal proximity works well in carbon dating, glottochronology (the study of language change over time; Lees, 1953), seriation in archaeology (Doran & Hodson, 1975), and many other disciplines. However, many processes exhibit specific periodicities that require synchronicity between the phase of the process being observed and the timing of observations, not just a measure of temporal proximity. Identifying temporal patterns (such as serial autocorrelation) requires temporal data to be treated in an informed way, to ensure that the phase of observation (data) matches the periodicity of the process.
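The difference between temporal proximity and phase synchronicity can be illustrated with a minimal sketch. It assumes a hypothetical vegetation index with a 12-month cycle; the function names, amplitude, and sampling schedules are illustrative only. Observations taken once a year in the same month are regular and consistent, yet they see none of the seasonal cycle; monthly observations of the same signal capture it, with strongly positive serial autocorrelation at a lag of one full period and strongly negative autocorrelation at half a period.

```python
import math


def seasonal_signal(t_months: float, period: float = 12.0) -> float:
    """Hypothetical vegetation index: baseline 0.5 plus an annual cycle of amplitude 0.3."""
    return 0.5 + 0.3 * math.cos(2 * math.pi * t_months / period)


def sample(start: float, step: float, n: int) -> list:
    """Observe the signal n times, every `step` months, starting at month `start`."""
    return [seasonal_signal(start + i * step) for i in range(n)]


def lag_autocorrelation(x: list, lag: int) -> float:
    """Sample autocorrelation of x at the given lag (0.0 for an effectively constant series)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var < 1e-12:
        return 0.0
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var


# Annual observations locked to the same month: the seasonal cycle is invisible.
annual = sample(start=3.0, step=12.0, n=10)
# Monthly observations over the same decade: the cycle and its phase are captured.
monthly = sample(start=3.0, step=1.0, n=120)

print(max(annual) - min(annual))                    # effectively zero (floating-point residue only)
print(round(lag_autocorrelation(monthly, 12), 2))   # 0.9 (lag of one full period)
print(round(lag_autocorrelation(monthly, 6), 2))    # -0.95 (lag of half a period)
```

The annual series is not noisy or irregular; what hides the cycle is that its sampling interval equals the process period, so every observation falls at the same phase.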
This article considers how the remote-sensing community have addressed the challenges and opportunities associated with big data (e.g., Ma et al., 2015) through the lens of land cover/land use (LCLU) mapping and land cover/land use change (LCLUC) analysis and monitoring. It highlights key developments in data and data infrastructure to accommodate big spatio-temporal remote-sensing data, which have in turn driven changes in operations and analysis. These developments have shifted the focus of analysis from recording LCLU states to quantifying LCLU processes and dynamics, such as change, and they require algorithms and analysis procedures that explicitly seek to match the phase of observation with process periodicity. This article unpacks key developments in LCLU analysis of big spatio-temporal remote-sensing data and identifies a number of future directions for the wider geography and GIS communities in their analysis of big data.

| LCLU DATA AND CONTEXT
The concepts of land cover and land use are frequently rolled together in land inventories. Land cover describes terrestrial ecosystems, natural resources, and habitats, and is an important input to climate models (see reviews by Cihlar, 2000; Comber, Fisher, & Wadsworth, 2005a; Franklin & Wulder, 2002). It can be determined by direct observation. Land use describes social, economic, and cultural utility (Turner, 1997) and ecosystem function (DeFries, Foley, & Asner, 2004). It requires the socioeconomic activities taking place on that surface to be interpreted.
While static maps of LCLU are common and have been historically important to establish baseline measurements, it is information about LCLU processes and dynamics that is of increasing interest (DeFries et al., 2002; Small & Sousa, 2016): that is, how land cover and land use have changed, with an interest in dynamics, not solely status. This interest extends scientific understanding and informs policy development (NRC, 2014), whether through quantifying landscape ecological processes (Kennedy et al., 2014) or through LCLU modeling to inform on carbon changes (e.g., Sleeter et al., 2018).
Long time series of satellite imagery allow dynamic LCLU processes to be characterized (Gómez, White, & Wulder, 2011; Hermosilla, Wulder, White, Coops, & Hobart, 2015). Short- and long-term trends can be identified, allowing cyclical functions and feedbacks to be investigated and enhancing understanding of drivers ranging from climate change to economic pressures (Kennedy et al., 2014; Turner, 1997). Understanding trends from such data allows future status or processes to be modeled and extrapolated (e.g., Dietze et al., 2018) and supports large-scale scenario simulations over long periods, with attendant considerations of spatial structure in prediction (Li & Yeh, 2002).
Remote-sensing data are big (Guo, Wang, Chen, & Liang, 2014), with great variety as well as volume, from what is collected at sensors through to how data are provided to users, with differences in pixel size, spectral regions sampled, revisit rate, and so on. New analytical approaches for big remote-sensing data have been recommended (e.g., Ball, Anderson, & Chan, 2018; Liu, Di, Du, & Wang, 2018), particularly to meet the ubiquitous challenge of, and demand for, real-time processing (Ma et al., 2015). One such development is the provision of analysis-ready data (ARD) suitable for analysis in large purpose-developed data cubes (e.g., Gorelick et al., 2017; Lewis et al., 2017). ARD are "data that have been processed to allow analysis with a minimum of additional user effort" (Dwyer et al., 2018, p. 1365). They contain long, wide, and deep archives of time-series data (e.g., Landsat; Wulder et al., 2008, 2016) collected under repeated spatial and temporal sampling frameworks. ARD are thus "stackable" and ensure that a given pixel represents the same physical ground location through time, allowing changes in condition to be captured. Changes at the pixel level can be related to temporal phase to quantify within- and between-year processes (e.g., fire, harvest, urban expansion, and other disturbances), or seasonal processes (e.g., snow, leaf-off). Longer-term processes (e.g., climate, soil degradation, chemical deposition) can also manifest as a changing trend of a given property over time (Kennedy et al., 2014).
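Because ARD are stackable, a data cube can be treated as an array indexed by time, row, and column, in which a fixed (row, col) pair always refers to the same ground location. The NumPy sketch below uses a small synthetic cube (the dimensions, index values, and disturbance are hypothetical) and reshapes the time axis into years and months, so that a within-year signal (seasonal amplitude) and a between-year signal (annual mean) can be read separately for each pixel.

```python
import numpy as np

# Hypothetical ARD cube: 4 years x 12 monthly composites over a 3 x 3 pixel grid.
# Because ARD are co-registered, cube[t, row, col] is the same ground location for every t.
years, months, rows, cols = 4, 12, 3, 3
t = np.arange(years * months)
season = 0.3 * np.cos(2 * np.pi * t / months)             # shared within-year (phenological) cycle
cube = 0.5 + season[:, None, None] + np.zeros((t.size, rows, cols))

# Simulated between-year disturbance at pixel (1, 1): the index drops by 0.2 from year 3 onward.
cube[2 * months:, 1, 1] -= 0.2

# Split the time axis into (year, month) so within- and between-year signals can be read separately.
by_year = cube.reshape(years, months, rows, cols)
annual_mean = by_year.mean(axis=1)                         # between-year signal: (years, rows, cols)
seasonal_amp = by_year.max(axis=1) - by_year.min(axis=1)   # within-year signal: (years, rows, cols)

print(annual_mean[:, 1, 1].round(2))    # disturbed pixel
print(annual_mean[:, 0, 0].round(2))    # undisturbed neighbour
print(seasonal_amp[:, 1, 1].round(2))   # seasonal amplitude is unchanged by the disturbance
```

Here the disturbed pixel keeps the same seasonal amplitude (0.6) while its annual mean falls from 0.5 to 0.3 from the third year, the kind of between-year change that a single-date map would not reveal.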
Large spatio-temporal cubes of ARD have changed the way that LCLU monitoring activities are undertaken.
In the past, LCLU monitoring was concerned with state (LCLU class) and state changes (LCLUC) rather than process. Now, there is much greater emphasis on process. Rather than being periodically remapped, LCLU inventories are updated through consideration of the spatial and temporal signals of change at a pixel level (spatial reach, temporal persistence, etc.). LCLU labels are only updated if the signal surpasses thresholds in both space and time (see Zhu & Woodcock, 2014). However, subtle signals, indicating changes in condition or quality but insufficient to warrant a change in label, are of great interest (e.g., long-term forest decline; Cohen et al., 2016). These may be indicative of some underlying process (Zhu, 2017) and can provide an early warning of potential future LCLU changes. Examples of an increasing focus on process in LCLU analyses are beginning to appear. In forest monitoring, Daniel, Frid, Sleeter, and Fortin (2016) proposed models based on discrete spatial and temporal units (i.e., state-and-transition models) as a means to forecast landscape change. Dolan et al. (2017) developed a vulnerability index to describe the capacity of forests to withstand accelerated disturbance dynamics arising from climate change.
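The rule that labels change only when a signal passes thresholds in both space and time can be sketched as follows. This is a hypothetical illustration of the principle, not the procedure of Zhu and Woodcock (2014): a pixel must be flagged as anomalous in at least `k` consecutive observations (temporal persistence), and the persistent pixels must form a 4-connected patch of at least `min_pixels` cells (spatial reach). The thresholds, class codes, and function names are assumptions. Persistent pixels that fail the spatial test are returned separately as candidate subtle signals rather than discarded.

```python
from collections import deque

import numpy as np


def persistent(flags: np.ndarray, k: int) -> np.ndarray:
    """True where a pixel is flagged in at least k consecutive observations.
    `flags` is a boolean array shaped (time, rows, cols)."""
    run = np.zeros(flags.shape[1:], dtype=int)
    best = np.zeros(flags.shape[1:], dtype=int)
    for frame in flags:
        run = np.where(frame, run + 1, 0)   # extend or reset each pixel's current run
        best = np.maximum(best, run)
    return best >= k


def keep_large_patches(mask: np.ndarray, min_pixels: int) -> np.ndarray:
    """Keep only 4-connected patches of True cells with at least min_pixels members."""
    rows, cols = mask.shape
    seen = np.zeros((rows, cols), dtype=bool)
    out = np.zeros((rows, cols), dtype=bool)
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or seen[r, c]:
                continue
            patch, queue = [], deque([(r, c)])
            seen[r, c] = True
            while queue:                      # breadth-first flood fill of one patch
                y, x = queue.popleft()
                patch.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(patch) >= min_pixels:
                for y, x in patch:
                    out[y, x] = True
    return out


def update_labels(labels, flags, new_label, k=3, min_pixels=4):
    """Relabel pixels whose change signal passes both the temporal and the spatial test."""
    persist = persistent(flags, k)
    confirmed = keep_large_patches(persist, min_pixels)
    updated = labels.copy()
    updated[confirmed] = new_label
    subtle = persist & ~confirmed   # persistent but too small to relabel: a candidate condition signal
    return updated, confirmed, subtle


flags = np.zeros((5, 5, 5), dtype=bool)
flags[1:, 0:2, 0:2] = True     # 2 x 2 patch, anomalous in 4 consecutive observations
flags[:, 4, 4] = True          # persistent but isolated single pixel
flags[0:2, 2:4, 2:4] = True    # 2 x 2 patch, anomalous in only 2 consecutive observations
labels = np.ones((5, 5), dtype=int)   # hypothetical class code 1 = forest
updated, confirmed, subtle = update_labels(labels, flags, new_label=2)
print(int(confirmed.sum()), int(subtle.sum()))   # 4 1
```

In this toy example only the persistent 2 × 2 patch is relabeled; the isolated persistent pixel is kept as a possible condition signal, and the short-lived patch is treated as noise.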

| FROM LCLU PROCESS TO BIG SPATIO-TEMPORAL DATA
The focus on process in LCLU analyses, in response to the provision of big remote-sensing data and spatio-temporal ARD cubes, offers potential insights for analyses of big spatio-temporal data more generally. In LCLU, long runs of serial and spatial data provide the ability to examine change processes at individual locations over time, with the pixel providing a convenient and consistent (if not natural) spatial unit over which to do this. Time series of images allow trends to be captured (Kennedy et al., 2014), whether using all available images (Zhu & Woodcock, 2014) or annual time steps (Hermosilla et al., 2016). In natural environments, for example, multiple measures at the same location at different points in time can detect within-year variations in phenology (Melaas, Friedl, & Zhu, 2013) and longer-term stresses between years. Such change processes can be characterized by a number of metrics (after Kennedy et al., 2014): (1) magnitude; (2) duration; (3) intensity; (4) frequency; (5) periodicity; and (6) directionality. These single within- or between-year descriptors can be viewed independently or combined in order to infer process. For forest environments, a high-magnitude, short-duration, negative change is likely to be a fire or harvest; a low-magnitude, long-duration, negative change could be an indicator of drought stress; and a low-magnitude, long-duration, positive change may relate to growth of a mature forest. Such metrics have been used to infer the presence and rate of forest regeneration, as well as to classify disturbances by type (Hermosilla et al., 2015). The ability to characterize condition change in this way is informed by when and where a disturbance has taken place, along with an understanding of different processes. Examples of typical processes considered by remote sensing are shown in Table 1.
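A minimal sketch of how some of these metrics might be computed for a single change segment in an annual index series, and then combined into a tentative process inference, is shown below. The thresholds (`big`, `short`), the rule set, and the example values are hypothetical and simply mirror the forest examples in the text; frequency and periodicity are omitted because they require several events per pixel.

```python
from dataclasses import dataclass


@dataclass
class ChangeMetrics:
    magnitude: float   # absolute size of the change, in index units
    duration: int      # number of years the change spans
    intensity: float   # magnitude per year
    direction: int     # +1 increase, -1 decrease, 0 no change


def segment_metrics(series: list, start: int, end: int) -> ChangeMetrics:
    """Describe the change between two years (list indices) of an annual index series."""
    if end <= start:
        raise ValueError("end must be after start")
    delta = series[end] - series[start]
    duration = end - start
    direction = (delta > 0) - (delta < 0)
    return ChangeMetrics(abs(delta), duration, abs(delta) / duration, direction)


def infer_process(m: ChangeMetrics, big: float = 0.3, short: int = 2) -> str:
    """Hypothetical rules mirroring the forest examples; thresholds are illustrative only."""
    if m.direction < 0 and m.magnitude >= big and m.duration <= short:
        return "abrupt loss (fire or harvest?)"
    if m.direction < 0 and m.magnitude < big and m.duration > short:
        return "gradual decline (drought stress?)"
    if m.direction > 0 and m.magnitude < big and m.duration > short:
        return "gradual gain (forest growth?)"
    return "unresolved"


fire = [0.80, 0.80, 0.30, 0.35]
drought = [0.80, 0.77, 0.74, 0.71, 0.68, 0.65]
growth = [0.60, 0.62, 0.64, 0.66, 0.68]
print(infer_process(segment_metrics(fire, 1, 2)))      # abrupt loss (fire or harvest?)
print(infer_process(segment_metrics(drought, 0, 5)))   # gradual decline (drought stress?)
print(infer_process(segment_metrics(growth, 0, 4)))    # gradual gain (forest growth?)
```

The question marks are deliberate: the metrics narrow the set of plausible processes, but attributing a change to fire rather than harvest still requires additional evidence and domain knowledge.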
There has been a step-change in LCLUC operations. Until recently, LCLU mapping and change analyses were devoid of any temporal process: change was analyzed through map-to-map comparison, with each LCLU map providing a static snapshot of environments that are inherently dynamic. Change analysis has shifted from being a downstream by-product of mapping at different times to being the main objective of many wide-scale operational initiatives (Hermosilla, Wulder, White, Coops, & Hobart, 2018; Wulder, Coops, Roy, White, & Hermosilla, 2018). Baseline maps are updated only when the signal of change is sufficiently strong to indicate the presence of some recognizable spatio-temporal process. Workflows now embed the identification and characterization of change (i.e., process) as the first step in analyses (Gómez, White, & Wulder, 2016) of satellite data (e.g., Jin, Yang, Zhu, & Homer, 2017; Pengra, Gallant, Zhu, & Dahal, 2016; Wulder et al., 2018), aerial photography (e.g., Gauld, Bell, Towers, & Miller, 1991), and updates of historical thematic LCLU data (e.g., Comber et al., 2016). The result has been greater opportunities to uncover additional processes related to drivers of change, with linkages to spatio-temporal context, and to identify more subtle, longer-term trends (Kennedy et al., 2014).
LCLUC analyses now explicitly consider temporal processes, not just changes in state. This has arisen because of factors such as ARD and open data policies (Wulder & Coops, 2014), as well as a wider recognition of the difficulty in understanding temporal processes from static LCLU snapshots. Each LCLU map has a methodological and temporal vintage, which is sometimes called an "ontology" (Comber, Fisher, & Wadsworth, 2005b). Ontologies are explicit specifications of an abstract representation of the world (Gruber, 1993; Guarino, 1995), such as a map. In an LCLU mapping context they reflect choices over spatial, spectral, and radiometric data resolutions as well as the number and type of LCLU classes. No two vintages (or ontologies) are ever the same because of the many embedded processes and assumptions (Comber et al., 2005b). Comparing LCLU maps in a post-classification change analysis is difficult (Fuller, Smith, & Devereux, 2003; Tewkesbury, Comber, Tate, Lamb, & Fisher, 2015) because any differences between them will reflect artefactual differences in ontology (Comber, Fisher, & Wadsworth, 2004), errors, and actual differences on the ground.
TABLE 1 Selection of types of forest changes to highlight variability in duration, spatial extent, rate, and magnitude of change. Note: Individual: tree or object; Local: stand or watershed extent; Regional: multiple stands, watersheds.
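Post-classification comparison is commonly summarized as a cross-tabulation of the labels of two co-registered maps. The toy example below (hypothetical labels, with no real change on the ground) shows why such a table cannot, by itself, separate ontology from real change: the second map uses a class definition under which part of what the first map called grass is labeled heath, and 30% of pixels appear to change.

```python
from collections import Counter


def transition_counts(map_a: list, map_b: list) -> Counter:
    """Cross-tabulate the labels of two co-registered maps, pixel by pixel."""
    return Counter(zip(map_a, map_b))


def apparent_change(map_a: list, map_b: list) -> float:
    """Fraction of pixels whose label differs between the two maps."""
    return sum(a != b for a, b in zip(map_a, map_b)) / len(map_a)


# Hypothetical 10-pixel transect with no change on the ground. The 2000 map uses a
# different ontology in which part of what the 1990 map called "grass" is labeled "heath".
map_1990 = ["forest"] * 4 + ["grass"] * 6
map_2000 = ["forest"] * 4 + ["grass"] * 3 + ["heath"] * 3

print(apparent_change(map_1990, map_2000))                         # 0.3
print(transition_counts(map_1990, map_2000)[("grass", "heath")])   # 3
```

Nothing in the table records why the grass-to-heath cells differ; the same counts would arise from real succession, misclassification, or a change in class definitions.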
So how does all of this relate to big data? We are in a world where ubiquitous spatially and temporally referenced digital data are generated, shared, and made available to anyone. Data volumes will continue to increase.
IBM reported in 2016 that 90% of all data had been created over the last two years, at a rate of 2.5 quintillion bytes of data a day (Loechner, 2016). These increases have been complemented by the generation and availability of large volumes of citizen science data, including those related to LCLU (See et al., 2016). Big data describing socioeconomic activities are typically used to understand social dynamics and environments (Liu et al., 2015).
But research describing the use of big data stretches across all domains, from instrumented farms in agriculture (Wolfert, Ge, Verdouw, & Bogaardt, 2017) to analyses of within- and between-group health inequalities (Kandt, 2013). See Gandomi and Haider (2015) for a review of data analytics techniques applied to different types of big data.
In contrast to LCLU data, big data are frequently not observed or collected for a specific analysis. Instead, they are repurposed to fit a particular need and may have only weak relationships with the process being investigated (Gandomi & Haider, 2015). Despite this, big data allow spatial, temporal, and spatio-temporal processes to be identified, examined, and characterized across wide spatial scales and over increasingly long time periods (Harris et al., 2017). This is a step-change. Until very recently (with a nod to data-sharing practices, cyberinfrastructures, and e-science), much scientific investigation was undertaken in the context of data paucity: limited data availability, static single-date products that were periodically updated, data held in domain silos with access negotiated through a gatekeeper (Comber, Fisher, & Wadsworth, 2007), little data or product sharing, and ad hoc analytical approaches.
However, all this data access, together with the machine/deep-learning approaches it has engendered, comes with caveats over the absence of process understanding or theory (Brunsdon, 2016; Lyon, 2014; Marcus, 2018).¹ In this context, the data trends in LCLU, and the consideration of process that they necessitate, offer lessons beyond the domain and across the spatial information sciences in how we approach our big data analyses.

| Lesson 1: Separating the signal of spatio-temporal process from the noise (you need to understand the temporal characteristics of processes to find them)
Spatio-temporal analyses of large datasets require an understanding of the processes being considered so that potentially meaningful signals related to these processes can be separated from the noise inherent in such data. LCLUC activities are concerned with change target detection (is change present?) and change agent detection (what is the nature of the change, and does it matter?). Any suspected change has to surpass some condition (i.e., a threshold or statistical boundary) for change to be recorded, as well as being subject to additional constraints such as a minimum mappable unit (e.g., White, Wulder, Hermosilla, Coops, & Hobart, 2017). Changes rejected for not being sufficiently robust in time and space may still contain important signals of underlying change processes, resulting in, for example, changes in quality and/or condition (Kennedy et al., 2014).
Remotely sensed data, especially when prepared as ARD, provide multiple pieces of information about the same location over long periods of time. Current time-series change monitoring (as reviewed in Zhu, 2017) uses long runs of satellite data (in particular, Landsat) that provide within- and between-year information, integrate change detection and LCLU mapping, and support all-available-data approaches for prediction of expected conditions (e.g., Zhu & Woodcock, 2014). For LCLUC analysis, where the emphasis is on monitoring and detection of change, this has shifted the focus to the synchronicity (temporal alignment) between the temporal process and the data observation phase (the periodicity of repeated observations at a given location). Hitherto, phase was not considered extensively, and process was not detectable from the single snapshot of LCLU provided by remote-sensing data. Both are now considered: LCLU and change analyses seek to link the temporal properties of the data to the periodicity of the process via domain understanding. This enables consideration of the temporal persistence, spatial extent, and magnitude of process signals, for example those related to changes in quality or condition, so that meaningful signals can be identified. Operationally, these approaches reflect the position articulated by Miller and Goodchild (2015) on the need for methods to discriminate between spurious and meaningful spatio-temporal patterns.
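The idea of predicting expected conditions from all available observations and flagging departures can be sketched with a single-band harmonic regression. This is a simplified illustration in the spirit of the approaches cited, not an implementation of Zhu and Woodcock (2014): the model (intercept, linear trend, one annual harmonic pair), the training length, the run length `k`, and the multiplier `z` are all assumptions, and the input series is synthetic with a 16-day revisit. Requiring `k` consecutive exceedances is one simple way to encode temporal persistence, so that isolated noisy observations are not recorded as change.

```python
import numpy as np


def design(t: np.ndarray, period: float = 365.25) -> np.ndarray:
    """Model columns: intercept, linear trend, and one annual harmonic pair."""
    w = 2 * np.pi * t / period
    return np.column_stack([np.ones_like(t), t, np.cos(w), np.sin(w)])


def detect_break(t, y, n_train, k=3, z=3.0):
    """Fit the model to the first n_train observations, predict the rest, and return the time
    of the first observation that starts a run of k consecutive residuals above z * RMSE."""
    X = design(t[:n_train])
    coef, *_ = np.linalg.lstsq(X, y[:n_train], rcond=None)
    rmse = np.sqrt(np.mean((y[:n_train] - X @ coef) ** 2))
    exceed = np.abs(y[n_train:] - design(t[n_train:]) @ coef) > z * rmse
    for i in range(len(exceed) - k + 1):
        if exceed[i:i + k].all():
            return t[n_train + i]
    return None   # no persistent departure from expected conditions


# Synthetic single-band series with a 16-day revisit over six years (days since start).
rng = np.random.default_rng(0)
t = np.arange(0.0, 6 * 365, 16.0)
y = 0.5 + 0.25 * np.cos(2 * np.pi * t / 365.25) + rng.normal(0, 0.01, t.size)
y[t >= 1200] -= 0.3                          # hypothetical abrupt disturbance on day 1200
n_train = int(np.searchsorted(t, 3 * 365))   # train on the first three years
print(detect_break(t, y, n_train))           # expected to fall at or close to day 1200
```

Because the harmonic terms model the expected seasonal trajectory, a low value at the seasonal minimum is not mistaken for change; only departures from the phase-appropriate expectation count.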
The main point is that process understanding is used to confirm LCLU change: a change has to have sufficient spatial and state dynamics captured over appropriate timeframes to be recorded. Process understanding is also used to identify more nuanced aspects of change (Zhu, 2017). Previously, LCLU change identification focused on depletions, punctual removals, and so on. Now, changes in within-class condition provide a deeper picture of the characteristics of dynamic processes. These include monitoring forest harvesting and wildfire (White et al., 2017), comparisons of regional rates of forest recovery (Frazier, Coops, Wulder, Hermosilla, & White, 2018), and capturing the changing states of forest dynamics (Gómez et al., 2011). Knowledge of process is increasingly used to inform land cover labeling. For instance, knowledge of time since disturbance informs on successional stage following harvest or wildfire, distinguishing between classes and allowing for the application of rules to promote logical transitions (Gómez et al., 2016). Within-year information can also be used to infer subclass processes, increasing the categorical depth of land cover mapping (Pasquarella, Holden, & Woodcock, 2018).
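How time since disturbance and transition rules might inform annual labels can be sketched as below. The class list, the age thresholds in `stage_from_age`, and the two rules in `enforce_logical` (no reversal to an earlier stage without a disturbance, and no skipping of stages within one year) are hypothetical illustrations, not the rule set of Gómez et al. (2016).

```python
STAGES = ["disturbed", "regenerating", "young forest", "mature forest"]   # hypothetical ordered classes


def years_since_disturbance(disturbed: list, prior_age: int) -> list:
    """Years since the most recent disturbance; prior_age is the assumed age before the record."""
    ages, age = [], prior_age
    for hit in disturbed:
        age = 0 if hit else age + 1
        ages.append(age)
    return ages


def stage_from_age(age: int) -> str:
    """Hypothetical thresholds linking time since disturbance to successional stage."""
    if age == 0:
        return "disturbed"
    if age < 5:
        return "regenerating"
    if age < 20:
        return "young forest"
    return "mature forest"


def enforce_logical(raw: list, disturbed: list) -> list:
    """Post-process annual classifier labels: without a disturbance a pixel may not revert
    to an earlier stage and may advance by at most one stage per year; a disturbance resets it."""
    out = []
    for label, hit in zip(raw, disturbed):
        if hit:
            out.append("disturbed")
            continue
        if not out:
            out.append(label)
            continue
        prev, cur = STAGES.index(out[-1]), STAGES.index(label)
        if cur < prev:
            out.append(out[-1])            # illogical reversal: keep the previous stage
        elif cur > prev + 1:
            out.append(STAGES[prev + 1])   # illogical jump: advance one stage only
        else:
            out.append(label)
    return out


hits = [False, False, True, False, False, False, False, False]
print([stage_from_age(a) for a in years_since_disturbance(hits, prior_age=30)])
raw = ["mature forest", "young forest", "disturbed", "mature forest", "regenerating"]
print(enforce_logical(raw, hits[:5]))
# ['mature forest', 'mature forest', 'disturbed', 'regenerating', 'regenerating']
```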
These approaches suggest how the dearth of transferable methods for predictive analyses of big spatial data that are simultaneously sensitive to both spatial and temporal processes (Miller & Goodchild, 2015) can be addressed. In part, this dearth is due to the lack of a natural unit of analysis for describing spatio-temporal phenomena. This suggests that any analysis of big data needs to identify a spatio-temporal unit of analysis. Time-series approaches to LCLU/LCLUC have, by convenience and convention, used the pixel as the fundamental unit for considering processes.
Domain knowledge is used to understand how change processes manifest themselves over pixels of a given size and over a given timeframe. If individual records in big data are brought together over some form of areal unit (e.g., census areas, 10-km grid cells, etc.) to aggregate or link to other data, then some understanding of how the process under investigation manifests itself over that space is required.
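A short sketch of why the areal unit matters: the same pixel-level change is aggregated to non-overlapping blocks of two sizes. The 100 × 100 grid, the 6 × 6 pixel harvest patch, and the block sizes are hypothetical. At a block of 10 pixels the patch fills 36% of one unit; at a block of 50 pixels it is 1.44% of the unit and could easily be dismissed as noise.

```python
import numpy as np


def block_fraction(mask: np.ndarray, block: int) -> np.ndarray:
    """Share of changed pixels within non-overlapping block x block areal units."""
    rows, cols = mask.shape
    if rows % block or cols % block:
        raise ValueError("block size must tile the grid exactly")
    return mask.reshape(rows // block, block, cols // block, block).mean(axis=(1, 3))


change = np.zeros((100, 100), dtype=bool)   # hypothetical pixel-level change map
change[10:16, 10:16] = True                 # a 6 x 6 pixel harvest patch

fine = block_fraction(change, 10)
coarse = block_fraction(change, 50)
print(fine.max(), coarse.max())   # 0.36 0.0144
```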

| Lesson 2: Informed methods, tools, and techniques (if you don't know you need to find out)
Many commentators have identified the need for an expanded methods toolkit to handle spatio-temporal big data (e.g., Fotheringham, Crespo, & Yao, 2015; Goodchild, 2013; Miller & Goodchild, 2015). Spatio-temporal data can provide insights into the dynamics of underlying spatial process and can be used to generate predictive models.
There are a number of frameworks that have been used to model space-time dynamics and interactions in an evolutionary way, including cellular automata (e.g., Balzter, Braun, & Köhler, 1998; Dietzel, Herold, Hemphill, & Clarke, 2005) and agent-based models (e.g., O'Sullivan & Haklay, 2000; Torrens, Li, & Griffin, 2011). They develop emergent solutions but still require an understanding of how the spatio-temporal dynamics of a process interact in order to parameterize them, and they are sensitive to initial settings and tuning. Critically, they can struggle to link fine-scale local spatio-temporal interactions with coarse-scale global dynamics (Chen, Han, Ye, & Li, 2011).
Some approaches explicitly seek to understand local spatial and temporal interactions, allowing for the possibility that space-time processes are heterogeneous and not global or linear (Fotheringham et al., 2015; Kyriakidis & Journel, 1999). One of the key advantages of space-time models that focus on process heterogeneity rather than on autocorrelation of inputs is that they are commonly better at handling spatio-temporal big data (Fotheringham et al., 2015; Goodchild, 2013). Existing statistical methods for analyzing space-time data can be grouped into five basic sets of approaches: autoregressive integrated moving average (ARIMA) models, space-time autoregressive integrated moving average (STARIMA) models, panel models, geostatistical approaches, and other models (Deng, Yang, & Liu, 2017; Griffith, 2010).
The first four are all concerned with incorporating autocorrelation effects into space-time models and may not be suited to analyses of big spatio-temporal data because of problems with statistical inference and significance testing (Brunsdon, 2017; Spicer & Gangloff, 2016). Others, such as geographically and temporally weighted regression (Fotheringham et al., 2015; Huang, Wu, & Barry, 2010; Liu, Lam, Wu, & Lam, 2018), focus on relationship heterogeneity, where attribute relationships are viewed as nonstationary in both space and time.
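Geographically and temporally weighted regression estimates a separate set of coefficients at each space-time location by weighting nearby observations more heavily. The NumPy sketch below solves one such local weighted least-squares problem. The Gaussian kernel with separate spatial and temporal bandwidths is an assumption made for simplicity (it is not necessarily the distance formulation of the cited authors), the bandwidths are fixed rather than calibrated, and the data are synthetic: the slope on `x` changes from 1 to 3 halfway through the time window.

```python
import numpy as np


def gtwr_at(target, coords, times, x, y, h_space, h_time):
    """Local weighted least-squares coefficients (intercept, slope) at one space-time point.
    target = (easting, northing, time); weights use an assumed Gaussian space-time kernel."""
    d_space2 = np.sum((coords - np.asarray(target[:2])) ** 2, axis=1)
    d_time2 = (times - target[2]) ** 2
    w = np.exp(-(d_space2 / h_space ** 2 + d_time2 / h_time ** 2))
    X = np.column_stack([np.ones_like(x), x])
    # Solve (X^T W X) beta = X^T W y without forming the n x n diagonal weight matrix.
    return np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))


# Synthetic data: the effect of x on y changes from 1 to 3 halfway through the period.
rng = np.random.default_rng(1)
n = 400
coords = rng.uniform(0, 10, size=(n, 2))
times = rng.uniform(0, 10, size=n)
x = rng.uniform(0, 1, size=n)
y = 1.0 + np.where(times < 5, 1.0, 3.0) * x

early = gtwr_at((5, 5, 2), coords, times, x, y, h_space=5.0, h_time=1.0)
late = gtwr_at((5, 5, 8), coords, times, x, y, h_space=5.0, h_time=1.0)
print(early.round(2), late.round(2))   # slopes close to 1 and 3 respectively
```

A single global regression would report one averaged slope; the local estimates recover the change in the relationship over time, which is the kind of nonstationarity these methods target.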
In such cases, the nature of the temporal processes and how their phase is captured in the data need to be known. To discriminate between meaningful and spurious patterns it is critically important to understand the nature of the data (Miller & Goodchild, 2015) and the associated processes they describe (Brunsdon, 2017). What do you do if you don't have any process knowledge? In the absence of theory, temporal knowledge, and inference, machine learning and data mining are commonly employed in data science (Li, Ye, Lee, Gong, & Qin, 2017; Lv, Song, Basanta-Val, Steed, & Jo, 2017; Witten, Frank, Hall, & Pal, 2016). Such data-driven approaches can support both prediction and exploration. But they come with a health warning: they are context- and knowledge-free (Kitchin, 2013). They may detect patterns and provide answers to arbitrary questions (Harris et al., 2017) that, although potentially able to generate novel insights, may lack an inferential dimension. This makes their results difficult to link to process, and hence of limited or dubious utility, because of the uncertainty over which of the many patterns are linked to which of the many potential processes. However, they can support exploratory spatial data analysis, enabling the discovery of unforeseen trends or aiding hypothesis development (see Anselin, 1999). Here the aim is not to calibrate models or to test hypotheses but to use algorithms to detect patterns in data and thereby enhance and develop process understanding. However, this too needs to be done with care. Recent reflections on spatio-temporal analyses of big data have suggested an iterative sequence of investigation, theory development, consultation with domain experts, analytical refinement, and then deeper focus (view, identify, refine, zoom; Harris et al., 2017).
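As an illustration of knowledge-free pattern detection, the sketch below clusters synthetic annual index profiles with plain k-means; a deterministic initialization (the first k rows) is assumed so that the example is reproducible, and the three profile shapes are hypothetical. The algorithm separates the shapes cleanly, but it returns only cluster numbers: reading one cluster as "harvest" and another as "gradual decline" is an interpretation that requires process knowledge.

```python
import numpy as np


def kmeans(profiles: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Plain k-means on the rows of `profiles`; centroids start at the first k rows."""
    centroids = profiles[:k].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every profile to every centroid: shape (n, k).
        d = ((profiles[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):            # leave an empty cluster's centroid where it is
                centroids[j] = profiles[labels == j].mean(axis=0)
    return labels


# Synthetic annual profiles (10 years) of three hypothetical shapes, interleaved.
years = np.arange(10)
shapes = [
    np.full(10, 0.7),                  # stable
    np.where(years < 5, 0.7, 0.2),     # abrupt drop
    0.7 - 0.025 * years,               # gradual decline
]
rng = np.random.default_rng(42)
profiles = np.array([shapes[i % 3] + rng.normal(0, 0.01, 10) for i in range(30)])
labels = kmeans(profiles, k=3)
print(labels[:6])   # each shape gets its own cluster number; naming the process is the analyst's job
```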
A potentially rich area of related further work lies in linking approaches that identify spatio-temporal heterogeneity in relationships, in order to deepen process understanding, especially over any given spatio-temporal unit of analysis.
This would go some way towards meeting the need for new analytical toolkits for big data, as identified by many authors without specifying what they could be (e.g., Gandomi & Haider, 2015). Spatial models (e.g., Anselin, 1995; Brunsdon et al., 1996; Cliff & Ord, 1973; Ord & Getis, 1995) could be linked with temporal ones such as wavelet analysis (Torrence & Compo, 1998) and recurrent neural nets (Ermentrout, 1998). A further area is the extension of restricted maximum likelihood (REML) approaches (e.g., Welham, Cullis, Gogel, Gilmour, & Thompson, 2004) to include serial as well as spatial autocorrelation in the error term. To ensure an inferential dimension, such approaches could be supported by likelihood models based on Bayesian inference rather than frequentist statistical approaches. Models developed in this way can be evaluated using information criteria rather than significance tests. The widely applicable information criterion (Vehtari, Gelman, & Gabry, 2017) and independence, and the problems with significance measures with big data (Spicer & Gangloff, 2016). Hybrid models have the potential to identify spatio-temporal structure and heterogeneity, revealing cycles hidden within the data and changes in spatio-temporal phase and amplitude.
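For Bayesian models, the widely applicable information criterion as described by Vehtari, Gelman, and Gabry (2017) can be computed from a matrix of pointwise log-likelihoods evaluated at posterior draws: the log pointwise predictive density (lppd) is penalized by the sum, over observations, of the sample variance of the log-likelihood across draws (`p_waic`), and WAIC is -2 times the difference. The sketch below assumes such a matrix is already available from a fitted model; how it is produced (for example, from models with spatial, serial, or combined error structures) is outside its scope. Lower WAIC indicates better estimated out-of-sample predictive fit.

```python
import numpy as np


def waic(log_lik: np.ndarray) -> dict:
    """WAIC from an (S, n) array of pointwise log-likelihoods: S posterior draws, n observations."""
    S = log_lik.shape[0]
    # lppd: log of the posterior-mean likelihood of each observation, via a stable log-sum-exp.
    m = log_lik.max(axis=0)
    lppd = np.sum(m + np.log(np.exp(log_lik - m).sum(axis=0)) - np.log(S))
    # p_waic: sum over observations of the sample variance of the log-likelihood across draws.
    p_waic = np.sum(log_lik.var(axis=0, ddof=1))
    elpd_waic = lppd - p_waic
    return {"lppd": lppd, "p_waic": p_waic, "elpd_waic": elpd_waic, "waic": -2.0 * elpd_waic}


# Toy check: two posterior draws giving likelihoods 0.2 and 0.4 for a single observation.
toy = np.log(np.array([[0.2], [0.4]]))
print(waic(toy))
```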

| CONCLUSIONS
In this article we use the current state of the science in land cover/land use monitoring to make a number of arguments about analysis of big spatio-temporal data. The LCLU community have adapted their analysis protocols in response to big remote-sensing data and have shifted from generating static snapshots of LCLU to explicit consideration of LCLU change and dynamics. These protocols require domain understanding and knowledge of LCLU processes in order to separate useful spatio-temporal signals. This shift has been driven largely by developments in data provision and access infrastructures rather than by design. The need for process understanding in analyses of big spatio-temporal data may be self-evident for some readers but not others. Our aim here is to inform wider practices related to geographic data analysis (supported by contextual and methodological lessons), and we argue that these practices need to establish synchronicity between the process being considered and the data observation phase. Process understanding is needed to do this and requires knowledge of how process characteristics manifest themselves over the spatio-temporal unit of analysis being applied. Because big data are frequently repurposed from their original intended use, these are critical considerations.
In summary, we offer the following observations.
1. In the future, all data will be "big" and "spatio-temporal" (forever), and as a consequence will simply be referred to as "data."
2. Data-driven analysis (data mining) without theory or an inferential dimension is insufficient because of uncertainty over which of the observed patterns are linked to which of the many potential processes.
3. Analysis of such data requires information and knowledge about spatio-temporal processes to be applied, including an understanding of how process phases are captured in the spatio-temporal data and over the spatio-temporal unit of analysis.
4. Where process understanding and knowledge are missing, iterative and reflective data mining can help to discover unforeseen trends and support hypothesis development.
5. There are opportunities to link well-known models for handling space with those for handling (unknown) temporal phase.
Taking these arguments together, we suggest that essential principles for any big and spatio-temporal data analysis are to establish that the phase of the process of interest is embedded in the observations and that its properties are understood over the spatio-temporal unit of analysis. We posit that geographical analyses of spatio-temporal data should link the seriality (i.e., measurement interval, data density) of the spatio-temporal data to the phase of the process being considered. This requires an understanding of how processes exhibit behaviors over both the spatial and temporal dimensions, with process understanding informed by when and where a measure is made; the temporal properties of any given process must be embedded in the phase of observation and measurement. Such considerations underpin spatio-temporal prediction, inference about process, trajectories, and forecasting of potential future states. There are opportunities for insights and leadership from the geographic community to guide and inform the wider information sciences communities.
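One rough way to check whether the seriality of a dataset can capture the phase of a process with a known period is to fold the observation times onto the cycle and count how many phase bins are sampled. The sketch below is a hypothetical diagnostic (the function names and the choice of 12 bins are assumptions), comparing one acquisition per year on the same day with a 16-day revisit over three years.

```python
import numpy as np


def phase_coverage(obs_times, period: float, n_bins: int = 12) -> float:
    """Fraction of the process cycle (split into n_bins equal phase bins) sampled at least once."""
    phase = np.mod(np.asarray(obs_times, dtype=float), period) / period   # position in cycle, [0, 1)
    bins = np.minimum((phase * n_bins).astype(int), n_bins - 1)
    return len(np.unique(bins)) / n_bins


def max_gap(obs_times) -> float:
    """Longest interval between consecutive observations, in the units of obs_times."""
    return float(np.diff(np.sort(np.asarray(obs_times, dtype=float))).max())


annual_same_day = [180 + 365 * i for i in range(10)]   # one acquisition per year, same day
revisit_16 = np.arange(0, 3 * 365, 16)                 # 16-day revisit over three years

for name, obs in [("annual", annual_same_day), ("16-day", revisit_16)]:
    print(name, phase_coverage(obs, period=365.0), max_gap(obs))
# annual 0.08333333333333333 365.0
# 16-day 1.0 16.0
```

A coverage of 1/12 means every observation falls at the same point in the cycle, so the seasonal process is not embedded in the observations however many years are collected.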

ENDNOTE
1 Similar criticisms were made in the 1970s with the advent of PCs and digital data access. Mather and Openshaw (1974) were concerned about the ability to simply crunch data, the "let the data speak" (Cukier & Mayer-Schoenberger, 2013) of the time, rather than hypothesis testing. Their description of it was a "mindless approach in which … variables characterized only by the fact that they are all easily culled from census volumes … are picked over like cans on a rubbish tip" (p. 290, emphasis added).