Classifying types of gully changes with unoccupied aircraft vehicles 3D multitemporal point clouds for training of satellite data analysis in Northwest Namibia

The development of standardised data acquisition strategies and analytical workflows is crucial to quantify gully changes. In this study, we explore synergies between unoccupied aircraft vehicles (UAV) and satellite remote sensing in order to classify gully morphodynamics. Using Time Series Forest (TSF) and the Sentinel‐1 radar backscatter coefficient (σo), gully scenarios can be classified into four categories: gully topographical change, no change outside gully, no change inside gully, and non‐topographical change. In addition, a Random Forest (RF) classification is performed employing individual features obtained from elevation models and temporally aggregated datasets. Training data are generated from multitemporal UAV‐borne photogrammetric point clouds through a manual segmentation of different gullies in Namibia. This information is transferred from point clouds (sub‐m) to satellite imagery (10 m) generating training data at Sentinel‐1 pixel level. Results indicate that the TSF (on the σo Vertical‐Vertical polarisation) and RF (on temporally aggregated features) perform best when training and testing areas are located within the same geographical extent. Both approaches yield similar Total Accuracy (TA ≈ 79%–80%) and Cohen's Kappa value (Kappa ≈ 0.7), but TSF achieves superior Producer Accuracy (PA = 78.5%) and User Accuracy (UA = 84.6%) for the gully topographical change class. Additionally, the utilisation of TSF in Vertical‐Vertical polarisation is the most effective method if the testing and training areas are in different geographical locations, allowing gully identification with TA > 80% and Kappa = 0.49. However, this method presents limitations to precisely delineate the change types, as dynamics are rain‐driven and therefore are geographically related. In summary, by combining the complementary benefits of UAV‐based and satellite‐based solutions, this study opens a line of research for the study and classification of surface land dynamics and geomorphological feature extraction in regional extents.


| INTRODUCTION
Gullies are dynamic geomorphological forms that accelerate land degradation and lead to morphological and environmental changes in the landscape at the micro and macro geographical scale (Montanarella et al., 2015).Their study has been approached from an increasing number of scientific perspectives, spatial scales and geographic regions (Bennett & Wells, 2019;Castillo & G omez, 2016).Some remarkable examples of long-term joint efforts are the studies conducted in Australia to map (Shahabi et al., 2019), assess the impact (Hancock & Evans, 2010) and remediate inland gully erosion (Wearne et al., 2018).In China, various initiatives are mainly focused on developing gully control practices in agricultural lands (Guo et al., 2020;Liu et al., 2019).Furthermore, extensive research focuses on the development of empirical models to map gully heads at a global and continental scale (Omran et al., 2022;Sidorchuk, 2021;Vanmaercke et al., 2020).Identifying and measuring gullies in large extents at a given time in 2D has been addressed by various authors (Golosov et al., 2018;Shahabi et al., 2019;Vallejo-Orti et al., 2021).Nevertheless, certain aspects remain understudied and relevant, such as the temporal dimension of gullies and techniques to measure their evolution over long periods, that is, >5 years (Hayas et al., 2017), and at regional scales (Vanmaercke et al., 2021), that is, >1000 ha (Kimura et al., 2009;Morgan, 2005).These approaches would provide geomorphologists, regional planners, and agronomists with useful information to better understand triggering factors, gully-human interactions, and to plan remediation strategies (Vanmaercke et al., 2021) as well as to calibrate and evaluate empirical methods (Omran et al., 2022;Sidorchuk, 2021).
Previous studies show that 3D monitoring of the geomorphic activity of gullies is typically based on pairwise change quantification between two epochs.Terrestrial (Zhong et al., 2021) and airborne (Du et al., 2021) laser scanning as well as UAV-borne photogrammetrybased dense image matching (Nota et al., 2022) are well-established close-range sensing techniques for generating these 3D datasets.
Their use is valuable due to the achieved level of detail (sub-cm); however, they are limited to relatively small geographic areas (i.e., areas <5 ha).As gullies are complex systems where numerous types of processes occur (i.e., detachments, laminar erosion, vegetation changes and stable areas), it is essential to classify gully sections by their predominant type of activity.Using 3D close-range sensing techniques enables the detection and quantification of minute surface changes, as well as the extraction of 3D surface information.Consequently, although their application to large geographical areas is costly and time-consuming, they are of great value as potential training and reference data of surface dynamics for satellite-based solutions.
As scalable and transferable methods are desired to cover large areas and longer time periods, satellite observation systems with global coverage, all-weather performance, high-frequent re-visits need to be considered.In addition, the selected sensors need to capture distinguishing features of gullies, such as terrain ruggedness, soil compaction, water balance, and vegetation type and distribution.Synthetic Aperture Radar (SAR) Sentinel-1 (S1) offers adaptability to observe these features and fulfil the aforementioned requirements.
As one of SAR direct features, backscatter coefficient (σ o ) is the normalised measure of the reflective strength of a radar target defined as per unit area on the ground.σ o provides information about soil physical and chemical features and vegetation status.Thus, a spatio-temporal analysis of σ o reveals insights into land surface changes for much larger geographic areas and offers short intervals between acquisitions (Sentinel-1 SAR User Guide, 2021).The detection of earth surface changes using SAR is widespread, for example, to monitor snow landscapes changes (Snapir et al., 2019), dune dynamics (Rozenstein et al., 2016), land subsidence (Canova et al., 2012;Pu et al., 2022), river (Ahmad & Kim, 2019) and soil moisture dynamics (Ullmann et al., 2023).
Within this context, we investigate how four types of dynamics (gully topographical change, no change outside gully, no change inside gully and non-topographical change) identified in point clouds can be classified in satellite datasets.The Time Series Forest (Deng et al., 2013), and the Random Forest (Ho, 1995) algorithms are applied on radar time series and on 11 single features, respectively.We apply and test our approach in the Kunene Region in Namibia over the course of one and a half years encompassing the rainy seasons of 2019-2020 and 2020-2021.During this period training data are generated, firstly from the same area, and secondly from a different gully network located in the Krumhuk Farm, in central Namibia.
In this line, the primary objectives of this research are (i) to develop an approach for transferring information from very highresolution 3D UAV data to an operational scale in space and time, based on spaceborne Earth observation; (ii) to study whether the use of the temporal signature of a single radar variable (σ oVV or σ oVH ) can replace or improve the use of multivariate analysis based on elevation models and temporally aggregated features; (iii) to explore the geographic transferability of the methods to determine the viability of using a training gully site distinct from the classification target one.

| Study areas
The first study area is in the Kunene Region in Namibia, 15 km towards the south of Opuwo (Figure 1a,b), covering part of the Okatjandja Kozomenje conservancy.Our second study area is in the Krumhuk Farm, in central Namibia, located approximately 30 km from Windhoek (national capital city) towards the south.One of the major gullies in this farm is selected as alternative training area (Figure 1a,c).Both study sites are representative of large active valley bottom gullies in (semi-) arid areas, typical in Namibia (Mendelsohn et al., 2002), South Africa (le Roux et al., 2022;Olivier et al., 2023) and other regions (Shahabi et al., 2019).
Between November 2019 and March 2021, four UAV data collection campaigns were conducted in the Kunene Region and eight in the Krumhuk Farm.The extents covered during these campaigns are framed by a blue polygon in Figure 1b,c.
A summary of both study area descriptions is presented in Table 1.
As both study sites present similarities in terms of soil and climate, the main differences are identified in the gully morphology.The gully in Kunene is larger, with dimensions of up to 300 m wide and 15 m deep as compared to the 50 and 5 m in width and depth of the gully in Krumhuk (Figures 1d,e and 2).Krumhuk also has denser grass coverage during the rainy season.
Typically, as seen in the gully site in Kunene Region (Figure 3), the outer limits of the gully are the most active zones, where ero-

| Datasets
The contents of Table 2 outline the characteristics, derivatives and acquisition times of the source datasets utilised in this research.
In Sections 2.2.1-2.2.3, we elaborate on the specifics of data collection, pre-processing and the various datasets preparation for the subsequent analysis.
As a derivate of the point clouds, RGB (Red, Green, Blue) orthoimages and DEMs are also generated and processed in R (R, 2021) 'raster' package to create Terrain Ruggedness Index (TRI), Slope (S) and Green Leaf Index (GLI) (Louhaichi et al., 2001).TRI is the resultant square root of the squared and averaged elevation difference between the centre pixel and its eight adjacent pixels of a moving kernel (Riley et al., 1999).It represents local surface spatial variability.
The GLI uses the RGB channels of an optical image to extract information of the vegetation status, ranging from non-living (negative GLI) to green vegetation (positive GLI), according to Expression (1).Recent studies on the geolocation accuracy of SAR mission products (Gisinger et al., 2020;Small & Schubert, 2022) indicate that S1 registers absolute location errors (ALE) ranging from sub-m to approximately 3 m, which fits the requirements for our analysis.Vegetation Index (NDVI) (Rouse et al., 1973), to derive vegetation change (ΔNDVI) and average status μ (NDVI).

| Single and temporally aggregated features
The characteristic of each dataset is presented in Table 3:

| METHODS
The proposed approach aims to classify types of activity at satellite spatial resolution in a gully zone using point clouds as training data (Figure 4).Aligned with the dynamics depicted in Figure 3 These classes are manually segmented as polygons in the point clouds by experts in gully erosion.They are subsequently interpolated to the whole point cloud using a Random Forest (Ho, 1995) and upscaled to S1 as training pixels.To classify gully change at satellite pixel level, a time series classifier, Time Series Forest (TSF) is evaluated and compared to a Random Forest (RF) approach using temporally aggregated features (see Table 3) as a proxy of type of change.
The developed workflow is presented in detail in Figure 5, and the different procedures are explained in Sections 3.1-3.4.

| Generation of UAV reference data
The process of manually identifying the four classes involves the expert-based observation of differences between two distinct RGB point clouds.This identification is facilitated by the M3C2 distance (M3C2) (Lague et al., 2013)  To derive a wall-to-wall reference data, the manually labelled UAV training data is used to conduct the RF classification applied to the whole point cloud resulting from the M3C2 distance operation.
The same number of sampled points is selected for each segmented class, split 80% for training and 20% for testing.Five explanatory variables are used and extracted from the UAV-derived data point clouds: (1) M3C2, (2) SC, (3) S, (4) TRI and (5) GLI.The RF classification generates a point cloud (UAV-predicted reference data) where each point is assigned to a class with a prediction probability.

| Generation of satellite pixel training data
To solve the resolution mismatch between UAV derived point cloud and satellite (S1) product, we transferred the information from the point clouds (0.1 m) to the spatial pixel resolution and extent of S1 (10 m). Figure 7 depicts the component of the methodology dealing with the information resampling from point clouds to satellite pixels.
For this purpose, zonal statistics are implemented for each S1 pixel coverage through the calculation of a purity weighted index (q c ).
q c ranges from 0 to 1 and indicates the degree of dominance (high q c = 1 or low q c = 0 dominance) of each class c.As presented in Expression (2), p c is the number of points of a certain class within a pixel, p c is the average probability of these points of belonging to this class c resultant of the RF classification, p T is the total number of points inside the same pixel and p T is the average probability of all points of belonging to their assigned class.Therefore, the following expression is calculated four times for each pixel, generating q 1 , q 2 , q 3 and q 4 indices.
Thus, a pixel that hypothetically contains only points of C1, will obtain q 1 = 1, regardless of the individual probability associated to each point.
Lastly, to ensure sufficient point coverage within each pixel for further analysis, a pixel must encompass a quantity of points surpassing half of the median number of points found in all pixels.Consequently, pixels failing to meet this criterion are excluded, leading to gaps in the dataset, as illustrated in Figure 8e.The TSF is defined with 100 trees and 200 moving windows, as results become optimal and stable beyond this threshold (Figure 9f).
The same training data is used to fit a model based on RF (constructed with 100 trees) with 11 features (listed in the Table 3)

| Evaluation of classification accuracy
As previously mentioned, we assess the accuracy of the classifications The TA expresses the percentage of well-classified pixels among all classified pixels.Kappa coefficient is a statistic that is used to measure inter-rate reliability for qualitative (categorical) items (Cohen, 1960).
The PA represents the probability of a reference pixel (on the ground) being accurately classified.It is calculated by dividing the number of correctly classified pixels of a specific class by the total number of reference pixels belonging to that class (Congalton, 1991).
The UA is the probability that a classified pixel on the map accurately represents its corresponding category on the ground.It is calculated by dividing the number of correctly classified pixels of a specific class by the total number of classified pixels for that class on the map (Story & Congalton, 1986).
Additionally, possible sources of errors are also investigated aiming at evaluating the limitations of the methodology through the application of a statistical MANOVA test (Stahle, 1990) for comparing multivariate sample means.

| UAV training data generation
The RF classification applied on the UAV point cloud (Figure 8d) achieves a TA = 99% and Kappa = 0.98, with UA and PA for each class above 98% (Table 4).As the objective is to create exhaustive continuous data in the point cloud using the RF classifier as an interpolator, external testing datasets are not used to validate the accuracy.The most outstanding features for this classification are TRI and GLI, both presenting RF feature importance of approximately 25%.

| Satellite remote sensing-based classifications
Results for the pixel classifications and their corresponding accuracy reports are presented in Figure 9b-d and Table 5.
The resulting accuracy values are similar for TSF (σ oVV ), TSF (σ oVH ) and RF when the proportion between training and testing data is 80/20, with TA close to 80% and Cohen's Kappa values around 0.7.
They all classify changing scenarios with UA and PA always > 50%.
It is noteworthy that for TSF, C1 is classified with UA > 80% and PA > 78% in both polarisations, but the σ oVV dataset poses superior capacity to classify C2, C3 and C4 than σ oVH .In terms of feature importance, mean (35%), standard deviation (33%) and slope (32%) of the different changing windows are well balanced in both polarisations.
In the RF classifier utilising 11 individual features as shown in Figure 9d, the performance is slightly superior to the best results obtained using the entire time series discussed earlier.This yields and the features reflecting modifications in the photosynthetic activity (μ [NDVI] and ΔNDVI).
Moreover, the experiment comparing the TSF and RF performances evaluating different portions of training data from 0.1/0.9 to 0.9/0.1 reveals that the TA ranges from 60% to 80% for RF, whereas the TSF-based solutions present a significant decrease in performance (TA) when training data is below 50%.

| Geographic transferability
Here we present classification outcomes for the same testing area in the Kunene Region as in Section 4.2.However, the training process is conducted in a different gully situated within the Krumhuk Farm (Figure 10 and Table 6).
As observed in Figure 10c-e and demonstrates that our approach can be used to identify the presence of gullies even when training the classifier in remote and different gully areas.In this case, TSF (σ oVV ) presents the best accuracy results, with a TA = 80% and Kappa value close to 0.5.

| Type of change class description
In order to investigate error sources, the origin of False Positives     mapping of gullies at a specific time using RF (Vallejo-Orti et al., 2021) and adding value to the quantification of changes to the gully perimeter in large periods (Shruthi et al., 2015).
Additionally, our method allows the estimation of terrain change volume by utilising the average vertical topographic deviation per pixel (derived from the M3C2 distance in their enclosed points), as reported in Table 7.This facilitates extrapolations to large areas.For instance, in Figure 9a, there are 2084 pixels classified as C1 in the entire target classification area (approximately 140 ha), with an average topographic change of À0.2 m per pixel.This results in a volume of 21.6 m 3 per pixel and a total of 45,014 m 3 for the entire study area.
Our method, based on remote measurements, applied annually can also be used to evaluate and calibrate empirical gully evolution (Sidorchuk, 2021) and simulation models (Omran et al., 2022) involving water streams.Thus, erosion rates predicted by these models can be assessed at the catchment and sub-catchment scale, facilitating the calibration of coefficients and sensitive parameters such as rainfall, stream velocity and discharge.

| Geographical transferability
As observed, σ o time series provide a more accurate representation of the dynamics inside and outside the gully, compared to temporally aggregated features.This is particularly relevant when the gullies used for training and validation differ in their location, vegetation structure and rainfall distribution.To separate gully and non-gully zones, even when training and classification are conducted in geographically distinct areas, the use of time series proves to be an effective method.In this instance, TSF applied on σ oVV overperforms TSF applied to σ oVH and RF.
These results are in line with previous research on storm-derived change detection, whith σ oVV identified as the most sensitive polarisation for monitoring land degradation in sparsely vegetated zones (Cerbelaud et al., 2021).Additionally, working on time series, there is room for improvement in exploring the combination of both polarisation signatures (σ oVV , σ oVH ) applying multivariable time series algorithm (Ruiz et al., 2021) or generating cross-polarised time series for soil moisture studies (Ouellette et al., 2017).

| Spatial resampling from point clouds to satellite
As expected, pixels with less mixture tend to be more accurately classified.This is due to the large coexistence of different dynamics primarily within the gully borders (i.e., topographic changes, vegetation changes and stable sites), which is the primary source of classification error.This fact indicates that we arrived at the limit of the spatial resolution of S1 remote sensing for small gully change patterns, suggesting the applicability of this study to large gully changes.
Our results also indicate that in our target classification areas, topographic changes (Class 1) are closely linked to the presence of green vegetation (Class 4).This is because ephemeral water flows are associated with soil wetness that favours permanent vegetation, while simultaneously acting as the agent of gully erosion.
The availability and usage of SAR multitemporal imagery with higher spatial resolution (i.e., 1 m) would lead to significant uncertainty reduction in the classes' generalisation and resampling.To ensure better results, working with those areas where dynamics are in their purest states is critical to discriminate pixels where dynamics coexist.In this regard, filtering the result in the classification pixels with >50% probability leads to significant improvement in the classifications.Due to the correlation between pixel class probability, MAN-OVA test statistic (F MANOVA ) and accuracy, the MANOVA test applied at pixel level can also reduce error propagation.

| Limitations
Main limitations of this study arise from the utilisation of diverse datasets with differing spatial resolutions and horizontal alignment.
Consequently, a certain level of uncertainty must be acknowledged due to the geolocation accuracy of the generated point clouds and S1.
The absolute horizontal geolocation error is approximately 0.5 m for the computed point clouds and 1 m to 3 for S1 (Small & Schubert, 2022).Thus, sufficient spatial overlap for the given spatial pixel resolution of 10 Â 10 m is guaranteed for our spatial analysis and it is also sufficient for time series analysis (Small & Schubert, 2022).et al., 2021).Additionally, these methods can address potential underfitting problems derived from classical machine learning when they are used for generalisation.This would save time and enable scaling of our approach, such as potentially for global applications (Vanmaercke et al., 2021).Micro-mapping collaborative approaches (Herfort et al., 2018) can help to create reliable segmented classes and reduce uncertainties and issues related to class generalisation.
As previously outlined, our approach can classify gullies when the training and testing datasets are geographically distant.However, it

| CONCLUSION
This research explores the combination between close-range UAV and spaceborne remote sensing as a novel application for classifying gully activity types.This study contributes to addressing the existing research gap in gully monitoring and characterisation over large areas.To date, few approaches have attempted to monitor gully changes with spaceborne sensors.This is primarily due to the significant amount of manual labour required to generate ground reference data for temporal changes, particularly on large scales (Vanmaercke et al., 2021).In order to characterise the dynamics of gullies, we collected and processed 3D In general, our approach represents an improvement in terms of the type of information that can be extracted from gullies, complementing the line of separating affected from unaffected areas in a binary mapping task.In this regard, annual systematic application of our method in large gully affected catchment areas can serve as a decision-making tool for the planning and assessment of gully restoration campaigns at a large scale (regional level).For instance, to quantify the effectiveness of soilfixing plantations, or temporary zonal restrictions on cattle.
sive linear advancement driven by concentrated water flows and local detachments in the most vertical walls are frequent ([1] gully topographical changes).An intermediate zone with gentle slope and sparse vegetation is characterised by a more stable erosion area inside the gully ([3] no change inside gully).Permanent woody vegetation is identified as the primary non-topographic elevation change in the area ([4] non-topographic change) in the main channel's central gully sections.Finally, non-sloped areas in the gully surroundings have not been affected by gully erosion ([2] no change outside gully).These generalised types of changes are used as the foundation to describe the four target classes in subsequent sections.
General map of Namibia locating the two study areas.(b) Gully in the Kunene region with the target classification area (green polygon), the UAV training/testing area (blue polygon), the gully outline (black line), the main road (dashed black line) and some houses (yellow dots) (Namibia Statistical Agency, 2021).(c) Gully in Krumhuk farm displaying the UAV training area (blue polygon) and the gully outline (black line).Terrain topographic profiles across gullies extracted from Tandem-X DEM (DLR, 2014) in the (d) Kunene region and (e) Krumhuk farm.Base maps source is ESRI, Maxar earthstar Geographics.[Color figure can be viewed at wileyonlinelibrary.com] camera (DJI, 2022).Aerial imagery from two campaigns (November 2019 and March 2021) for each study area is selected on the dates indicated in

2. 5
cm between the GCP and the point clouds.The multitemporal point clouds are aligned by minimising the 3D distance between these point clouds in stable areas outside the gully using an iterative closest point (ICP) algorithm(Besl & McKay, 1992), detecting a mean registration error (root-mean-square error) of 0.1 m.Subsequently, M3C2 (linear multi-scale) distance and significant change are derived(Lague et al., 2013).We used an ICP and M3C2 algorithm implementation in Cloud Compare(Cloud Compare, 2022).M3C2 computes distances between point clouds based on a local normal vector along which a cylinder is oriented.This cylinder captures points from each point cloud, which are averaged (separately for each point cloud) and compared to derive a linear multi-scale distance(DiFrancesco et al., 2020;Zahs et al., 2022).The registration error (derived from the ICP) and the local roughness of each point cloud determines the level of detection (LoD), which is a measure of uncertainty associated to each change, as defined in Lague et al. (2013).Although LoD is spatially variable, it presents a mean = 0.70 m and std.dev = 0.19 m for the entire point cloud, resultant of the M3C2 operation.As a final derivate, the significant change (SC) parameter indicates whether the M3C2 distance computed for a certain point corresponds to a real change (M3C2 distance > LoD) or not (M3C2 distance ≤ LoD).Thus, F I G U R E 2 Representative descriptive photos of both study sites.(a) General view, (b) interior perspective and (c) close-up of a gully wall in the Kunene region.(d) Overall view, (e) interior perspective and (f) close-up of a gully wall at the Krumhuk farm.[Color figure can be viewed at wileyonlinelibrary.com]F I G U R E 3 (a) 3D point cloud of the gully site in the Kunene region with the approximated gully outline (in black line) and the four generalised identified changing scenarios ([1] gully topographical change, [2] no change outside the gully, [3] no change inside the gully and [4] nontopographical change).(b) Aerial drone image locating the different change types, displayed in detail in subfigures (c)-(f) for each class of change (1, 2, 3 and 4), respectively.[Color figure can be viewed at wileyonlinelibrary.com]T A B L E 2 Description of the main data sources.The columns display, from left to right, the name of the original dataset, products derivates, ground sampling distance (GSD) and acquisition dates (grouped by years appearing in bold, with dates in the format month.day).
2.2.2 | Radar backscatter time series S1-GRD (Ground Range Detected) products (ESA, 2021a) are used to derive σ oVV and σ oVH for the full time series between July 2019 and April 2021.To generate σ o from S1-GRD imagery, pre-processing operations are carried out in ESA SNAP software (ESA, 2021c), including orbit correction, radiometric and geometric calibration.Global TanDEM-X DEM (DLR, 2014) is used to apply geometric corrections to S1-GRD.These operations are repeated for each epoch (relative orbit 29, frame 1106 ascending) generating time series of σ oVV and σ oVH images.These images time series (σ oVV and σ oVH ) are cropped to the extent of our study area and projected to UTM Zone 33 South, WGS 1984 ellipsoid (EPSG:32733).
Utilising the σ oVV and σ oVH time series, the rate between the first and last values of the series (Δσ oVV and Δσ oVH ), and the standard deviation for the entire series (SD [σ oVV ] and SD [σ oVH ]) are computed as temporally aggregated values for each pixel.Three classes reflecting dominant backscatter mechanism (volume, double-bounce and surface backscatter) are derived from S1-SLC (Single Look Complex) products (ESA, 2021a) conducting a H-Alpha Wishart dual polarisation unsupervised classification (Lee et al., 1999) in ESA SNAP software.This results in the dominant scattering mechanism for Nov 2019 (SM 2019 ) and March 2021 (SM 2021 ), as well as the computed pixel-based binary change jΔSMj.TRI and S are also computed from Tandem-X DEM in R (R, 2021) 'raster' package.Sentinel-2 (S2) (ESA, 2021b) datasets for November 2019 and March 2021 are used to produce the Normalized Difference , the four target classes are: Class 1 (gully topographical change; C1), Class 2 (no change outside gully; C2), Class 3 (no change inside gully; C3) and Class 4 (non-topographical change; C4).
and associated significant change (SC) between November 2019 and March 2021.Polygons for each class are manually segmented.SC = 1 determines changing areas, which are separated into gully topographical change (C1) or nontopographical change (C4) with support of RGB information.No changing sites (SC = 0) can be separated between those inside (C3) or outside the gully (C2).In Figure 6a-d we show framed areas enclosing examples of points of each class.Figure 6e presents a profile line showing the typical respective topographic changes, with neutral change in C2 and C3, negative change in C1 and positive in C4.
T A B L E 3 Summary of the characteristics of the 11 features utilised for random Forest classification at the pixel level.The columns present, from left to right, the variable name, the variable unit, brief description, source product and an indication of whether they are temporally aggregated or one-time features.Δσ oVV (13) Δσ oVH dB Rate of σ o between November 2019 and March 2021 for VV and VH polarizations Sentinel-1 GRD YES (14) SD (σ oVV ) (15) SD (σ oVH ) dB The standard deviation for the time series of VV and μ (NDVI) No units (À1 to 1) The mean for NDVI for the time series between November 2019 and March 2021.Sentinel-2 MSIL2A YES F I G U R E 4 Conceptual explanation for the proposed approach.Types of changes ([1] gully topographical change, [2] no change outside the gully, [3] no change inside the gully and [4] nontopographical change) are identified by humans (human interpretation) on the original and on the already analysed multitemporal point clouds (multitemporal 3D data) and transferred to Sentinel-1 pixel scale to apply pixel-based classifications (satellite pixel-based analysis).The source of the icons used in this figure is stock.adobe.com.[Color figure can be viewed at wileyonlinelibrary.com] 3.3 | Satellite pixel-based classifications We conduct a new classification extrapolated to the entire study area based on S1-σ o time series and TSF algorithm.Features such as the mean, standard deviation, and slope of σ o time series are extracted in different moving windows.These features are used to configure the TSF classifier, with the total number of features resultant from the product of the number of simple features multiplied by the predefined number of windows (Deng et al., 2013).The number of training samples is set to the least represented class, selecting those pixels with highest q c in the rest of the classes.σ o time series (50-time epochs) for the Vertical-Vertical (VV) and Vertical-Horizontal (VH) polarisations are utilised as explanatory features.This leads to two independent classifications of the TSF algorithm (Figure 9b,c) where, as per standard procedure, 80% of the samples are used for training the classifier and 20% are reserved for testing the results.Additionally, different training/ testing proportions are computed to observe the performance of the algorithms with different training datasets sizes (Figure 9e).

5
Description of the methodology.Each frame corresponds to the methodological block, beginning with the pre-processing of the time series and temporally aggregated features, and followed by the (3.1) generation of UAV reference data, (3.2) generation of pixel training, (3.3) satellite-based classifications generating independent results for time series Forest (σ vv, and σ vh ) and random Forest and (3.4) evaluation of classification accuracy.[Color figure can be viewed at wileyonlinelibrary.com] derived from variables that can represent change in the terrain and/or distinguish different types of terrain (Figure 9d).
conducted at the point cloud and satellite pixel levels by splitting our reference data into training and testing datasets at an 80% to 20% proportion.For point clouds, the reference data includes manually tagged points as described in Section 3.1.On the other hand, for pixel-based classification, we utilise the resultant pixels outlined in Section 3.2 for training and validation.The performance of classifications is assessed using overall metrics such as Total Accuracy (TA) and Cohen's Kappa (Kappa), along with specific metrics for individual classes like Producer Accuracy (PA) and User Accuracy (UA).These metrics are calculated based on the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).

F
I G U R E 6 Azimuthal view of point clouds of a gully changing site captured in (a) November 2019 and (b) March 2021.Computed (c) M3C2 distance and (d) significant change between (a) and (b) with frames pointing out to each type of change class.(e) Elevation profiles for November 2019 (grey) and March 2021 (black) along a line crossing each predefined class.[Color figure can be viewed at wileyonlinelibrary.com]Transferring this information from point cloud level to S1 satellite level (Figure 8d,e) generates 1321 pixels (88, 413, 740 and 80 pixels for C1, C2, C3 and C4 respectively), together with their corresponding purity weighted index (q c ).

(
FP) and True Negatives (TN) conducted in the TSF (σ oVV ) classification for the Kunene Region (Figure 9b), we examine the points-F I G U R E 7 Graphical representation of the resampling process of the four classes ([1] gully topographical change, [2] no change outside the gully, [3] no change inside the gully and [4] non-topographical change) denoting gully change types from point clouds to sentinel pixel resolutions.inside-pixels statistics generated during the data generalisation and classification process.To this end, a comparative analysis is performed at point cloud level to quantify the degree of mixture of each class and the basic statistics of each explanatory variable.Table 7 presents each class statistics for the initial training classes defined manually in the 3D point clouds (UAV Training), the RF results at the point cloud level (UAV Pred), the training classes generated at the S1 pixel level (SAR Train), the final prediction derived from TSF (σ oVV ) dataset (TSF SAR Pred), and these results once pixels with > 50% probability of the resultant class are selected (TSF SAR Pred > 50%).C1 (gully topographical change) is characterised by a negative M3C2 and high significant change (SC > 0.9), TRI and S above average, and with the absence of vegetation as indicated by low GLI.It is also understood that the first two parameters (SC and M3C2) serve to differentiate C1 (gully topographical change) from C3 (no change inside gully).

F
Figure 12h presents the main channel with erosive activity (C1), surrounded by slopes with little change and vegetation (C4).Hence, when the level of mixture is reduced and a specific change dominates, the likelihood of accurately predicting this type of change increases.
DiscussionIn order to explore the potential of machine learning coupled with time series to characterise gully dynamics, we compared satellite pixel-based classifications of TSF on S1-σ o time series with RF on 11 temporally aggregated and one-time features (derived from S1, S2 and TX).
5.1.1 | Radar time series versus aggregated featuresClassification accuracy shows TA and Kappa around 80% and 0.7, respectively, for both the TSF classification (σ oVV and σ oVH ) and RF.UA and PA to classify gully topographic changes (C1) are observed to be more accurate working with TSF than RF.However, the latter is less sensitive to modifications in the proportion of training to testing data.These results represent an advance in the zonal characterisation and monitoring of gullies(Vanmaercke et al., 2021), surpassing theF I G U R E 1 0 (a)Gully site in the Kunene region used for testing (red polygon) when the training was conducted in Krumhuk farm.(b) Gully site in the Krumhuk used for training (blue polygon).Maps of classification results of four classes separating training and testing areas for (c) time series Forest-TSF (σ oVV ), (d) TSF (σ oVH ) and (e) random Forest-RF.Maps of binary classification results (gully affected and no gully affected) separating training and testing areas for (f) TSF (σ oVV ), (g) TSF (σ oVH ) and (h) RF.The aggregation of classes 1, 3 and 4 is displayed with dark grey colour.Base maps source is ESRI, Maxar earthstar Geographics.[Color figure can be viewed at wileyonlinelibrary.com]

T A B L E 6
Accuracy results (UA, PA and TA in %) for the classifications conducted using TSF (σ oVV and σ oVH ), and RF on temporally aggregated features with training data generated in Krumhuk farm, and the testing in the Kunene region.The above block of the table presents the results for the four classes (C1 [gully topographical change], C2 [no change outside gully], C3 [no change inside gully] and C4 [non-topographical change]), while the below block presents the results aggregating classes 1,3, and 4 (C1 + C3 + C4) to work in a binary (gully and non-gully) classification.

F
I G U R E 1 1 (a) Density curves of explanatory variables separated by stage of the dataset organised per class.Each subplot includes multiple density curves for UAV training (black line), UAV Pred (dark grey line), SAR train (grey line), TSF SAR Pred (light grey line) and TSF SAR Pred > 0.5 (red line) of each point cloud explanatory feature (GLI, M3C2, S, TRI and SC) organised per columns.(b) Values of MANOVA F test statistic to quantify the similarity between each transformed dataset (UAV Pred, SAR train, TSF SAR Pred and TSF SAR Pred > 0.5) and the originally segmented classes (UAV training).[Color figure can be viewed at wileyonlinelibrary.com]T A B L E 8 Summary of the points-inside-pixels statistics for the different classification results for the Class 1, comparing the true positive (TP) with true negative (TN) pixels and the confusion between classes 1 and 3 (C1-C3).The columns points and points % present the count and the percentage of points from the interpolated UAV Pred contained by each Class 1 training pixel (px.).

F
I G U R E 1 2 (a-d) Histograms organised per classes displaying in the vertical axis the count of true positive (TP) classified pixels (in blue) and false negatives plus true negatives (FN + TN) pixels (in pink) versus the percentage of points per pixel of the corresponding class in the horizontal axis.Photos showing transition zones between (e) Class 2 and Class 3, (f) Class 1 and Class 4, (g) Class 1 and Class 2 and (h) Classes 1, 3 and 4 to illustrate typical mixture among classes leading to lower classification accuracies at pixel level.[Color figure can be viewed at wileyonlinelibrary.com] encounters challenges in accurately classifying gully topographic changes, possibly due to the different temporal rainfall distributions, which serve as the main driver of terrain changes.To ensure similar accuracies (i.e., TA = 80%) for the classification of change types, it is recommended to restrict the classification to the same region where the training is conducted, thus erosive rain events would be more aligned in time.This issue could also be resolved by generating TSF features based only on changing periods and correlating them to rain events.Regarding the geographical transferability of the methodology, it is assumed that similar classification results can be achieved when the training and classification is conducted on permanent large valley bottom gullies in (semi-) arid areas.In contrast, urban gullies, or ephemeral agricultural gullies would fall outside the scope of this work.
multitemporal point clouds to generate training data.The data was transferred from point clouds to spaceborne remote sensing images pixels entailing a necessary generalisation resampling from a sub-m to 10-m spatial resolution.As an innovation in gully research, erosion processes and geomorphological features are studied by means of time series of radar data, introducing the use of a time series classifier, Time Series Forest (TSF), to separate types of changes.Employing this method, TA > 80% are achieved to classify types of gully changes when the training is conducted within the geographic region of the target classification.This method also serves to differentiate between gully and no gully zones when the training data is conducted in a different geographical setting.However, it is observed that the resolution mismatch between the very high-resolution point clouds with Sentinel-1, and temporal misalignment of gully erosion events in different areas are two issues limiting the accuracy of the results and the geographic transferability of the training data.

Finally
, this paper suggests three directions for extending our approach to characterise large gully networks in sparsely populated areas in the future: (i) expanding the UAV data collection campaigns to other areas in order to investigate the relation between gully characteristics, local settings and geographic transferability of the training data, (ii) applying additional tests to optimise the aggregation from UAV points to satellite pixels and (iii) explore multivariate time series analysis to combine different radar polarisation and optical derivates as proxies for gully dynamics.
Descriptive parameters of the study areas.
T A B L E 1

Table 2 .
The flights are carried out at a constant altitude

Table 6
Accuracy results for the classifications conducted using time series Forest (σ oVV and σ oVH ) and random Forest on temporally aggregated features.User accuracy (UA%) and producer accuracy (PA%) are reported for each class (C1 [gully topographical change], C2 [no change outside gully], C3 [no change inside gully] and C4 [non-topographical change]).Total accuracy (TA%) and Cohen's kappa values are included as overall accuracy measures.
Summary statistics for each dataset at the different stages of the methodology (UAV training, UAV Pred, SAR train, TSF SAR Pred, TSF SAR Pred >50%).The labels P1-P4 indicate the number of points (from point clouds) of each class (C1-C4).The last five columns present the mean (μ) and the standard deviation (σ) of each point cloud original feature M3C2, SC, TRI, S and GLI.
the main limitations of this study is the availability of continuous wallto-wall training data at point cloud level.The adopted solution using RF to apply classes to the whole point cloud achieved extremely high accuracy values, which should be interpreted with caution, as the geographical transfer of the classifier cannot be guaranteed.For future applications, locally trained classifiers may be replaced by moreT A B L E 7