Spatiotemporal Data Augmentation of MODIS‐Landsat Water Bodies Using Adversarial Networks

With increasing demands for precise water resource management, there is a growing need for advanced techniques in mapping water bodies. The currently deployed satellites provide complementary data that are either of high spatial or high temporal resolutions. As a result, there is a clear trade‐off between space and time when considering a single data source. For the efficient monitoring of multiple environmental resources, various Earth science applications need data at high spatial and temporal resolutions. To address this need, many data fusion methods that rely on combining data snapshots from multiple sources have been described in the literature. Traditional methods face limitations due to sensitivity to atmospheric disturbances and other environmental factors, resulting in noise, outliers, and missing data. This paper introduces the Hydrological Generative Adversarial Network (Hydro‐GAN), a novel machine learning‐based method that utilizes modified GANs to enhance boundary accuracy when mapping low‐resolution MODIS data to high‐resolution Landsat‐8 images. We propose a new non‐saturating loss function for the Hydro‐GAN generator, which maximizes the log of discriminator probabilities to promote stable updates and aid convergence. By focusing on reducing squared differences between real and synthetic images, our approach enhances training stability and overall performance. We specifically focus on mapping water bodies using MODIS and Landsat‐8 imagery due to their relevance in water resource management tasks. Our experimental results demonstrate the effectiveness of Hydro‐GAN in generating high‐resolution water body maps, outperforming traditional methods in terms of boundary accuracy and overall quality.


Introduction
Monitoring water bodies is essential for guiding the evidence-based decision making needed for hydrological and ecological sustainability (Njue et al., 2019). Surface water is an irreplaceable resource for ecological systems, human uses, industrial uses, hydro-power generation, social development, and recreation. Reliable information about the dynamic changes of open surface water (e.g., lakes, reservoirs, and rivers) is critically important. With the increase in the availability of satellite images and image processing techniques, numerous research studies have attempted to extract and delineate water bodies from these images (Ouma & Tateishi, 2006). These technological and methodological advancements shift the analysis of surface water bodies from the regional scale to the global scale for a better understanding of the Earth's natural processes.
Earth observation data is acquired through a large number of satellites that have unique spatiotemporal resolutions, making them complementary data sources (Amato et al., 2020). In the 50 years since the first satellite launch, remote sensing satellites have evolved from producing low-resolution images to daily data acquisition exceeding 10 terabytes (Campbell & Wynne, 2011). This transformation is driven by Earth observation science, with over 150 observation satellites in orbit, equipped with sensors operating at various spatial and temporal resolutions (Fu et al., 2020).
Due to differences in sensor designs, there is often a trade-off between spatial and temporal resolution across the remote sensing spectrum (Khandelwal, Karpatne, & Kumar, 2017). Figure 1 illustrates the spatiotemporal resolution of four active satellites, which shows the aforementioned space and time trade-off. The satellites with the highest spatial resolution compromise on temporal resolution, and vice versa. MODIS data, obtained from the instruments aboard both the Terra and Aqua satellites, cover the entire Earth's surface every 1-2 days. The merged MODIS product used in this study is an 8-day composite, so data for the same location on Earth are available approximately every 8 days. On the other hand, the Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) onboard the Landsat satellite capture the Earth's surface every 16 days but at a high spatial resolution (HSR) of 30 m (refer to Figure 1). MODIS data are available at several spatial resolutions, including 250 m, 500 m, and 1 km; for our study, we specifically employ MODIS data at a spatial resolution of 500 m. Additionally, Landsat data, captured by OLI with a spatial resolution of 30 m, are used in our research. TIRS on Landsat has a spatial resolution of 100 m. Given the limitation of individual sensors in delivering either HSR or high temporal cadence, it is essential to create interpolation techniques that can transfer information across spatiotemporal scales (Khandelwal, Karpatne, & Kumar, 2017). This knowledge transfer can equip the science communities with synthetic data sets that approximate real data on the go. For the case of water bodies interpolation, one of the desirable properties of the method is to achieve interpolated water bodies that are similar to the ground truth data in both shape and area.
High spatiotemporal resolution data will allow timely monitoring of surface water and its dynamics, which are crucial elements for policy and decision-makers in hydrology and geomorphology (Maiti & Bhattacharya, 2009). Furthermore, interpolated data will facilitate the integration of remote sensing data with Geographic Information Systems for automatic or semiautomatic water body extraction and mapping (Sarp & Ozcelik, 2017). In light of the aforementioned need for a flexible spatiotemporal resolution domain, in this paper, we propose a method that aims to learn a mapping between low spatial resolution (LSR) image instances and HSR image instances of water bodies and reservoirs collected across a period of 7 years from MODIS and Landsat satellites. The proposed mapping can be utilized to generate HSR data at times when only LSR data is available. Our contributions are as follows:
• We designed an image processing pipeline that uses computer vision tools to extract target water bodies' polygon boundaries from satellite imagery.
• We developed a new Generative Adversarial Network (GAN) for mapping the data between any LSR-HSR satellite pair.
• We made our source code and data open-source on a project website (https://sites.google.com/view/hydro-ml/) that meets the principles of Findability, Accessibility, Interoperability, and Reusability (FAIR) (Wilkinson et al., 2016).
Our contribution can help in extracting the precise shape and area of a water body, which can further be used to measure the expansion or shrinking of a water body over a period of time. This can be crucial as water body extraction is an important task in different disciplines, such as lake coastal zone management, coastline change and erosion monitoring, flood prediction, climate and environmental change, and evaluation of water resources (Haghnazar et al., 2024; Hosseinzadeh et al., 2023, 2024; Jiang et al., 2018). Timely monitoring of surface water and delivering data on the dynamics of surface water are also essential for policy and decision-making processes (Sarp & Ozcelik, 2017). It can also help in detecting the changes in urban water bodies that make a huge difference to human lives and may cause disasters, such as surface subsidence, urban inland inundation, and health problems (Y. Chen et al., 2018).
The rest of the paper is organized as follows: in Section 1 we review the related works; in Section 2 we define the architecture of our proposed GAN-based method and its different variants; in Section 3 we discuss the results and evaluation of our experiments. We introduce a case study on Lake Thathar in Section 4. Finally, Section 5 concludes the paper and shows potential directions for future works.

Background on LSR-HSR Image Mapping
Early efforts to learn surface water mapping between LSR and HSR images include supervised learning methods applied to remote sensing images (Khandelwal, Karpatne, & Kumar, 2017). Factors such as noise, outliers, and enormous volumes of missing data (due to clouds and sensor failures) can impact the accuracy of these classification systems. To overcome these limitations, a new approach has emerged called Ordering Based Information Transfer Across Space and Time (ORBIT) (Khandelwal, Karpatne, & Kumar, 2017). The fundamental idea behind ORBIT is to take advantage of the natural ordering of instances that results from the elevation structure and temporal context. Specifically, if an area is filled with water, it is inferred that, due to gravity, all the regions in the basin with lower elevation than the current area are also occupied by water (Pekel et al., 2016). One key assumption in this approach is that a water body always expands and contracts smoothly (with the exception of floods and other natural phenomena), which suggests that the surface areas of the water body at close time steps are highly similar (Khandelwal, Karpatne, Marlier, et al., 2017). To put it another way, the water surface extents of dates close together are likely to be highly similar.
Hence, ORBIT can map data from LSR to HSR data only at time steps when noiseless LSR data is available (Khandelwal, Karpatne, & Kumar, 2017).
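The elevation-ordering idea can be illustrated with a small sketch. This is not the ORBIT algorithm; it only assumes that, on a given date, every pixel at or below some water level is water, and it picks the level that disagrees least with noisy per-pixel labels. The function name and the toy data are hypothetical.

```python
def fit_water_level(elevations, noisy_labels):
    """Return (level, labels) where pixels with elevation <= level are water.

    elevations: one elevation per pixel.
    noisy_labels: 0/1 labels (1 = water) that may be wrong for some pixels.
    """
    # Candidate levels: below every pixel (no water) or at each pixel elevation.
    candidates = [min(elevations) - 1.0] + sorted(set(elevations))
    best_level, best_errors = None, None
    for level in candidates:
        # Count pixels whose noisy label contradicts the elevation ordering.
        errors = sum(
            1 for e, lab in zip(elevations, noisy_labels)
            if (1 if e <= level else 0) != lab
        )
        if best_errors is None or errors < best_errors:
            best_level, best_errors = level, errors
    cleaned = [1 if e <= best_level else 0 for e in elevations]
    return best_level, cleaned


elev = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
noisy = [1, 1, 0, 1, 1, 0]  # the pixel at 12.0 is mislabeled (e.g., by a cloud)
level, cleaned = fit_water_level(elev, noisy)
print(level, cleaned)
```

In this toy case the single inconsistent label is overruled because flipping it costs fewer disagreements than any other water level.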

Background on Planform Change Use Case
For the use case of channel planform change analysis in river bodies, the change of a water body channel is due to natural or human-made fluctuations in the streamflow or sediment supply (Leonard et al., 2020). It has been found that although recurrent images from remote sensing satellites have been widely used to measure channel change, these measurements are only significant if the measured change exceeds the uncertainty threshold (Leonard et al., 2020). To address this challenge, Leonard et al. (2020) introduced a generalized method, called the spatially distributed probabilistic (SDP) method, for quantifying the uncertainty associated with measurements of channel change from remote sensing images based on spatially varying estimates of uncertainty. The SDP approach leverages image co-registration error, interpretation uncertainty, and digitization uncertainty for quantifying uncertainty. It has been established that SDP can be used to calculate uncertainty at specific locations of linear channel adjustment or polygons of erosion and deposition, while also estimating the central tendency of the net planform change.

Background on Data Pre-Processing
Prior to building the LSR to HSR data mapping, it is necessary to build a data pipeline that prepares the corresponding LSR-HSR data pairs. For this purpose, if the scope of the study is limited to a few water bodies, it is common to use data maps that extract the a priori known polygons (Frazier & Page, 2000; Muster et al., 2013). Image processing techniques have also been used recently for automatically extracting the water bodies' outlines from satellite data without any a priori map knowledge (Rokni et al., 2014). Single-band methods are automated polygon extraction approaches that utilize a selected threshold value to extract water bodies' boundaries. Similarly, multi-band methods combine different reflective bands for improved surface water extraction (Rokni et al., 2014). The weakness of using a pixel threshold is that it is prone to errors caused by the mixing of water pixels with those of different cover types. A more sophisticated approach for automated polygon extraction is to employ image segmentation. The latter technique is relatively more accurate compared with single-band methods (Rokni et al., 2014).

Background on Generative Networks
GAN models have been used in a variety of applications, including image synthesis, semantic image editing, style transfer, and classification (Goodfellow et al., 2014). These networks not only learn the mapping from an input image to an output image but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations (Isola et al., 2016). Regular GANs hypothesize the discriminator as a classifier with the sigmoid cross-entropy loss function (Mao et al., 2016). In this paper, we develop a machine learning-empowered synthetic satellite that is capable of spatiotemporal interpolation across a pair of real satellites. Our approach is built on the GAN model, which we use to accurately represent water bodies' shapes and generate realistic synthetic HSR image instances at times when no measurements are available. Our proposed model belongs to the unsupervised learning paradigm and is capable of learning deep representations without extensively labeled training data. The novelty of our approach consists of deriving back-propagation signals through a competitive training process involving a pair of competing networks. Our GAN architecture utilizes a generator model and a discriminator model, where the first model is used to generate new synthetic images, and the second model is used to classify the data as either real or synthetic. The two competing models are trained concurrently in an adversarial process, where the role of the generator is to mislead the discriminator, while the discriminator tries to detect the generated images (Goodfellow et al., 2014).

Methodology and Data Sources
An ideal LSR-HSR image mapping method produces images that accurately describe the shape of the water bodies' boundaries. For this purpose, we develop a machine-learning model (i.e., a GAN model) that is specifically equipped to focus on the shape and areal accuracy of the water bodies' interpolated polygons. To better represent the polygon ground truth shapes, we use an optimization algorithm that minimizes the loss with respect to the polygon shapes. Our model is trained on historical image data captured from both Landsat and MODIS satellites, which are HSR and LSR respectively (L. Yu et al., 2018). In this section, we will discuss the data preprocessing pipeline and the proposed model.

Water Bodies Data Sources
Our data set is collected from the MODIS and Landsat Earth observation satellites. Figure 2 shows a sample of LSR water bodies captured by MODIS and their corresponding HSR images captured by the Landsat satellite. MODIS is a key instrument aboard the Terra and Aqua satellites. While Terra's orbit around the Earth is timed so that it passes from north to south across the equator in the morning, Aqua passes south to north over the equator in the afternoon (Justice et al., 2002). Terra MODIS and Aqua MODIS view the entire Earth's surface every 1-2 days, acquiring data in 36 spectral bands, or groups of wavelengths (Barnes et al., 2003). We used Terra MODIS with bands 1-2-1 and 7-2-1 to obtain 256 by 256 pixel images, as these bands allow the separation between the water bodies and land surfaces (refer to Figure 2a). The data collected from the MODIS sensors are merged at a temporal resolution of 8 days and a spatial resolution of 500 m. It is important to note that although the temporal resolution of MODIS is 1 day, the merged product provides data for the same location on Earth approximately every 8 days. This is known as the 8-day repeating cycle (i.e., MODIS 8-day composite) (Y. Chen et al., 2013). Landsat 8 carries two data collection instruments, OLI and TIRS. The two sensors collect data at a temporal resolution of 16 days. OLI provides a spatial resolution of 30 m, capturing visible and near-infrared bands, while TIRS has a spatial resolution of 100 m, focusing on thermal infrared bands (Verpoorter et al., 2012). We used bands 7-4-3 and 1-5-7 to obtain the images of water bodies (refer to Figure 2b). Likewise, we used 256 by 256 pixel images for Landsat. The multi-band method takes advantage of the reflective differences of each involved band and extracts water based on the analysis of signature differences between water and other surfaces (Qiao et al., 2012). Our data set contains 20 reservoirs, across 7 years from 2015 to 2021. We have a total of 6,720 MODIS images and 3,360 Landsat 8 images. In our study, we utilized satellite data products from both MODIS and Landsat. For MODIS, we employed Level 1B products, which include radiometrically calibrated and geolocated data. These Level 1B products undergo geometric correction, making them suitable for our mapping purposes. Regarding Landsat, we utilized Level 1T products, which are terrain-corrected and georeferenced. We have outlined more details on the water bodies used in this study in the Appendix section. Following the FAIR guiding principles, we have made our data sets publicly accessible on the project website.

Data Pre-Processing
Prior to the HSR-LSR mapping process, we curated our image data sets to obtain cleaned, machine-learning-ready data. We applied computer vision methods to extract the shapes of the water bodies' polygons. The image processing step involves extracting useful metadata from the image. In this case, the metadata is the shape of the water bodies' polygons. Our data curation is a four-step process that is shown in Figure 3 and described in Algorithm 1. The steps are (a) convert the input images to Hue Saturation and Value (HSV) format (lines 3-4), (b) binarize the image into a black-and-white format (lines 5-8), (c) denoise the image using morphological operations (lines 9-11), and (d) apply an image mask to extract the water body polygons.
While existing segmentation-based methods for water body extraction are powerful techniques for image analysis tasks, they have a limited capacity to capture fine-grained details and nuances in water bodies, especially in cases where the polygon boundaries are not well-defined (Verma et al., 2021; Xia et al., 2020). Therefore, our proposed data pre-processing pipeline (i.e., binarization and morphological operations) offers a better approach to accurate boundary extraction. Binarizing the image and applying morphological operators are essential steps in image processing for accurate water body detection and shape extraction. On one hand, the binarization process simplifies the image by converting it into a binary format, where pixels are categorized as either foreground (object of interest) or background. On the other hand, morphological operations, such as erosion, dilation, opening, and closing, are valuable for enhancing and refining the edges of objects in the binary image. These operations can fill gaps, connect broken lines, and smooth object boundaries.
The first step involves converting the color image from the Red Green Blue (RGB) color space into the HSV color space. This step is crucial as the HSV color space is the most efficient image format for color-based image segmentation (T.-W. Chen et al., 2008). The HSV color conversion also enables the normalization of colors across images captured from satellites that have water bodies of different colors relative to their surroundings. The HSV color space consists of three matrices, hue, saturation, and value, whose ranges are (0-179), (0-255), and (0-255) respectively (Loesdau et al., 2014). While the hue represents the color, saturation measures the amount to which the given color (hue) is mixed with white. The value matrix represents the amount to which the given color (hue) is mixed with black.
Algorithm 1 (fragment):
      ⊳ Create the polygon boundary
      polygonMask(:, :, 1) ← 255 newImage
      polygonMask(:, :, 2) ← 255 newImage
      extracted ← append(polygonMask)
   end for
   return extracted
The second step of the data pipeline involves binarizing the converted grayscale input images by converting them to black and white pixels. This step reduces the domain of colors contained in the image from 256 shades of gray to a binary set (black or white). To achieve the binarization, we first perform a hyperparameter search for a water body-color pixel threshold value based on the distribution of grayscale pixels in the image. A pixel is then converted to white (value of zero) if its grayscale value is greater than the threshold. Similarly, if the grayscale value of the pixel is lower than the threshold, it is converted to black (value of one). Both MODIS and Landsat images follow identical pre-processing, which involves converting their original RGB format into grayscale and subsequently into a binarized image.
The third data processing step consists of removing the outliers and noise patches around the water body by applying morphological operations to the binary images. Morphological operations rely on the relative ordering of pixel values, rather than on their numerical values, to infer outliers (Comer & Delp, 1999). In this operation, we probe an image with a small shape or template called a structuring element. The structuring element is positioned at all of the possible locations in the image and is compared with the corresponding pixel neighborhood. We employed two groups of morphological operations. The first operation group (erosion) tests whether the structuring element fits within the neighborhood, which is governed by the θ erosion parameter in Algorithm 1. The second operation group (dilation) tests whether the structuring element touches or intersects the neighborhood, which is governed by the θ dilation parameter in Algorithm 1. Applying erosion followed by dilation is known as morphological opening, and it removes small objects (noise) from an image while preserving the shape and size of larger objects in the image. Figure 4 shows an example of a morphological opening that removes the noisy patches from the original image (Said et al., 2016). The fourth step consists of applying a color mask to the binary image that extracts the shape of the water body in the form of a polygon. Figure 5 illustrates the entire data pipeline using two MODIS LSR images (previously shown in Figure 2a).
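The four pre-processing steps can be sketched on a toy image as follows. This is a minimal pure-Python illustration and not Algorithm 1 itself: the threshold of 100, the 3 by 3 structuring element, the convention that 1 marks water, and all function names are assumptions made for the example. Note that `colorsys` returns HSV components in the range 0-1, so the value channel is rescaled to 0-255 here.

```python
import colorsys


def to_value_channel(rgb_image):
    """Step (a): convert RGB pixels (0-255) to HSV and keep the value channel in 0-255."""
    return [
        [round(colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[2] * 255) for (r, g, b) in row]
        for row in rgb_image
    ]


def binarize(gray, threshold):
    """Step (b): 1 where the pixel is darker than the threshold (assumed water), else 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def _neighborhood(img, i, j):
    """3x3 neighborhood of (i, j); out-of-bounds pixels count as background (0)."""
    h, w = len(img), len(img[0])
    return [
        img[y][x] if 0 <= y < h and 0 <= x < w else 0
        for y in range(i - 1, i + 2)
        for x in range(j - 1, j + 2)
    ]


def erode(img):
    """Keep a pixel only if the 3x3 structuring element fits entirely in the foreground."""
    return [[1 if all(_neighborhood(img, i, j)) else 0 for j in range(len(img[0]))]
            for i in range(len(img))]


def dilate(img):
    """Set a pixel if the 3x3 structuring element touches any foreground pixel."""
    return [[1 if any(_neighborhood(img, i, j)) else 0 for j in range(len(img[0]))]
            for i in range(len(img))]


def opening(img):
    """Step (c): erosion followed by dilation removes small isolated patches."""
    return dilate(erode(img))


def apply_mask(rgb_image, mask, fill=(255, 255, 255)):
    """Step (d): keep original pixels inside the mask and paint everything else with fill."""
    return [[px if m else fill for px, m in zip(row, mrow)]
            for row, mrow in zip(rgb_image, mask)]


WATER, LAND = (10, 30, 60), (180, 170, 120)
image = [[LAND] * 7 for _ in range(7)]
for i in range(1, 5):
    for j in range(1, 5):
        image[i][j] = WATER      # a 4x4 water body
image[6][6] = WATER              # an isolated noise pixel

binary = binarize(to_value_channel(image), threshold=100)
mask = opening(binary)
extracted = apply_mask(image, mask)
print(sum(map(sum, binary)), sum(map(sum, mask)))
```

The opening removes the isolated pixel while restoring the 4 by 4 water body to its original extent, which is the behavior described for the denoising step.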

Hydrological Generative Adversarial Network (Hydro-GAN)
Our proposed Hydrological Generative Adversarial Network (Hydro-GAN) is an image-to-image translation model that transforms imagery from the LSR domain to the HSR domain by learning the non-linear mapping between the two. The data product will directly enrich the current HSR data sets at times when only LSR data are available.
We used GAN frameworks as the backbone of our proposed model. Our motivation for utilizing GANs is their ability to generate crisp, sharp images (C. Li et al., 2017). These networks provide a way to learn deep representations without requiring any extensively annotated training data. GAN models are able to learn the representations by deriving back-propagation signals through a competitive process involving a pair of networks (Martinez & Heiner, 2020).
Hydro-GAN is an unsupervised learning model that automatically learns the regularities and patterns in the input data and tries to mimic the same patterns when generating synthetic samples. The success of Hydro-GAN is determined by the output plausibility in comparison with the original data set. Hydro-GAN is composed of two sub-models, Hydro-GEN and Hydro-DIS. The first sub-model is the generator model, which is trained to generate new synthetic images. The second sub-model is the discriminator model, which tries to classify a water body image as either real (from the HSR domain) or synthetic (i.e., created by the Hydro-GEN sub-model). While Hydro-DIS learns to optimize its loss function, Hydro-GEN learns to fool the Hydro-DIS discriminator model. As such, the two Hydro-GAN sub-models are trained simultaneously in an adversarial process where the generator seeks to better fool the discriminator and the discriminator seeks to better identify the synthetic water body images.
Hydro-GEN learns a mapping from a random input x in latent space to an output y that matches the data distribution of the HSR Landsat images: G : x → y (Goodfellow et al., 2014). Hydro-GEN utilizes a conditional generative adversarial network (cGAN), where the output y is conditioned on some input z, resulting in a mapping: G : (x, z) → y (Goodfellow et al., 2014). The generator is trained via adversarial loss, and an additional soft Dynamic Time Warping (DTW) loss term, calculated between the synthetic image and the ground truth water body images. This added loss motivates the generator to develop accurate water body shapes that match the shape signatures of the original water bodies. In this context, randomness refers to the incorporation of unpredictable and stochastic input values, drawn from probability distributions, to introduce variability and produce diverse and data-matching outputs. Equation 1 defines the objective function that a traditional GAN network is optimizing for.
$$ \min_{G}\max_{D} V(D, G) = \mathbb{E}_{z,y}\left[\log D(z, y)\right] + \mathbb{E}_{x,z}\left[\log\left(1 - D(z, G(x, z))\right)\right] \tag{1} $$
In Equation 1, D(z, y) refers to the probability that a sample image y pertains to the real data set given the condition z. E is the expected value over all the real image data instances. G(x, z) refers to the synthetic image samples conditioned on z. D(z, G(x, z)) is the discriminator's estimate of the probability that a synthetic instance is real.
The discriminator D attempts to maximize the function, while the generator G attempts to do the opposite. The generator cannot directly affect the log D(z, y) term in the function, so, for the generator, minimizing the loss is equivalent to minimizing log(1 − D(z, G(x, z))).

Hydrological Generator (Hydro-GEN) Model
Our proposed Hydro-GEN sub-model is an encoder-decoder model that uses a U-Net design (Ronneberger et al., 2015). The model takes a pre-processed source LSR water body image and generates a target HSR image. The model accomplishes this by first down-sampling the input image to the bottleneck layer, then up-sampling this layer's data back to the dimensions of the output image (Ronneberger et al., 2015). The most prominent patterns from the input LSR image are retained and encoded in the bottleneck layer prior to constructing the output HSR image representation. The U-Net architecture contains skip connections that connect the corresponding down-sampling and up-sampling layers. The skip connections pass low-level information directly between these layers, bypassing the bottleneck layer.
The generator model consists of standardized blocks of convolution layers, with batch normalization, dropout, and activation applied to them. Figure 6 shows the skip connections used by the U-Net network (Goodfellow et al., 2014). While both U-Net and Hydro-GEN models share the same foundational structure involving an encoder-decoder design, Hydro-GEN focuses on generating high-resolution water body images from low-resolution inputs, making it particularly suited for remote sensing applications. Hydro-GEN incorporates unique loss functions, including adversarial, L1, and DTW losses, to address the challenges specific to water body image enhancement.
The Hydro-GEN model is trained via the Hydro-DIS model. The weights of the Hydro-GEN model are updated to reflect the discriminator loss when predicting synthetic images as either real or generated (known as the adversarial loss). Hydro-GEN is penalized when generating synthetic samples that are easily distinguishable by Hydro-DIS from the real training data distribution. The Hydro-GEN weights are also updated to minimize the L1 loss between the synthetic images and the original ground truth images, which represents the mean absolute error between these images. The final objective function is a weighted sum of these losses.
The weights range between 1 and 100, favoring either the adversarial loss or the L1 loss. We used weighting between the two losses to find an optimal balance between adversarial and L1 losses that can maximize the synthetic image plausibility. A detailed summary of our Hydro-GEN model's architecture is provided in Appendix B1.
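The down-sampling and up-sampling path with skip connections can be traced with simple shape bookkeeping. The sketch below assumes stride-2 stages that halve a 256 by 256 input down to a 1 by 1 bottleneck, as in common U-Net generators; the actual Hydro-GEN layer counts are those of Appendix B1 and may differ, and the function name is hypothetical.

```python
def unet_plan(input_size=256, bottleneck_size=1):
    """Return encoder sizes, decoder sizes, and (decoder_stage, encoder_stage) skip pairs."""
    encoder = []
    size = input_size
    while size > bottleneck_size:
        size //= 2                      # each encoder stage halves height and width
        encoder.append(size)
    decoder, skips = [], []
    # The last encoder output is the bottleneck; decode back up from it.
    for step, enc_idx in enumerate(range(len(encoder) - 2, -1, -1)):
        size *= 2                       # each decoder stage doubles height and width
        decoder.append(size)
        skips.append((step, enc_idx))   # concatenate with the same-size encoder output
    decoder.append(size * 2)            # final up-sampling to the output resolution
    return encoder, decoder, skips


encoder, decoder, skips = unet_plan()
print(encoder)
print(decoder)
print(skips)
```

Each skip pair joins a decoder stage with the encoder stage of identical spatial size, which is what allows fine boundary detail to reach the output without being squeezed through the bottleneck.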

Hydrological Discriminator (Hydro-DIS) Model
The Hydro-DIS model is a deep convolutional neural network model that performs image classification based on some conditions (Tran et al., 2017). Hydro-DIS takes both the LSR source image and the target HSR image pairs as input and classifies the target image as real or synthetic. The Hydro-DIS model is designed using Effective Receptive Fields (ERFs), which are the regions that contain input pixels with a non-negligible impact on an output unit (Luo et al., 2017).
ERFs provide a one-to-many mapping between the pixels in the LSR input image and the pixels in the HSR target image. Each value of the model activation map gives the chance that the corresponding region of the input image is real. The average of these values is used to generate an overall classification score for the input images. The architecture of our Hydro-DIS model is shown in Figure 6. A detailed summary of our Hydro-DIS model's architecture is provided in Appendix B2.
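How an activation-map value relates to a region of the input, and how the values are averaged, can be sketched as follows. The layer stack is hypothetical (4 by 4 kernels with strides 2, 2, 2, 1, 1, as in the pix2pix PatchGAN) and is not confirmed to be the Hydro-DIS configuration; the function computes the theoretical receptive field, which only bounds the effective one discussed by Luo et al. (2017).

```python
def receptive_field(layers):
    """Theoretical receptive field of stacked convolutions given (kernel, stride) pairs."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k - 1) input-space steps
        jump *= stride              # stride scales the spacing between adjacent outputs
    return rf


def overall_score(activation_map):
    """Average the per-region realness scores into one classification score."""
    values = [v for row in activation_map for v in row]
    return sum(values) / len(values)


# Hypothetical layer stack: 4x4 kernels with strides 2, 2, 2, 1, 1.
patch_layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(patch_layers))            # 70
print(overall_score([[0.9, 0.8], [0.7, 0.6]]))
```

Under this assumed stack, every value of the activation map judges a 70 by 70 pixel region of the input pair, and the final score is their mean.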

Hydro-GAN Training Challenges
There are several challenges that perturb the training process of the Hydro-GAN model. One of the major issues is that the original minimax loss function can cause the Hydro-GAN to get stuck in a local minimum at the early stages of training, when the Hydro-DIS discriminator's job is relatively easier than that of the Hydro-GEN generator (Dwivedi, 2022). Ideally, Hydro-GAN should learn patterns represented in different water bodies' locations and avoid building expertise only in a subset of the training data distribution (Hui, 2018). The second challenge is the gradient vanishing problem that might occur if the Hydro-DIS discriminator performs significantly better than the generator (Dwivedi, 2022).

Hydro-GAN Loss Functions
To overcome the two aforementioned challenges, we propose to explore a new loss function. First, we modify the traditional GAN model defined in Equation 1 to be non-saturating. To achieve non-saturation, we use a variation of the standard loss function where, instead of minimizing log(1 − D(z, G(x, z))) in Equation 1, the generator maximizes the log of the discriminator probabilities, that is, log D(z, G(x, z)). This change is inspired by viewing the problem from a different perspective, where the generator is trained to maximize the likelihood of images being real, instead of minimizing the likelihood of an image being synthetic. This avoids generator saturation through a more stable weight update mechanism (Dwivedi, 2022).
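The effect of the non-saturating change can be checked numerically. The sketch below assumes the discriminator outputs a probability through a sigmoid of a scalar logit, and compares the gradient that reaches the generator under the two formulations when the discriminator confidently rejects a synthetic image. The function names and the logit value are illustrative.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def saturating_grad(logit):
    """d/d(logit) of log(1 - D), with D = sigmoid(logit): equals -D."""
    return -sigmoid(logit)


def non_saturating_grad(logit):
    """d/d(logit) of -log(D), with D = sigmoid(logit): equals -(1 - D)."""
    return -(1.0 - sigmoid(logit))


# Early in training the discriminator confidently rejects synthetic images,
# so its logit for a generated sample is strongly negative.
logit = -6.0
print(abs(saturating_grad(logit)))      # small: the generator barely learns
print(abs(non_saturating_grad(logit)))  # close to 1: a useful learning signal
```

With the saturating form the gradient magnitude equals D itself and vanishes exactly when the generator is losing, whereas the non-saturating form keeps it near one.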
The second loss adjustment that we propose is to use a least-squares loss, where the Hydro-DIS is trained to effectively reduce the sum of the squared difference between generated and original values for real and synthetic images as defined in Equation 2. Similarly, the generator seeks to minimize the sum of the squared difference between predicted and expected values as though the generated images were real as defined in Equation 3. One advantage of using least-squares loss is that it produces a significant adjustment to the model in case of large errors, which helps in preventing the vanishing gradient (Mao et al., 2016).
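For reference, the least-squares GAN objectives of Mao et al. (2016) can be written with y a real HSR image, z the conditioning LSR image, and G(x, z) the synthetic image. The form below assumes target labels of 1 for real and 0 for synthetic images and the 1/2 scaling of the cited formulation; it is a sketch of the standard least-squares losses and may differ in detail from the Hydro-GAN definitions.

```latex
\begin{aligned}
\min_{D} \; V(D) &= \tfrac{1}{2}\,\mathbb{E}_{z,y}\big[(D(z,y) - 1)^{2}\big]
                  + \tfrac{1}{2}\,\mathbb{E}_{x,z}\big[D(z,G(x,z))^{2}\big] \\
\min_{G} \; V(G) &= \tfrac{1}{2}\,\mathbb{E}_{x,z}\big[(D(z,G(x,z)) - 1)^{2}\big]
\end{aligned}
```

Because the penalty grows quadratically with the distance from the target label, samples that the discriminator scores far from the target still produce a sizable gradient.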
Along with using non-saturating and least-squares losses, we also propose a new DTW loss term that assesses how accurately the generator reproduces boundary shapes. The DTW algorithm is an elastic distance measure that has demonstrated good performance with sequence-based data, and in particular, time-series data (Muller, 2007).
Our motivation for adding a DTW loss is that it emphasizes the polygon shape accuracy, which equips the generator to better fool the discriminator. Our new generator loss, defined in Equation 4, is now a weighted sum of the cross-entropy adversarial loss (L_adversarial), the L1 loss (L_L1), and the DTW loss (L_DTW).

G(x,z)
where L_adversarial is the probabilistic adversarial loss, which is also used in the discriminator; L_L1 is the mean absolute error between the synthetic and the original images; and L_DTW is our proposed loss, which minimizes the difference between the generated and expected polygons' Euclidean distances from their respective centroids to the boundary coordinates.
We hypothesize that the traditional adversarial and L1 losses contributions are different than the DTW loss given that they optimize the Hydro-GEN and Hydro-DIS losses and the shape accuracy loss respectively.Therefore we weighted the loss terms using a β parameter than can be used to balance the terms.
We evaluated the synthetic polygon accuracy using the Fréchet Inception Distance (FID), which is a measure for evaluating generative models (Y. Yu et al., 2021). The FID metric is defined as the squared Wasserstein metric between two multidimensional Gaussian distributions of the real and synthetic data. FID uses the Inceptionv3 model to show the similarity between two groups of images by using the computer vision features of the images (Szegedy et al., 2015). The FID is defined in Equation 6.
where N(μ, Σ) is the distribution of the neural network features of the images generated by the GAN model and N(μ_w, Σ_w) is the distribution of the same neural network features of the real images used to train the GAN model. tr() represents the trace of a matrix, that is, the sum of the diagonal elements of a square matrix. μ is the mean vector and Σ is the covariance matrix of the neural network features obtained from the images generated by the GAN model. A low FID score indicates that the two groups of images are similar, while a higher score suggests that the images have dissimilar characteristics.
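For reference, the commonly used closed form of the FID between the two Gaussians N(μ, Σ) and N(μ_w, Σ_w) can be written as follows. This is the standard expression from the literature, stated with the symbols defined above; it is given as a reading aid and is assumed, not confirmed, to match Equation 6.

```latex
\mathrm{FID} = \lVert \mu - \mu_w \rVert_2^2
  + \operatorname{tr}\!\left( \Sigma + \Sigma_w - 2\left( \Sigma\,\Sigma_w \right)^{1/2} \right)
```

The first term compares the feature means and the second compares the feature covariances, so the score is zero only when both distributions share the same mean and covariance.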

Experimental Methodology
An ideal LSR to HSR mapping should produce a water body polygon that is similar in area, shape, and distance with respect to the original HSR polygon. To achieve this purpose, we evaluate the generated polygons against the ground truth polygons by using three criteria: areal, shape, and distance accuracy. Jaccard and Cosine measurements were chosen for assessing the areal precision. For evaluating the shape and distance measures, the DTW similarity metric was utilized. Each metric will be described in detail in the subsections that follow, along with its utility and drawbacks.

Areal Accuracy Measures
Areal accuracy assesses the correctness of the area of the generated water bodies when compared to the area of the HSR ground truth. Prior to computing the area of the water bodies' polygons, we first used the contour detection tool in OpenCV, which detects the boundary of the polygons. A polygon contour is the curve joining all the continuous points along the boundary that have the same color or intensity. The process of creating the polygon contour/boundary from an input image is shown in Figure 7b. The area similarity is computed using the Jaccard and Cosine similarity indices based on the extracted polygon boundaries.
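Once a boundary is available as an ordered list of points, the enclosed area can be computed with the shoelace formula. The sketch below assumes the boundary is a closed, non-self-intersecting sequence of (x, y) pixel coordinates, such as a contour detector would return; the function name is hypothetical and the snippet does not call OpenCV.

```python
def polygon_area(boundary):
    """Area enclosed by a closed boundary of ordered (x, y) vertices (shoelace formula)."""
    twice_area = 0.0
    n = len(boundary)
    for i in range(n):
        x0, y0 = boundary[i]
        x1, y1 = boundary[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    # The sign depends on vertex orientation, so take the absolute value.
    return abs(twice_area) / 2.0

# A 4 x 3 rectangle given as its four corner points.
print(polygon_area([(0, 0), (4, 0), (4, 3), (0, 3)]))
```

The result is in squared pixel units; converting it to a physical area would additionally require the ground resolution of the image.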

Jaccard Similarity Index
The Jaccard similarity index is a metric that compares the area of two polygons in Euclidean space by quantifying the ratio of the shared area (intersection) between the two polygons with respect to their combined area (union), as defined in Equation 7. This similarity metric follows the property of scale invariance, as it evaluates the size of the common area in relation to the combined area of the two polygons (Boubrahimi, Aydin, Kempton, & Angryk, 2016; Boubrahimi et al., 2018).

Cosine Similarity Index
The Cosine similarity index is a metric that compares the area of two polygons in Euclidean space by quantifying the ratio of the shared area (intersection) between the two polygons with respect to the square root of the product of the two areas, as defined in Equation 8 (Boubrahimi, Aydin, Kempton, Mahajan, & Angryk, 2016; Boubrahimi et al., 2018).
The areal similarity metrics range between zero and one hundred, and the higher the metric values, the better the areal match of the real (P real) and generated (P′ predicted) polygons.
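The two areal metrics can be sketched on rasterized polygons as follows. The snippet assumes each polygon is given as a binary mask (nested lists of 0 and 1) of the same size, and it reports percentages to match the zero-to-one-hundred range described above; the function name and the mask representation are illustrative assumptions.

```python
import math

def areal_similarities(mask_real, mask_pred):
    """Return (jaccard, cosine) similarity percentages for two equal-sized binary masks."""
    intersection = area_real = area_pred = 0
    for row_real, row_pred in zip(mask_real, mask_pred):
        for r, p in zip(row_real, row_pred):
            area_real += r
            area_pred += p
            intersection += r * p  # 1 only where both masks contain water
    if area_real == 0 or area_pred == 0:
        raise ValueError("both masks must contain at least one water pixel")
    union = area_real + area_pred - intersection
    jaccard = 100.0 * intersection / union
    cosine = 100.0 * intersection / math.sqrt(area_real * area_pred)
    return jaccard, cosine

real = [[1, 1, 0],
        [1, 1, 0]]
pred = [[0, 1, 1],
        [0, 1, 1]]
print(areal_similarities(real, pred))
```

Both metrics share the intersection as numerator and differ only in the normalization, which is why they tend to move together.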
The mathematical form of the DTW algorithm as accepted in the literature is shown in Equation 9. N and M are the lengths of the input time series x and y. Initially, the D matrix is initialized to D 0,0 = 0 and D i,j = ∞. The square of the differences between y j and x i is generally used as the cost function g in Equation 9.
In addition to using DTW as a loss term, we used DTW as a dissimilarity measure by calculating the optimal matching between the two shape signatures. We used DTW to match the points of the polygon by aligning their centroid shape signatures. The DTW algorithm generates a warping distance between the two input time series, which indicates the discrepancy between the two shapes. The greater the warping distance, the greater the difference between the two polygon shape signatures.
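The DTW computation described above can be sketched with the standard dynamic-programming recurrence. The snippet uses the initialization D 0,0 = 0 with all other entries set to infinity and the squared difference as the cost g; the three-neighbor recurrence is the textbook form and is assumed here, since other step patterns or a final square root are also used in practice.

```python
import math

def dtw_distance(x, y):
    """Warping distance between sequences x and y with a squared-difference cost."""
    n, m = len(x), len(y)
    # D[i][j] holds the minimal accumulated cost of aligning x[:i] with y[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j],      # x[i-1] matched again (y index held)
                                 D[i][j - 1],      # y[j-1] matched again (x index held)
                                 D[i - 1][j - 1])  # both indices advance
    return D[n][m]

# A repeated sample in the second signature is absorbed by warping.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because the alignment is elastic, two shape signatures sampled with a different number of boundary points can still be compared directly.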
Table 1 shows the summary of the areal and shape evaluation metrics that were used in this paper, along with their advantages and drawbacks.

Data Set Preparation
Prior to training Hydro-GAN, we created image data pairs that map each curated MODIS image to its corresponding curated Landsat image captured during the same day. The data set was split into a training set and a testing set. We reserved 90% of the image data pairs from each reservoir for training, and the remaining 10% of the image data pairs were used for testing. Namely, 672 out of 6,720 MODIS satellite images and 336 out of 3,360 Landsat-8 images were included in the test set.
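A per-reservoir 90/10 split of this kind can be sketched as below. The random shuffling, the fixed seed, the dictionary layout, and the function name are assumptions made for illustration; the text does not state how the test pairs were selected within each reservoir.

```python
import random

def split_pairs_by_reservoir(pairs_by_reservoir, test_fraction=0.1, seed=0):
    """Hold out a fraction of the image pairs of every reservoir for testing.

    pairs_by_reservoir maps a reservoir name to its list of (MODIS, Landsat)
    image pairs. Returns (train_pairs, test_pairs).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for pairs in pairs_by_reservoir.values():
        shuffled = list(pairs)
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test

# Toy identifiers standing in for image pairs of two reservoirs.
toy = {"A": [("A", i) for i in range(20)], "B": [("B", i) for i in range(10)]}
train_pairs, test_pairs = split_pairs_by_reservoir(toy)
print(len(train_pairs), len(test_pairs))
```

Splitting inside each reservoir, instead of over the pooled data, keeps every reservoir represented in both the training and the testing sets.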

Hydro-GAN Hyper-Parameter Search
To identify the optimal weights that produce the highest polygon accuracy based on FID, we performed a grid search of the weight parameters defined in Equation 5. We trained our proposed Hydro-GAN model on different weight ratios. Figure 8 shows a heatmap that summarizes the FID scores obtained by each model when trained on different β weights, ranging from 1 to 100, for balancing the L adversarial, L L1, and L DTW losses.
The result in Figure 8 reveals that when more weight is assigned to the DTW loss, the Hydro-GAN model generates images that are more similar to the target images. This is reflected by the low FID scores, which approach zero. Specifically, the lowest FID scores (of 0.2) were obtained in the top right corner, where β ∈ [90, 100], which indicates that the generated images are highly similar to the ground truth. Conversely, when the β weight is small, the FID scores are high, which signifies that the Hydro-GAN loss is overpowered by the adversarial and L1 losses (refer to the bottom left corner of Figure 8). Selecting an equal weight (β = 50) for the adversarial and distance losses does not achieve optimality. Based on these observations, we trained our Hydro-GAN model using a β value of 100.

Hydro-GAN Training Result
After identifying the optimal weight ratios from Figure 8, we evaluate our proposed Hydro-GAN model based on the generated HSR images from the LSR MODIS image inputs. We compare the HSR-generated output with the original HSR Landsat ground truth images. Figure 9 shows the evolution of the input LSR image after being fed to the Hydro-GAN model in comparison with the ground truth HSR image captured by Landsat. Figure 9a shows an input LSR MODIS image fed to the Hydro-GAN model, together with the intermediate results learned after 30 epochs (Figure 9b) and after 100 epochs (Figure 9c). The results indicate that the model improves as training progresses, as seen when comparing the shape and area of the generated polygons after 30 and 100 epochs. After 100 epochs, the generated water body polygon morphology is similar to the polygon contained in the ground truth HSR Landsat image, as shown in Figure 9d. After training our Hydro-GAN model for 100 epochs, we measured the accuracy of the model by conducting a quantitative analysis of the test set images based on three evaluation criteria: areal, shape, and distance accuracy, as discussed in Section 2.4.

Areal Accuracy Result
To measure the areal accuracy between the generated polygons (from the Hydro-GAN model using the test data set) and the ground truth HSR Landsat polygons, we used the Jaccard and Cosine similarity percentages, as illustrated in Figure 10. The figure shows the histograms of the Jaccard and Cosine similarity metrics when assessed on the entire testing data of LSR-HSR image pairs. The Jaccard distribution indicates a good agreement between the generated and the original Landsat polygons, with similarity percentages ranging approximately between 86% and 95%. A similar result was obtained from the Cosine histogram, where the values span approximately from 87% to 97%. This indicates that there is a strong correlation between the Jaccard and Cosine metrics and implies that the Hydro-GAN model is performing well, resulting in highly accurate synthetic HSR polygons. Although the two metrics are different, as outlined in Table 1, one explanation for this correlation is that the numerators of both areal metrics are the same (i.e., the area of intersection of the two polygons), as demonstrated in Equations 7 and 8.

Shape and Distance Accuracy Result
To measure the shape accuracy between the actual and the generated polygons from the test set, the polygons were converted into their respective shape signatures by calculating the Euclidean distance from the centroid to the coordinates of the points on the edges of the polygon and converting these distances into a time series. The resulting signatures were plotted together, as shown in Figure 11a. The result indicates that where there is a dip in the shape signature of the ground truth polygon, around indices 300 and 700, there is a similar dip in the shape signature of the generated polygon. The same observation can be made about the rise in the shape signatures around indices 0, 400, and 1,000. It can also be seen that the shape signatures of the generated and the ground truth polygons are nearly overlapping. This indicates that our Hydro-GAN model is producing HSR images that are highly accurate in shape signature with respect to the reference Landsat images.
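The conversion of a polygon into its centroid-distance shape signature can be sketched as follows. The snippet assumes the centroid is taken as the mean of the boundary points and that the boundary is an ordered list of (x, y) coordinates; an area-weighted centroid is an equally plausible choice, and the function name is hypothetical.

```python
import math

def centroid_shape_signature(boundary):
    """Distances from the centroid to each boundary point, in boundary order."""
    n = len(boundary)
    # Assumption: centroid taken as the mean of the boundary coordinates.
    cx = sum(x for x, _ in boundary) / n
    cy = sum(y for _, y in boundary) / n
    return [math.hypot(x - cx, y - cy) for x, y in boundary]

# For a square sampled at its corners, every distance equals sqrt(2).
print(centroid_shape_signature([(0, 0), (2, 0), (2, 2), (0, 2)]))
```

Reading the distances in boundary order turns the two-dimensional outline into a one-dimensional series, which is what allows time-series measures such as DTW to be applied to polygon shapes.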
To measure the distance accuracy, we used DTW to measure the extent of alignment between the actual and the generated polygon shape signatures. This process was repeated for all the images in the test set. To illustrate the result, we chose a random LSR image from the test set to generate an HSR Landsat image. The generated image was then compared with the ground truth HSR image, and DTW was applied to their respective polygons. The result can be seen in Figure 11b, which shows the DTW alignment and warping matrix that were produced when comparing the two polygons. The graph in Figure 11b shows that the warping path (denoted by the blue line from the lower-left to the top-right corner) between the generated and actual polygons approximates a diagonal line (which is the ideal case). This implies that the HSR images produced by our Hydro-GAN model are highly accurate in the distance metric as well.
We evaluate the distance metric of the entire test set by calculating the normalized alignment cost of the DTW matrix between all the generated and actual polygons. A normalized alignment cost of 1 is considered the ideal case. The result of this evaluation is reported in Figure 12, which shows that the normalized alignment cost for the test set images ranges between 1.13 and 1.51, with an average alignment cost of 1.31, all of which are close to the ideal value of 1. This indicates that our Hydro-GAN model is producing HSR images that have low DTW distances when compared with the actual ground truth.
To better understand how the three areal and distance measures are correlated with each other, we present their correlation matrix in Figure 13. We note that the Jaccard and Cosine metrics have a strong positive correlation (0.96), which reveals that when the Jaccard similarity increases, the Cosine similarity also increases. This is due to the common numerator shared between Jaccard and Cosine. It can also be inferred from Figure 13 that the DTW alignment cost has a moderate negative correlation with the Jaccard and Cosine similarities (magnitude around 0.50). This result indicates that in the cases when the Hydro-GAN model produces highly accurate areas, the exact shapes of the polygons are not highly accurate. Similarly, when Hydro-GAN produces highly underestimated (or overestimated) areas, the water body shape is relatively more accurate. The finding of this experiment is important for fine-tuning the model depending on the hydrology research needs. For example, if the purpose of generating HSR images is to study sedimentation phenomena that require a precise polygon boundary, then Hydro-GAN can be tuned for optimizing the L L1 and L DTW losses. If the research needs to study the water volume change, then introducing a new areal loss L area could be useful.
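A correlation matrix of this kind can be sketched with the Pearson coefficient, which is assumed here because the text does not name the coefficient used for Figure 13. The metric values in the example are toy numbers chosen only to exercise the code; they are not results from the paper.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict mapping metric name -> values."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}

# Toy per-image metric values (illustrative only).
toy_metrics = {
    "jaccard": [86.0, 90.0, 93.0, 95.0],
    "cosine": [87.0, 92.0, 94.0, 97.0],
    "dtw_cost": [1.50, 1.20, 1.35, 1.15],
}
matrix = correlation_matrix(toy_metrics)
print(matrix["jaccard"]["cosine"], matrix["jaccard"]["dtw_cost"])
```

Each diagonal entry equals one by construction, and the matrix is symmetric, so only the entries above the diagonal carry information.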
In recent years, various state-of-the-art image-to-image translation models have been developed to address the challenges of transforming images from one domain to another while preserving their essential features (X. Li et al., 2021). Here, we briefly introduce three prominent models: SPA-GAN (Spatial Attention GAN), Cycle-GAN, and Pix2Pix. Cycle-GAN is known for its ability to perform unpaired image translation. It introduces a cycle consistency loss to ensure that the translation from one domain to another and back results in the original input, enforcing a cycle-consistent mapping between domains without the need for paired data during training (Zhu et al., 2017). Pix2Pix is a conditional GAN that excels in paired image-to-image translation tasks (Isola et al., 2017). It learns a mapping from input to output images by training on aligned pairs, making it particularly effective for tasks where there is a clear correspondence between the input and output images.
In our experimental comparison, we evaluated these models alongside our proposed Hydro-GAN using the Jaccard Index. The Jaccard Index measures the similarity between two sets by dividing the size of their intersection by the size of their union. As shown in Figure 14, the results indicate that Hydro-GAN outperformed the other models with a Jaccard Index of 0.98, showcasing its effectiveness in image-to-image translation. This comparison assesses Hydro-GAN's performance relative to existing state-of-the-art models and contributes to the validation of the proposed approach.

Case Study: Lake Tharthar
In this case study, we evaluated our proposed Hydro-GAN model on Lake Tharthar's shrinking and expansion behaviors. Lake Tharthar is a focal point for hydrological research and water resource management in Iraq (Kornijów et al., 2001). Covering an area of approximately 2,100 square kilometers (810 square miles) at its maximum extent, it is one of the largest lakes in the country. Its seasonal dynamics, ecological significance, and role in sustaining water supply make it a critical area of study for hydrologists and environmental scientists aiming to address water-related challenges in the region.
Our study area consists of Lake Tharthar's polygon area across a period of 7 years, from 2015 to 2021. We compared the area of the lake water body from three different image sources. First, the original area of the water body was calculated from the original HSR Landsat images. Second, the area of the water body was extracted from the generated Landsat images (using our Hydro-GAN model). Third, the area of the water body was calculated using the LSR MODIS satellite images. The three polygon areas were compared with each other, as plotted in Figure 15. The blue bar plot in Figure 15 indicates that Lake Tharthar (one of the reservoirs from the data set) shrinks in surface area from 2015 to 2020, apart from a small increase in 2018. The area of the water body increases again in 2021. The shrinking of the water body can be due to numerous hydrological as well as human factors, such as sedimentation and an increase in freshwater consumption (Mahajan & Martinez, 2021).
The increase in the area of the water body in 2021 can be seen in Figure 15g, where a small portion of the surface boundary has expanded in the top left corner and at the bottom compared to 2020. It can also be noticed that the images produced by our Hydro-GAN model follow the same pattern of change as the real ground-truth area of the water body across the 7 years (shown in the green bar plot). This result indicates that our Hydro-GAN model is predicting the shrinking or expansion of the surface area of a water body accurately. Another important observation from Figure 15 is that the area calculated from the LSR MODIS images (i.e., our competing baseline), shown in the red bar plot, fails to follow the same pattern of change in the water body accurately. We note that the LSR MODIS extracted area overestimates the actual area in the years 2017, 2018, 2019, and 2020. Similarly, the LSR MODIS extracted area shows an under-estimation for the years 2015, 2016, and 2021. Our converted LSR to HSR extracted areas, shown in the blue bar plot of Figure 15, are more realistic area estimations that produce significantly lower errors than LSR MODIS. This result shows that our proposed LSR to HSR conversion using Hydro-GAN improved the prediction of the area of the Lake Tharthar water body. In addition, we note that our model generally produces a slight under-estimation of the area. Following our suggestion in Section 3.2.2, using an areal loss can correct the under-estimation.

Conclusion and Future Work
In this paper, we propose Hydro-GAN, a deep learning-based generative method that uses a cGAN for mapping the remote sensing information available at low resolution (MODIS satellite) to high resolution (Landsat satellite). In particular, we have used the case study of water bodies and reservoirs. Our proposed method uses a weighted DTW loss function along with the adversarial and L1 losses to generate the LSR-to-HSR mapping. The results were evaluated using areal, shape, and distance similarity measures. The evaluation shows that our weighted Hydro-GAN model improved the accuracy of the generated water body polygons compared to state-of-the-art GAN models. Since the availability of accurate data on water bodies' boundaries at high spatial and temporal resolution is important for multiple hydrological research tasks, our work can provide complementary data sets for hydrological studies. Hydro-GAN can generate high-resolution data at historical time steps when such data is unavailable, which can be used in areas where a large amount of historical data is required for forecasting purposes. As future work, we aim to further improve our mapping by enhancing our computer vision algorithm (which extracts the polygon boundary), so that it can extract the polygon even when the images obtained from the remote sensing satellites contain noise such as clouds or other distortions. We also aim to extend the domain of our method beyond water bodies, for example to forests and vegetation. Finally, we plan to make our model generic, so that it can be used to map the images from any remote sensing satellite to another, which can include the Sentinel and Hyperion satellites as well.

Figure 1. Spatial and temporal resolutions of Earth's different satellites.

Figure 4. Structuring element example applied on a noisy image.

Figure 5. Pre-processing pipeline of (a) Kariba reservoir (Zambia) and Lake Argyle (Australia) MODIS images that are transformed into (b) grayscale images, (c) binarized images, before the (d) water bodies polygons extraction.

Figure 6. Architecture of Hydro-GEN and Hydro-DIS sub-models of the Hydro-GAN applied on Lake Mead (part of the Colorado River).

Figure 7. An input (a) water body image (Qapshaghay Bogeni Reservoir) used to extract the (b) water body polygon and (c) the shape signature based on the centroid distances.

Figure 8. Fréchet Inception Distance (FID) metric of Hydro-GAN generated polygons with β values ranging from 0 to 100. The lower the FID value, the better (i.e., the generated and target polygons are similar).

Figure 9. (a) Example input MODIS image from Qapshaghay Bogeni Reservoir (Kazakhstan) fed to Hydro-GAN and transformed into a high spatial resolution image after (b) 30 training epochs and (c) 100 training epochs, compared to the (d) ground truth Landsat image.

Figure 10. Distribution of area similarity measured on the test set images, shown as histograms of the Jaccard and Cosine similarity percentages between real and generated polygons.

Figure 11. Dynamic time warping measure applied by (a) aligning two polygons' shape signatures (ground truth in blue and generated in black) and (b) computing their warping distance by finding the optimal path along the warping matrix diagonal. Query Index refers to the index of the shape signature in the generated polygon, while Reference Index refers to the index of the shape signature in the ground truth (reference) polygon.

Figure 12. The normalized alignment cost for dynamic time warping matrix in test set images.

Figure 13. Correlation matrix of Jaccard, Cosine, and dynamic time warping similarity measures used for area, shape, and distance accuracy evaluation, when measured between generated and original high spatial resolution test set images.

Figure 14. Performance of the state-of-the-art models.

Figure 15. (a-g) Snapshot of Lake Tharthar's water body for each year from 2015 to 2021, providing a visual record of changes over time. The bar plots compare the area variation of Lake Tharthar between generated and actual polygons across 7 years from 2015 to 2021.
Algorithm 1. Data Pre-Processing Algorithm to Extract Polygon Out of a Satellite Image
Input: Dataset D containing satellite images, θ dilation the dilation structuring element, and θ erosion the erosion structuring element.
Output: Set of extracted polygon images, extracted.
extracted ← [ ];
The resulting binary image contains a non-zero value only if the structuring element morphological tests are successful at a location in the input image. Finally, the last
FILALI BOUBRAHIMI ET AL.

Table 1
Summary of Areal and Shape Evaluation Metrics

Detailed Summary of Hydro-DIS Water Resources Research