GMA: An Improved Framework of Radar Extrapolation Based on Spatiotemporal Sequence Neural Network

Most existing spatiotemporal sequence neural networks used for radar extrapolation have difficulty learning long‐term spatiotemporal memories. This is because such networks update the prediction state using only information from the previous time step, and they rely on convolution layers, which are local and inefficient, to capture spatiotemporal features. To capture the long‐term temporal characteristics and locally abrupt spatial characteristics of radar echo sequences, we propose a new framework, Global Memory Attention (GMA), which makes two contributions. First, we establish a global information flow between calculation units, extract the key historical memory from this flow, compute the correlation between the key historical memory and the prediction state of the current frame, and use this correlation to determine how much historical memory participates in the prediction-state update. This alleviates the difficulty current networks have in learning the long‐term spatial and temporal characteristics of radar echo sequences, mitigates the short‐term dependence problem, and reduces the interference of image noise during radar extrapolation. Second, the GMA module is a flexible add-on that can be applied to most radar extrapolation algorithms based on temporal sequence networks. GMA achieves state-of-the-art results on radar extrapolation tasks, and we provide ablation studies to verify its effectiveness.

Traditional tracking methods divide the echo into different tracking regions, calculate the optimal spatial correlation coefficients between adjacent times, determine the fitting relationship between regions, and then produce the prediction. However, it is difficult to ensure extrapolation accuracy for weather processes with rapidly changing echoes and strong convection (Liang et al., 2010). The optical flow method, proposed by Gibson in 1950, obtains a motion vector field by calculating the optical flow field of consecutive echoes and then extrapolates along the motion vectors. However, a convective weather system is a dynamic and complex system, characterized by nonlinear motion (such as rotation), moderate deformation (e.g., expansion and contraction), and rapid generation and dissipation. Traditional radar echo extrapolation methods assume only a simple linear evolution of the echo, do not make full use of long-term historical radar echo data in the time domain, and cannot effectively predict echo rotation or the processes of generation and dissipation. They therefore suffer from low accuracy and short forecast lead times.
In recent years, with the great progress of deep learning in computer vision and natural language processing, researchers have begun to apply it to radar extrapolation. Agrawal et al. (2019) processed radar echoes with a two-dimensional convolutional neural network; however, limited by its receptive field, two-dimensional convolution can only extract the spatial features of a single image and lacks any modeling of the temporal relationships within an image sequence. Kim et al. (2021) applied a three-dimensional convolutional neural network to the extrapolation task. Hochreiter and Schmidhuber (1997) proposed LSTM, an RNN structure that uses gated updates of memory states to complete predictions. Ranzato et al. (2014) were the first to use a clustering algorithm to interpolate image sequences and predict future image sequences within a recurrent neural network model. However, convolutional and recurrent neural networks lack reasonable spatial modeling means, making it difficult to capture the spatial relationships in radar echo extrapolation and hence to construct a spatiotemporal feature space. To solve this problem, Shi et al. (2017) proposed ConvLSTM, which replaces the Hadamard product in the original recurrent calculation unit with convolution and stacks multiple calculation units to build a convolutional long short-term memory network with an encoder-predictor structure. It integrates convolution layers, which effectively extract spatial features, into a recurrent structure that simulates temporal evolution, so the model can represent time and space simultaneously. Spatiotemporal sequence networks, which model spatiotemporal characteristics jointly, have therefore become a research focus in radar extrapolation.
At present, advanced radar extrapolation neural networks such as those of Wang et al. (2017, 2018, 2019) are based on this spatiotemporal network structure. Compared with traditional meteorological methods, deep-learning-based radar echo extrapolation has advantages both in current performance and in room for development, and the spatiotemporal-sequence-network approach has become the dominant deep learning method thanks to its mature, stable, and effective modeling. However, radar extrapolation algorithms based on spatiotemporal sequence networks are limited by their network structure: they suffer from a short-term dependency problem and a long-term learning dilemma. The short-term dependency problem arises because the prediction process accepts only the information of the previous time frame, making it difficult to exploit long-term information in the input sequence. The network is therefore susceptible to anomalous noise in a given frame (Kelly et al., 1960), and the interference grows with extrapolation time: during long-term radar echo extrapolation, false predictions gradually accumulate and the results drift away from the actual situation. In addition, gradient vanishing occurs during back-propagation through the recurrent layers; gradient information weakens over long time-step distances, so the network cannot make ideal use of the complete sequence. When conducting long-term radar extrapolation, long-term information is therefore often hard to obtain. For example, when predicting simple linear motion, accurate predictions can be made from short-term information alone.
Similarly, when predicting ideal pendulum motion, the future trajectory and state can be predicted from its periodicity. In reality, however, radar extrapolation cannot rely solely on short-term information or on long-term memory, because the movement, generation, and dissipation of radar echoes are driven by the interaction of complex atmospheric physical processes.
This is the current dilemma of radar extrapolation. To address it, we observed how humans recall past memories and found that global information guides the whole recall process. When people recall an event from the past, especially one from long ago, they first vaguely describe its approximate content and then recall the details of the event along the timeline. Sometimes, memory confusion at a certain point in time causes information about the event to be lost; however, if the approximate content is still vaguely remembered, a small faulty memory does not prevent the recall from continuing. Inspired by this, we integrate a method guided by global information into a radar extrapolation algorithm based on a spatiotemporal sequence network and propose the new Global Memory Attention (GMA) framework. It contains three parts: base calculation units, a global information flow between the base units, and the GMA calculation module.
GMA has the following characteristics: It establishes a global information flow between calculation units. Because each radar echo image contains, to a greater or lesser extent, historical information related to the current prediction state, the global information flow carries the main content of the historical memory states.
It extracts the key historical memory from the global information flow through a series of attention mechanisms, calculates the correlation between the historical memory and the current prediction state, and determines how much historical memory participates in the prediction state update of the current time frame. It improves the utilization rate of strong correlation historical memory and reduces the interference of weak correlation historical memory.
It can be flexibly applied to any radar extrapolation algorithm based on a temporal sequence network. Because a separate module between the recurrent units extracts the global information, it does not interfere with the internal calculations of the recurrent units.
We show that this module helps the network learn long-term spatiotemporal memory and reduces image-noise interference. It alleviates both the long-term learning dilemma and the short-term dependence dilemma of radar extrapolation, and its effectiveness is demonstrated in practice.

Neural Network
Neural networks are widely used in radar extrapolation tasks. Agrawal et al. (2019) applied a convolutional neural network to radar echo extrapolation, which is limited by the two-dimensional convolutional receptive field and lacks a feature space for the temporal relationships of radar echoes. Srivastava et al. (2015) attempted to predict radar echo sequences with a recurrent neural network; because a conventional recurrent network uses linear calculations, it struggles to extract the spatial characteristics of the radar echo. Shi et al. (2017) replaced the Hadamard product with a convolution operation, extracting the spatial characteristics of the radar echo in the convolution layer and simulating its temporal evolution in the recursion layer. ConvLSTM is an important milestone in the development of radar extrapolation methods, and many RNNs based on it have since emerged. For example, Shi et al. (2017) simulated the movement of information regions by introducing learnable convolutions, and added up-sampling and down-sampling layers to the prediction structure to alleviate the heavy computation of spatiotemporal sequence networks. Lotter et al. (2016) proposed the extrapolation algorithm PredNet, which extracts the temporal and spatial features of radar echoes for implicit modeling and characterization, and predicts future image sequences through feature learning. In the spatiotemporal sequence network structure, historical memory propagates only along the time steps of the recursive layer, and spatial information propagates only through the convolution layers. Wang et al. (2017) therefore introduced a new memory state with its own gating to form the spatiotemporal LSTM (ST-LSTM) unit; this memory state forms a Z-shaped memory flow in the network.
It can effectively transfer the spatiotemporal semantic information of the high-level network to the low-level network at the next moment, realizing the exchange of spatiotemporal semantic information across levels. Wang et al. (2018) proposed the gradient highway concept and CausalLSTM, in which skip propagation of spatiotemporal information alleviates gradient vanishing during back-propagation; the depth of the convolution layers is also increased so the network extracts multilevel spatial information from the radar echo. Wang et al. (2015) integrated three-dimensional convolution into the ST-LSTM unit, adding convolution over the time dimension of the image sequence and strengthening the use of upstream semantic information by downstream tasks. Wu et al. (2021) decomposed the motion of the physical world into instantaneous changes and cumulative motion trends, and designed a Motion-Unit that unifies the two. As described above, these networks improve the radar extrapolation method by simulating the physical motion trajectory of the radar echo and by effectively building the feature space of spatiotemporal information and the propagation routes of spatiotemporal memory. In essence, however, the network still uses only the information of the previous time-frame node to update the current prediction state; long-term spatiotemporal memory gradually fades as it propagates through the recursive layers, and the whole downstream prediction task remains vulnerable to image noise in any single frame. Wu et al. (2021) used continuous historical memory information to participate in the prediction-state update.
Although the current frame can obtain the complete historical state in this way, letting all historical information participate in the update introduces considerable interference, because not all historical memories are related to the current frame. Inspired by this, we integrate all historical memory states into a global information flow and, in contrast, use an attention mechanism to extract the key historical memory from that flow, reducing the interference of irrelevant information on the current frame.

Vision Transformer
Since the Transformer (Vaswani et al., 2017) was successfully applied to natural language processing, researchers have tried to transfer it to radar extrapolation. For example, Lin et al. (2020) strengthened the global spatial representation in ConvLSTM by introducing a spatial self-attention mechanism. Luo (2021) proposed an interactive double attention mechanism to make full use of the short-term context of the radar echo. Zhong (2020) used attention to model long-term spatiotemporal features and obtain an enhanced spatiotemporal representation of radar echoes. Inspired by these works, we use an attention mechanism to compute the correlation between the key historical memory extracted from the global information flow and the current prediction process, and to determine how much historical memory contributes to the prediction-state update. This improves the utilization of strongly correlated historical memory and reduces the interference of weakly correlated historical memory.

Method
In this section, we introduce the overall structure of GMA and the specific structure of the GMA module in detail. First, it uses all historical information without being disturbed by long-term redundant information. Second, it extracts long-term key memory without losing short-term memory. Finally, it obtains the memory state most strongly associated with the current prediction state.

GMA Framework
It is a conventional stacked structure (Figure 1a). A Block is a basic calculation unit (such as ConvLSTM or ST-LSTM). Taking ConvLSTM as an example, the update equations for the $t$-th time step of the $l$-th layer are as follows:

$$i_t = \sigma(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i)$$
$$g_t = \tanh(W_{xg} * X_t + W_{hg} * H_{t-1} + b_g)$$
$$f_t = \sigma(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot g_t$$
$$o_t = \sigma(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o)$$
$$H_t = o_t \odot \tanh(C_t)$$

where $\sigma$ is the sigmoid function, $W$ denotes a convolution kernel, $*$ is the convolution operation, and $\odot$ is the Hadamard product. $X_t$, $H_{t-1}$, and $C_{t-1}$ are the input state, hidden state, and memory state, respectively, and $i_t$, $g_t$, $f_t$, $o_t$ are the input gate, input modulation gate, forget gate, and output gate, respectively, the subscript corresponding to the $t$-th time step.
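The gated update above can be sketched as a minimal PyTorch cell. This is an illustrative implementation of the standard ConvLSTM equations, not the paper's exact code; channel sizes and the fused-gate convolution are our own choices.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: the gates are computed by convolution
    rather than matrix products, so spatial structure is preserved."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        # one convolution produces all four gates (i, f, g, o) at once
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)
        self.hid_ch = hid_ch

    def forward(self, x, h_prev, c_prev):
        gates = self.conv(torch.cat([x, h_prev], dim=1))
        i, f, g, o = torch.chunk(gates, 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c = f * c_prev + i * g      # memory state update
        h = o * torch.tanh(c)       # hidden state update
        return h, c
```

Stacking several such cells per time step and chaining time steps gives the encoder-predictor structure described above.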

In a spatiotemporal sequence network with the conventional stacked structure, several layers of computing units are stacked at each frame time step, and several time steps are chained to simulate temporal evolution. The radar image sequence is fed to the first layer, and multilevel spatial features of the radar echo are extracted by the convolution layers of the different calculation units at the same time step. Finally, the top-level spatiotemporal memory of the radar echo is mapped back to the original pixel space to predict the next radar image, and the future image sequence is generated at the top level. At each time step, the convolution layers extract spatial information and pass it upward across layers, while the forward transmission between time steps models the temporal evolution; together they realize the flow of spatiotemporal information. In this scheme, the spatiotemporal information flow is updated in the calculation units along the time steps, and the calculation unit of the current frame depends mainly on the spatial features and temporal memory of the unit at the previous time step. Each frame of the radar extrapolation is therefore inclined to learn short-term historical memory, and long-term memory decays as the number of time steps grows.
We establish a global information flow to learn long-term spatiotemporal memory (Figure 1b). The orange dotted line represents the global memory flow, and the orange arrows indicate the flow of each state in the network; the global memory at the $t$-th time step of the $l$-th layer is extracted from this flow. The key equations of the recurrent units in GMA for the $t$-th time step of the $l$-th layer are as follows: Unlike the conventional stacked structure, which uses only the hidden state of the previous time step, we use the memory states of all historical time steps to update the prediction state of the current frame at each time step. First, we extract the key historical memory from the global information flow, filtering out unimportant historical memory and reducing interference with the prediction process. Then we compute its correlation with the prediction state of the current time step, so that strongly correlated spatiotemporal memory updates the calculation unit. The global memory flow established by GMA carries long-term and short-term spatiotemporal memory simultaneously; the propagation and update of the memory state is no longer limited by the step size, and the most relevant historical memory can be used in every frame's prediction.
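The rollout logic can be sketched in a few lines. All names here (`cell`, `gma`, the residual combination of context and hidden state) are hypothetical stand-ins for the paper's actual units and equations; the sketch only shows how a global memory list accumulates across time steps and is distilled before each update.

```python
import torch

def rollout_with_global_memory(cell, gma, frames, h, c):
    """Hypothetical rollout: every past memory state is appended to a
    global list, and a GMA-like module distills it into a context that
    joins the current state update (interfaces are illustrative)."""
    memory_flow = []                 # global information flow
    outputs = []
    for x in frames:                 # iterate over time steps
        if memory_flow:
            # distill all historical memories into one context and
            # let it participate in the current state update
            context = gma(torch.stack(memory_flow, dim=1), h)
            h = h + context
        h, c = cell(x, h, c)
        memory_flow.append(c)        # extend the global flow
        outputs.append(h)
    return outputs
```

The point of the structure is that `memory_flow` grows with every step, so the update at frame $t$ can draw on all earlier memory states rather than only the one at $t-1$.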

GMA Module
Considering that the prediction process of the current time step benefits from the temporal and spatial memory of historical frames, we build a global information flow from which to obtain a list of historical memory states.
But not every frame of historical memory contributes to the prediction of the current time step. We therefore use the GMA module to compute the correlation between the list of historical memory states and the predicted state of the current frame and to filter out weakly correlated information, so that strongly correlated historical memory guides the prediction-update process of the calculation unit.
The GMA module is mainly composed of two parts (Figure 2). First, Temporal-SE extracts key historical memory from the list of historical memory states. Since each radar image contains, to some extent, the radar echo trajectory and the trend of echo intensity, the extracted information covers the main content of all historical memory states. Then, an attention module computes the degree of relevance to the predicted state at the current time step, so that strongly associated memory states participate more in the update of the calculation units. The overall equations of the GMA module for the $t$-th time step of the $l$-th layer are as follows: The Temporal-SE module consists of three layers (Figure 3). The prediction process of each frame does not need all the historical spatial information; there is much repetition and redundancy in the spatial memory between historical frames, and we pay more attention to how the spatial position of the radar echo changes over time. Since the spatiotemporal sequence network extracts the spatial information of each dimension of the radar echo through its convolution layers and stores it in the memory state, we establish a channel SELayer to extract the key spatial historical memory of the radar echo. Each frame of the historical memory state is input to the channel SELayer to learn the importance of the different channel features: a squeeze operation on the memory state learns channel-level global characteristics, and an excitation operation on the global features yields per-channel weights that are multiplied with the historical memory to extract the key spatial historical memory of the echo. In the same way that the prediction process does not benefit from every historical frame, we use a temporal SELayer to extract the key temporal historical memory of the radar echo sequence; it likewise performs squeeze and excitation on the historical memory states.
This captures the global characteristics in the time dimension and retains the historical memory states that carry important information, letting the model filter irrelevant information and obtain the main content from all historical memory states. Finally, we use a 3D convolution to integrate the whole list of global historical memory states into the key historical memory for the $(t-1)$-th time step of the $l$-th layer.
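A possible reading of this three-layer design is sketched below in PyTorch: squeeze-and-excitation over channels, squeeze-and-excitation over time, then a 3D convolution that fuses the reweighted history into a single key memory. Layer sizes, the reduction ratio `r`, and the exact fusion are our assumptions, not the paper's published architecture.

```python
import torch
import torch.nn as nn

class TemporalSE(nn.Module):
    """Sketch of the Temporal-SE idea: SE over the channel axis, SE over
    the time axis, then a 3D convolution that collapses the time axis
    into one key historical memory (sizes illustrative)."""
    def __init__(self, ch, steps, r=2):
        super().__init__()
        self.ch_fc = nn.Sequential(nn.Linear(ch, ch // r), nn.ReLU(),
                                   nn.Linear(ch // r, ch), nn.Sigmoid())
        self.t_fc = nn.Sequential(nn.Linear(steps, steps // r), nn.ReLU(),
                                  nn.Linear(steps // r, steps), nn.Sigmoid())
        self.fuse = nn.Conv3d(ch, ch, (steps, 1, 1))   # collapse time axis

    def forward(self, mem):                  # mem: (B, T, C, H, W)
        # squeeze: global average pool -> excitation: per-channel weights
        w_c = self.ch_fc(mem.mean(dim=(3, 4)))           # (B, T, C)
        mem = mem * w_c[..., None, None]
        # the same squeeze-excitation along the time axis
        w_t = self.t_fc(mem.mean(dim=(2, 3, 4)))         # (B, T)
        mem = mem * w_t[:, :, None, None, None]
        # 3D convolution integrates all steps into the key memory
        return self.fuse(mem.transpose(1, 2)).squeeze(2)  # (B, C, H, W)
```

Note that the module takes the whole history list at once, so its output is a single map with the same channel and spatial size as one memory state.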
Taking into account the diversity and uniqueness of the prediction process at each time step, the key historical memory is not necessarily applicable to every frame. We use an attention mechanism to compute the correlation between the key historical memory and the predicted state at the current time step, so that strongly correlated key historical memory guides the prediction-state update of the current frame while interference from weakly correlated key historical memory is avoided (Figure 4). This helps the model overcome the step-size limit and learn the most relevant historical information.

At the $t$-th time step of the $l$-th layer, we map the current prediction state to a query vector $Q$, and the key historical memory extracted by the Temporal-SE module to a key vector $K$ and a value vector $V$, via the flattening and linear fully connected processing of the Map method in Figure 4. The correlation of $Q$ and $K$ is calculated by their dot product and normalized by softmax to obtain the correlation score. This score is regarded as the degree of correlation between the predicted state of the current time step and the key historical memory, and it determines how much the key historical memory participates in the prediction-state update. Multiplying the correlation score by $V$ yields the key historical memory state that best matches the current prediction process.
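The attention step above can be sketched as standard scaled dot-product attention between the flattened prediction state and a list of key historical memories. The linear Map projections of Figure 4 are omitted for brevity, and the function shape (one query attending over T memories) is our assumption.

```python
import torch

def memory_attention(pred_state, key_memories):
    """Sketch of the GMA attention step: the current prediction state
    acts as the query, each key historical memory as key and value, and
    a softmax over dot products decides how much of each memory joins
    the state update (Map projections omitted)."""
    q = pred_state.flatten(1)                          # (B, D)
    k = v = key_memories.flatten(2)                    # (B, T, D)
    scores = torch.softmax(
        torch.einsum('bd,btd->bt', q, k) / q.shape[1] ** 0.5, dim=1)
    # weighted sum of values, reshaped back to the state's shape
    return torch.einsum('bt,btd->bd', scores, v).view_as(pred_state)
```

Because the scores are normalized with softmax, a memory with a weak dot-product correlation to the current state receives a near-zero weight and barely influences the update.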

Experiment
To test the extrapolation accuracy of the proposed method, we use the radar echo image data set of the Hunan Meteorological Bureau (2018-2020) for verification. The data set contains the radar echo reflectivity of Changsha (105.00°-117.09°E, 21.99°-33.01°N) collected by the Doppler weather radar of the Hunan meteorological station. A 256 × 256 × 1 radar image is collected every 6 min, and 41 consecutive radar echo images form one data sequence.
The first 21 frames are input images and the last 20 frames are extrapolation targets. The data set covers the same geographical location and image size over a two-year period and includes both day and night radar images, ensuring the diversity, particularity, and consistency of the data. In many complex situations, semantic guidance based on long-term memory is very useful for radar extrapolation.
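The sequence construction can be sketched as follows. Whether consecutive 41-frame windows overlap is not stated in the text, so the non-overlapping stride here is an assumption for illustration.

```python
def make_sequences(frames, seq_len=41, n_input=21):
    """Split a chronological list of radar frames (one every 6 min)
    into 41-frame sequences: the first 21 frames are model inputs,
    the last 20 are extrapolation targets. Non-overlapping windows
    are an assumption; the paper does not specify the stride."""
    sequences = []
    for start in range(0, len(frames) - seq_len + 1, seq_len):
        window = frames[start:start + seq_len]
        sequences.append((window[:n_input], window[n_input:]))
    return sequences
```

With one frame every 6 min, each 41-frame sequence spans 4 hours: a 2-hour observation window followed by a 2-hour forecast window.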

Data Set and Setup
To prevent data leakage and to test the generalization of the model on special data samples, we split the data set unevenly, ensuring that the training and test sets do not overlap and that the test set contains special sample sequences. We use 16,378 training sequences and 2,447 test sequences. Among them, about 280 special test sequences consist of radar echo data from times or areas different from the training data; these are used to test the generalization ability of the model.

Main Result
We compare GMA with five benchmark networks: ConvLSTM, ConvGRU, PredRNN, PredRNN++, and MotionRNN. In particular, MotionRNN is a recent (CVPR 2021) network dedicated to radar echo extrapolation tasks. We use general image-quality metrics from deep learning: mean squared error (MSE) to estimate the absolute pixel error of the radar image, peak signal-to-noise ratio (PSNR) to evaluate the degree of image distortion, and radar echo image sharpness (Sharp). As shown in Table 1, the prediction performance of all baseline networks improves with the GMA framework. Among them, PredRNN++ clearly outperforms the other models after adopting GMA and achieves the best results.
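For reference, MSE and PSNR can be computed as below. This is the standard definition for 8-bit images (PSNR = 10·log10(MAX²/MSE)); the paper's exact normalization and the Sharp metric are not specified, so they are not reproduced here.

```python
import numpy as np

def mse_psnr(pred, truth, max_val=255.0):
    """Standard MSE and PSNR for 8-bit images; the paper's exact
    normalization is unspecified, so max_val=255 is an assumption."""
    pred = np.asarray(pred, dtype=np.float64)
    truth = np.asarray(truth, dtype=np.float64)
    mse = np.mean((pred - truth) ** 2)
    psnr = 10.0 * np.log10(max_val ** 2 / mse) if mse > 0 else float('inf')
    return mse, psnr
```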
At the same time, we evaluate several meteorological precipitation nowcasting indicators: the critical success index (CSI), the false alarm rate (FAR), and the probability of detection (POD). These indicators jointly consider hit, false alarm, and miss events, and are defined as

$$\mathrm{CSI} = \frac{\mathrm{hits}}{\mathrm{hits} + \mathrm{misses} + \mathrm{falsealarms}}, \quad \mathrm{POD} = \frac{\mathrm{hits}}{\mathrm{hits} + \mathrm{misses}}, \quad \mathrm{FAR} = \frac{\mathrm{falsealarms}}{\mathrm{hits} + \mathrm{falsealarms}}.$$

The results are reported in Tables 1 and 2. After adopting the GMA framework, the benchmark networks achieve better CSI, POD, and FAR at both thresholds, which shows that with GMA they can predict precipitation of different intensities.
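These contingency-table scores can be computed directly from thresholded pixel values; the function below follows the standard definitions (the paper's thresholds are not restated here).

```python
def verification_scores(pred, truth, threshold):
    """CSI, POD, and FAR from thresholded radar values.
    hits: predicted and observed above threshold;
    misses: observed only; false alarms: predicted only."""
    hits = misses = false_alarms = 0
    for p, t in zip(pred, truth):
        p_ev, t_ev = p >= threshold, t >= threshold
        hits += p_ev and t_ev
        misses += (not p_ev) and t_ev
        false_alarms += p_ev and (not t_ev)
    csi = hits / (hits + misses + false_alarms)
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    return csi, pod, far
```

A perfect forecast gives CSI = POD = 1 and FAR = 0; higher CSI/POD and lower FAR are better, matching the comparison in Tables 1 and 2.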
As shown in Figure 5, we display the radar echo images of the five benchmark networks before and after using the GMA framework. In the first few frames of the extrapolation, the radar echo predicted by each reference network is basically consistent with the ground truth; as the extrapolation time increases, the prediction accuracy gradually decreases. ConvLSTM and ConvGRU can only predict the future echo trajectory, cannot predict the echo shape, gradually lose the radar echo information, and eventually blur completely. PredRNN can still predict the echo shape, but it loses low-intensity echo information and cannot predict the dissipation of low-intensity radar echoes. With the GMA framework, these networks better capture the approximate range of future low-intensity radar echo. In contrast, PredRNN++ and MotionRNN with GMA predict the shape details of large-scale echoes more finely while accurately predicting the dissipation of small-scale radar echoes.
As Figure 5 also shows, the radar echo maps predicted by ConvLSTM and ConvGRU gradually blur with extrapolation time. Although the echo trajectories predicted by PredRNN and MotionRNN are basically consistent with the actual situation, predicting small-scale echo trajectories becomes difficult as the extrapolation time increases; for low-intensity radar echoes, they can only predict the approximate range of future echoes, and the detail and clarity of the echo shape are insufficient. PredRNN++ can predict the trajectory of low-intensity radar echoes but cannot predict the dissipation of small-range echoes. In contrast, thanks to the guiding role of the global memory of motion trajectory and echo shape in the prediction process, PredRNN++ and MotionRNN with GMA predict the shape of future low-intensity radar echoes in more detail and more accurately predict the dissipation of radar echo within small areas.
As shown in Figure 6, the radar echo image is a gray-scale image, so different echo reflectivity regions are not visually distinct. We therefore apply pseudo-color processing to the image sequences shown here. The pseudo-colored sequences are for display only; the model itself still uses the unprocessed gray-scale images.
As shown in Figure 6, we display the radar echo images of the first 20 frames for the five benchmark networks before and after using the GMA framework. Although this image sequence is not part of the extrapolated prediction range, we discuss the sequence predicted by each network when every frame is fed a real input image, for better comparison with the extrapolated prediction sequences.
First, consider the heavy precipitation area in the upper right corner. Although the five benchmarks predict the approximate range of heavy rainfall, they cannot accurately separate the heavy-rainfall areas from the surrounding non-heavy-rainfall areas. With GMA, however, PredRNN and MotionRNN, guided by global information, resolve the heavy precipitation area at a finer granularity and divide the boundary between heavy and non-heavy precipitation more finely.
As shown in Figure 7, we display the radar echo images of the last 20 frames for the five benchmark networks before and after using the GMA framework. Unlike the first 20 frames, the input of each of the last 20 time steps is the network's own prediction from the previous frame, so every predicted frame is crucial to the extrapolation process. Some benchmark networks, such as ConvGRU, produce mostly blurred predictions. Compare the upper right corner and the center of the images: the strong echo gradually dissipates, and some areas in the middle of the original strong precipitation region have no precipitation. ConvLSTM and ConvGRU lose their long-term memories; although they predict areas of no precipitation, they wrongly predict a large area of heavy precipitation. After using GMA, they revise the heavy-rainfall forecast under the guidance of global information. In the lower left corner of the images, a few strong precipitation areas lie within a large region of no or low-intensity precipitation. With GMA, PredRNN++ finely divides the boundaries of the heavy precipitation regions and, compared with the other networks, better predicts the shape of the middle weak-precipitation region.

Conclusions
We propose GMA, an improved framework for radar extrapolation algorithms based on temporal sequence networks, which can be applied to any such network. It establishes a global information flow between the network's calculation units and extracts key historical memories through a series of attention mechanisms to guide the prediction and update process. This strengthens the network's ability to learn long-term historical memory and effectively alleviates the network's susceptibility to image noise. Experimental results on the real radar data set of the Hunan meteorological station show that the proposed GMA effectively improves the prediction results of radar extrapolation tasks.

Figure 6. The first 20 frames prediction example on the radar echo data set.