Tiny-RainNet: a deep convolutional neural network with bi-directional long short-term memory model for short-term rainfall prediction

Rainfall is related not only to current and previous meteorological conditions but also to the meteorological conditions of the current location and its surrounding regions. Existing short-term rainfall prediction methods mainly focus on radar echo extrapolation: they first predict future radar echo maps and then retrieve rainfall from the predicted maps. Obtaining rainfall through these two separate steps usually leads to large accumulated errors. In this paper, Tiny-RainNet is proposed, combining convolutional neural networks (CNNs) with bi-directional long short-term memory (BiLSTM) to directly predict future rainfall from sequential radar echo maps. The structure of Tiny-RainNet is simpler than that of existing rainfall prediction models combining CNNs with LSTM. To further reduce the computational complexity of Tiny-RainNet while maintaining good prediction results, 10 × 10 sequential radar maps, rather than the original 101 × 101 maps, are used as inputs after many tests considering temporal-spatial meteorological conditions. The proposed model takes into account the influence of temporal-spatial meteorological conditions on rainfall prediction and avoids the accumulated error caused by multi-step prediction methods. The overall performance of the Tiny-RainNet model is better than that of fully connected LSTM, LSTM and convolutional LSTM.


KEYWORDS
BiLSTM, CNN, radar echo sequence images, short-term rainfall prediction, Tiny-RainNet

| INTRODUCTION
Rainfall is a meteorological characterization that relies on a non-linear, dynamic, multi-temporal-scale circulation system. It is also the product of a combination of local circulation and thermal effects with locally uneven topography and geomorphology (Maussion et al., 2014). Short-term rainfall prediction is the prediction of rainfall intensity within a relatively short time (generally 0-6 hr) for a specified area (Doviak et al., 1994). Short-term heavy precipitation can exceed 50 mm in just 1 hr, which is likely to cause major hazards such as flash floods, mudslides and landslides. Therefore, precise and reliable rainfall forecasting, especially extreme rainstorm forecasting, is not only the basis for the rational development and scientific deployment of water resources, but also key to ensuring social stability and maintaining natural ecology and the environment.
Present rainfall prediction methods mainly include numerical weather prediction (NWP) and radar echo extrapolation (Feng et al., 2017). NWP uses mathematics, physics, atmospheric dynamics and other methods to analyse weather evolution and predict future weather. NWP products provide a basis for the daily predictions of meteorologists (Pan et al., 2013; Shu et al., 2013). However, NWP suffers from uncertainty and parameterization error; these problems are mainly related to the degree of grid refinement and to the initial-value and iteration errors in the calculations (Ma and Bao, 2017). Since NWP considers various complicated factors, the cost of rainfall prediction is greatly increased. In addition, NWP is more accurate for predictions covering a longer time period, and it has less ability to predict imminent rainfall (Shi et al., 2015a). Radar echo extrapolation technology predicts the future position and intensity of radar echoes, enabling more rapid tracking and prediction of strong convective systems (Zhang et al., 2008; Otsuka et al., 2016), and it is widely used in rainfall prediction. The Real-time Optical Flow by Variational Methods for Echoes of Radar (ROVER) algorithm, proposed by the Hong Kong Observatory (Woo and Wong, 2014), has been useful for accurate extrapolation of radar maps (Germann and Zawadzki, 2002; Sakaino, 2013). However, the accuracy of optical-flow-based methods is limited because (a) the optical flow method only considers the correspondence between two adjacent frames and does not consider multiple consecutive frames; (b) the flow estimation step and the radar echo extrapolation step are separated; and (c) it is difficult to determine the model parameters needed to produce accurate predictions.
Image detection, classification, regression and other complex problems are being simplified by the use of big datasets and training models. Machine learning has helped solve many of the technical problems noted above. Artificial neural networks (ANNs) have also been used for rainfall prediction. For example, Lee et al. (1998) predicted daily rainfall using a radial basis function network based on location information. Because an ANN easily falls into a local optimum, many samples are needed in training to achieve the best results. Therefore, more effective methods are being investigated to predict rainfall. A statistical machine learning method, the support vector machine, has been widely used in radar quantitative precipitation forecasting (QPF) because it is better at small-sample prediction (Nikam and Gupta, 2013; Sehad et al., 2017; Yang et al., 2018). Additionally, the terrain-based weighted random forest method and other machine learning methods have been used for radar QPF (Sinclair and Pegram, 2005; Sideris et al., 2014; Guo, 2015; Verdin et al., 2016; Gou et al., 2018; 2019). As a newer research direction in the field of machine learning, deep learning has been successfully applied to meteorological prediction. A new dynamic convolutional layer for short-range weather prediction was presented by Klein et al. (2015). Qiu et al. (2017) proposed a multi-task convolutional neural network (CNN) model to automatically extract features from the time series measured at observation sites and studied the correlation between the multiple sites for weather prediction. However, prediction of rainfall is a spatiotemporal sequence prediction problem: it takes the past radar map sequence as input and outputs the future rainfall during a certain period of time (Shi et al., 2015a). Being aware of this problem, Tang et al. (2018) converted QPF into a continuous conditional random field (CCRF) learning problem and proposed a geographical and temporal CCRF (GAT-CCRF) model to study the spatiotemporal correlations of rainfall. However, GAT-CCRF has strict requirements on the time information of the research object and must be sampled and modelled according to that information. Are there other methods to solve this spatiotemporal problem?
The general machine learning method can only process spatial information; it cannot process temporal information. Long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997) is an improved model of recurrent neural networks (RNNs) that can process information with a time dimension, and it can be used to improve the prediction accuracy of rainfall. However, with a large number of computing units, the network depth of LSTM is limited by the amount of computer memory. Shi et al. (2015a) integrated the convolution idea into LSTM and proposed the convolutional LSTM (ConvLSTM) for rainfall nowcasting. ConvLSTM predicts 15 frames of radar echo from the past five frames and then calculates the rainfall using the Z-R relationship, where Z is the radar echo intensity in decibels and R is the rainfall in millimetres per hour. The results indicate that it is superior to the optical flow method, but it still suffers from error accumulation between radar echo extrapolation and retrieval of rainfall. In the present study, we improve the network structure of the convolutional recurrent neural network (CRNN) and propose a Tiny-RainNet model that combines a CNN with bidirectional LSTM (BiLSTM). We used Doppler radar echo data from 2014 to 2016 in Shenzhen, China, to verify the proposed Tiny-RainNet model and to predict 1 hr of rainfall during the next 1-2 hr.
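As a rough illustration of the Z-R retrieval step mentioned above, the sketch below inverts a Marshall-Palmer-style power law Z = aR^b. The constants a = 200 and b = 1.6 are illustrative assumptions; the constants actually used by Shi et al. (2015a) are not stated here.

```python
def rainfall_from_dbz(dbz, a=200.0, b=1.6):
    """Convert radar reflectivity (dBZ) to rain rate R (mm/hr) via Z = a * R**b.

    a and b are illustrative Marshall-Palmer-style constants, not values
    taken from this paper.
    """
    z = 10.0 ** (dbz / 10.0)       # dBZ -> linear reflectivity factor Z
    return (z / a) ** (1.0 / b)    # invert Z = a * R**b
```

For example, a 40 dBZ echo corresponds to roughly 11.5 mm/hr under these constants; stronger echoes map to higher rain rates.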

| Dataset
The dataset from the CIKM AnalytiCup 2017 in the Tianchi Big Data Algorithm Competition of Alibaba (https://tianchi.aliyun.com/competition/entrance/ 231596/introduction) includes gauge rainfall and Doppler radar echo maps of the target area. This information was collected by the meteorological observation centre in Shenzhen, China, from 2014 to 2016 (mainly from March to September), including 10,000 sets of training data, 2000 sets of verification data and 2000 sets of test data. Each radar map covers a target location and its surrounding area and is marked as an X × Y mesh, where each grid point records the radar reflectivity factor Z. The dataset details are as follows.
1. Each radar chart contains a target site and covers an area of 101 × 101 km², which is marked as 101 × 101 grids. The target site position is in the centre, which is (50, 50), as shown in Figure 1 (the red area at the centre is the target site).
2. Each radar map corresponds to the accumulated rainfall at the target site for the hour between 1 and 2 hr in the future.
3. The radar maps cover different time spans, with an interval of 6 min, for a total of 15 time spans represented by T0-T14, and different heights, from 0.5 km to 3.5 km with an interval of 1 km, giving four heights represented by H0-H3, as shown in Figure 2.
The format of the dataset is "id, label, radar_map." "id" is the data sample number, for a total of 14,000 samples; "label" is the rainfall of the target site; "radar_map" is arranged in the "THYX" format, with a length of 612,060 (15 × 4 × 101 × 101) values.
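Reading one such record can be sketched as follows. The assumption that the radar_map field is a space-separated list of values is illustrative; the actual on-disk delimiter is not specified here.

```python
import numpy as np

def parse_record(line, T=15, H=4, Y=101, X=101):
    """Split one 'id,label,radar_map' record and reshape the radar field
    into a (T, H, Y, X) array matching the dataset's THYX layout.

    The space-separated encoding of radar_map is an assumption for
    illustration, not taken from the competition documentation.
    """
    sample_id, label, radar = line.split(",", 2)
    values = np.array(radar.split(), dtype=float)
    assert values.size == T * H * Y * X   # 612,060 for the full grid
    return sample_id, float(label), values.reshape(T, H, Y, X)
```

Indexing the result as `arr[t, h, y, x]` then gives the reflectivity at time span t, height h and grid point (y, x).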

| Data analysis
To analyse the distribution of rainfall at the target sites, Figure 3 presents a histogram and box plot of the rainfall of the 14,000 samples. It can be seen from Figure 3 that the rainfall of more than half of the target sites is less than 10 mm·hr⁻¹; about 50% of the rainfall is evenly distributed from 10 to 60 mm·hr⁻¹; only 1.5% of the rainfall is greater than 60 mm·hr⁻¹. Furthermore, 1 hr rainfall accumulations of more than 10 mm fall mainly in the range 20-30 mm, and the distribution density gradually decreases as rainfall increases. Rainfall in China decreases from the southeast coast to the northwest inland: the southeast coast is close to the ocean and strongly affected by the summer monsoon, so it receives abundant rainfall, whereas inland the monsoon influence is weak in summer and rainfall is scarce. According to statistics (Yao et al., 2009), rainfall intensity above 8 mm·hr⁻¹ is rare in China, although there are exceptions, such as the eastern region of northwest China, the northern region of north China, the Yangtze River Basin and the lower Yellow River. In particular, Shenzhen often experiences low-pressure monsoons, tropical cyclones, high temperatures and heavy rain in summer; it is no accident that the maximum hourly rainfall there exceeds 100 mm.

| MODEL
Tiny-RainNet is a neural network that improves the overall structure of the CRNN proposed by Shi et al. (2015b). The network architecture of the Tiny-RainNet model is shown in Figure 4. Tiny-RainNet consists of convolutional layers, BiLSTM (Graves et al., 2005) layers and dense layers. Table 1 lists the parameters of the CRNN and Tiny-RainNet. The function of the convolution layers is to extract context information from different receptive fields. BiLSTM is a combination of forward LSTM and backward LSTM, which captures long-distance dependence and is well suited for context-sensitive sequence annotation tasks. The output layer can be modified and applied to time series prediction problems. When using a radar map sequence for prediction, the information in a single radar map is limited, and the main information is stored in the time series of maps. Tiny-RainNet takes a series of grey-scale radar maps as input. First, two convolutional layers are used to extract the characteristics of the radar maps, where conv1 (1 × 1 kernel) fuses the features of radar maps across different time steps and conv2 (3 × 3 kernel) extracts the context information within the same radar map. Then, a dropout layer immediately after the convolution layers effectively prevents overfitting, producing a feature map with a height of 1. This feature map is sliced along the x-axis, and each slice is used as one time step for the BiLSTM network. Since the actual rainfall is related to the height and time span of the input radar echo maps, and the map information from different altitudes and times is correlated, Tiny-RainNet stacks three BiLSTM layers to increase the depth of the time series prediction and to reduce the error of rainfall prediction. Finally, the predicted rainfall is obtained through two dense layers. We used an adaptive gradient optimizer (Duchi et al., 2011) in the experiments.
The best-performing model configuration found during the optimization process, listed in Table 1, was saved and evaluated on the test set.
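The slicing step described above, in which the height-1 feature map is split along the x-axis into BiLSTM time steps, can be sketched as follows. The width of 8 and the channel count of 64 are illustrative assumptions, not values from Table 1.

```python
import numpy as np

# Illustrative feature map after the conv + dropout stages: height collapsed
# to 1, with an assumed width of 8 and 64 channels (not the paper's values).
feature_map = np.random.rand(1, 8, 64)   # (height, width, channels)

# Each x-position of the height-1 map becomes one time step for the BiLSTM:
# a sequence of 8 vectors, each a 64-dimensional feature slice.
time_steps = [feature_map[0, x, :] for x in range(feature_map.shape[1])]
```

The BiLSTM then reads this sequence forwards and backwards, so each output state depends on both earlier and later slices of the map.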

| EXPERIMENTAL RESULTS AND ANALYSIS
We present experimental validation of the proposed Tiny-RainNet model. Section 4.1 lists some evaluation parameters of rainfall prediction. A sensitivity experiment of input radar image size and the parameters of Tiny-RainNet are given in Section 4.2 and Section 4.3, respectively. Finally, the results with comprehensive comparisons are shown in Section 4.4.
All training and testing were performed on a Dell computer with an Intel Core i7 CPU and a GTX 1080 GPU.

| Evaluation parameters
The root mean square error (RMSE) is commonly used to reflect the total error of prediction results. The correlation coefficient (CC), mean bias (MB) and mean absolute error (MAE) are used as auxiliary reference indicators of rainfall prediction. The RMSE, CC, MB and MAE can be computed using the following equations:

RMSE = sqrt[(1/n) Σᵢ (R_g,i − R_a,i)²]

CC = Σᵢ (R_a,i − R̄_a)(R_g,i − R̄_g) / sqrt[Σᵢ (R_a,i − R̄_a)² · Σᵢ (R_g,i − R̄_g)²]

MB = (1/n) Σᵢ (R_g,i − R_a,i)

MAE = (1/n) Σᵢ |R_g,i − R_a,i|

where R_a is the gauge rainfall, R_g is the predicted rainfall, R̄_a and R̄_g indicate the averages of R_a and R_g respectively, the sums run over i = 1, …, n, and n is the total number of samples. The RMSE describes the overall precision of the rainfall prediction. The CC demonstrates the linearity between the predicted rainfall and the gauge rainfall. A value of MB > 0 indicates that the estimated result is greater than that obtained by the rain gauge, that is, overestimation; conversely, MB < 0 indicates underestimation, and the more negative the MB, the greater the degree of underestimation. The MAE, reflecting the estimation error to some extent, avoids the mutual cancellation of positive and negative errors.
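A minimal sketch of the four metrics, following the text's convention that R_a is the gauge rainfall and R_g is the predicted rainfall:

```python
import numpy as np

def rainfall_metrics(r_a, r_g):
    """RMSE, CC, MB and MAE between gauge rainfall r_a and predicted r_g."""
    r_a, r_g = np.asarray(r_a, float), np.asarray(r_g, float)
    err = r_g - r_a
    rmse = np.sqrt(np.mean(err ** 2))     # overall precision
    cc = np.corrcoef(r_a, r_g)[0, 1]      # Pearson linear correlation
    mb = np.mean(err)                     # > 0: overestimation
    mae = np.mean(np.abs(err))            # no sign cancellation
    return rmse, cc, mb, mae
```

For instance, a prediction that is uniformly 1 mm·hr⁻¹ too high has RMSE = MB = MAE = 1 while still achieving CC = 1, which is why the four indicators are read together.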

| Sensitivity experiment of input radar image size
Due to the magnitude of the original data (15 × 4 × 101 × 101 = 612,060 values per sample) and the large number of computing units in BiLSTM itself, a large number of calculations would be required if the original data were directly input into the proposed Tiny-RainNet model for rainfall prediction. Therefore, it is necessary to reduce the dimensions of the original data. A direct method of dimension reduction is scaling the input radar maps by downsampling; however, zooming the maps causes a loss of information. To minimize this loss, we first input radar maps with different degrees of scaling, at the same time span and different heights, into the Tiny-RainNet model. After training and prediction, the RMSE between predicted rainfall and gauge rainfall was obtained.

FIGURE 4  Framework of the proposed Tiny-RainNet model architecture
Then we obtained the RMSE of the rainfall prediction results after scaling radar images of the same height and different time spans. Finally, according to the RMSE, we determined the best image scale.
In general, rainfall is most relevant to the most recent meteorological conditions (He et al., 2017), so radar images of different sizes at T14 and H0-H3 were input into the Tiny-RainNet model, and the RMSE of the rainfall prediction results was calculated, as shown in Figure 5. It can be seen that, at any height of T14, the minimum RMSE corresponds to a 10 × 10 radar map; in particular, at H1, rainfall prediction with a 10 × 10 radar map has the best performance. Therefore, the radar maps of T0-T14 at H1 were scaled to different sizes and input into the Tiny-RainNet model to train and predict rainfall; the RMSE of the predicted results is also shown in Figure 5. Similarly, at the same height H1, regardless of time span, the minimum RMSE corresponds to a 10 × 10 radar map, with the 10 × 10 map at T14 performing best. We conclude that scaling the original 101 × 101 radar maps to 10 × 10 not only greatly reduces the required calculations but also helps to reduce the prediction error of the model.
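The paper does not state which downsampling method was used to scale the 101 × 101 maps; the sketch below uses simple block averaging, cropping 101 to 100 first so the grid divides evenly. Both choices are illustrative assumptions.

```python
import numpy as np

def downsample(radar, out=10):
    """Scale a square radar map to out x out by block averaging.

    Cropping the 101 x 101 grid to 100 x 100 first, and averaging blocks
    rather than interpolating, are illustrative choices; the paper does not
    specify its scaling method.
    """
    n = (radar.shape[0] // out) * out        # 101 -> 100
    r = radar[:n, :n]
    k = n // out                             # block size, 10 here
    return r.reshape(out, k, out, k).mean(axis=(1, 3))
```

Averaging over blocks preserves the mean reflectivity of each region, which is one reasonable way to limit the information loss that the text describes.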

| Parameters of Tiny-RainNet
To facilitate rainfall prediction with sequential radar maps, we modified the network structure of the CRNN. Since the transcription layer in the CRNN cannot solve the prediction problem, we first changed it to a dense layer and then modified the parameters of the convolutional layers, the BiLSTM layers and the size of the convolution kernels. The modifications were based on the characteristics of the sequential radar maps. Due to the many parameters involved, only the numbers of convolution layers and BiLSTM layers in the rainfall prediction model are discussed here. There are many calculation units in the BiLSTM, and increasing the number of layers can cause exponential growth in the number of calculations; the number of BiLSTM layers generally does not exceed four in practical applications. The network structure of the CRNN includes seven convolution layers and two BiLSTM layers. We first determined how many BiLSTM layers the rainfall prediction model should use to obtain the best prediction performance: seven convolutional layers were used to extract features of the radar maps, and then one to four BiLSTM layers were tested. The RMSE of the prediction results is shown in Table 3; three BiLSTM layers with seven convolutional layers gave the best performance. Then, the number of convolutional layers was varied to further improve prediction. Table 2 shows the RMSE of the rainfall prediction results using one to seven convolutional layers with a three-layer BiLSTM. A Tiny-RainNet structure combining two convolutional layers and three BiLSTM layers proved suitable for rainfall prediction with sequential radar maps. Excessive convolution layers appear unnecessary for the small 10 × 10 radar maps; they would reduce the training speed and could also lead to overfitting.

| Results and discussion
Using all values of the radar echo maps as features and the future rainfall of the target site as the label, the official data of the CIKM AnalytiCup 2017 competition were used to train a linear regression (LR) model for rainfall prediction; its RMSE was 14.69 mm·hr⁻¹. In addition, the RMSEs of the top four models in this competition were 10.99, 12.33, 12.94 and 13.2 mm·hr⁻¹ respectively (https://tianchi.aliyun.com/competition/entrance/231596/rankingList). We trained the Tiny-RainNet model on the CIKM dataset with the parameters in Table 3, which required about 1.5 hr. Testing a sample takes only about 3 ms, which meets the requirements for operational use. Figure 6 shows the loss function curves of training and validation with the proposed Tiny-RainNet model. The Tiny-RainNet model, with an RMSE of 9.29 mm·hr⁻¹, reduces the RMSE by 36.76% compared to the LR method.
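The quoted 36.76% improvement follows directly from the two RMSE values:

```python
# Relative RMSE reduction of Tiny-RainNet (9.29 mm/hr) vs the LR baseline
# (14.69 mm/hr), as reported in the text.
lr_rmse, tiny_rmse = 14.69, 9.29
reduction = (lr_rmse - tiny_rmse) / lr_rmse
print(f"{reduction:.2%}")   # -> 36.76%
```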
In addition, ConvLSTM (Shi et al., 2015a), LSTM (Akbari Asanjan et al., 2018) and fully connected LSTM (FC-LSTM) (Kim et al., 2017) were used to compare and verify the performance of the proposed Tiny-RainNet model. All the experimental results are shown in Table 4 and Figure 7. We find that the precipitation prediction performance of the FC-LSTM model is not as good as that of our Tiny-RainNet model, mainly because of the spatiotemporal and sequential properties of radar reflectivity and precipitation: the FC-LSTM structure has too many redundant connections, which makes the optimization unlikely to capture these local consistencies (Shi et al., 2015a). The LSTM and ConvLSTM are more inclined to capture temporal information. Tiny-RainNet applies BiLSTM to the spatial feature map generated by the CNN and uses the hidden states of the BiLSTM for the final precipitation prediction. Our Tiny-RainNet model, an end-to-end trainable network, has a strong ability to capture spatiotemporal and sequential information, making it well suited to the spatiotemporal and sequential features of radar reflectivity and precipitation. However, the parameters have a great influence on the model and need to be determined by experiments or tests. The numbers of convolutional layers and BiLSTM layers largely depend on the complexity of the data: the larger the input image, the more CNN layers are needed; the more complex the time series, the more layers or hidden nodes the BiLSTM needs. Other parameters, such as the optimizer, learning rate, batch size and number of epochs, are adjusted according to the training results to obtain a better model.
According to Table 4 and Figure 7, the rainfall prediction results of LR and FC-LSTM are poor, and their RMSE, MB and MAE are also larger than those of LSTM, ConvLSTM and Tiny-RainNet. Compared with LR and FC-LSTM, ConvLSTM and LSTM are relatively effective, and their CC shows a strong correlation between predicted and gauge rainfall. The biases between the rainfall predicted by LSTM or ConvLSTM and the gauge rainfall are mostly concentrated in the range (−15, 15) mm·hr⁻¹, but the prediction effect is still not as good as that of the proposed Tiny-RainNet. The error between gauge rainfall and the rainfall predicted by Tiny-RainNet is more concentrated: (1) more than 60% of the bias is between −5 mm·hr⁻¹ and 5 mm·hr⁻¹; (2) nearly 35% of the bias is between −15 mm·hr⁻¹ and 15 mm·hr⁻¹; (3) the distribution of bias is concentrated around zero. Tiny-RainNet is thus a promising model for short-term rainfall prediction.

| SUMMARY AND CONCLUSIONS
Rainfall prediction is a typical spatiotemporal, sequential prediction problem and is important in meteorological services. According to the principles of rainfall formation, from a spatial perspective, rainfall is related to the weather conditions of the current location and is also highly susceptible to the surrounding environmental conditions; temporally, rainfall is closely related to the current weather as well as to the weather of the preceding period. We improved the network framework of the convolutional recurrent neural network and proposed the Tiny-RainNet model for short-term rainfall prediction. For rainfall prediction from complex sequential radar echoes, the proposed Tiny-RainNet model first extracts the context information of the radar echo maps through convolution layers, and then bidirectional long short-term memory (BiLSTM) layers are used to analyse and predict the context between the radar echo sequences. Adding a dropout layer after the convolution layers and the BiLSTM layers effectively prevents overfitting to the training set. The performance of the Tiny-RainNet model on the CIKM AnalytiCup 2017 dataset shows that the network structure has advantages for rainfall prediction based on spatiotemporal sequences.
1. The optimal size of the input radar maps of Tiny-RainNet is determined by combining spatiotemporal information.
2. Compared with traditional rainfall prediction models, including the optical flow method and numerical weather prediction, the Tiny-RainNet model has a simpler structure and a faster calculation speed. Therefore, the Tiny-RainNet model is more suitable for short-term rainfall prediction.
3. The proposed Tiny-RainNet model combines the advantages of a convolution layer's ability to extract spatial information and the BiLSTM's ability to deal with sequential problems. Its comprehensive performance is better than that of existing similar models.
At present, we only predict the rainfall in the next 1-2 hr. In the future, relevant experiments will be conducted to verify the performance of the proposed model for predictions beyond 2 hr and for mid- to long-term rainfall prediction.

FIGURE 7  Bias histogram for rainfall prediction results of linear regression (LR), fully connected long short-term memory (FC-LSTM), long short-term memory (LSTM), convolutional long short-term memory (ConvLSTM) and Tiny-RainNet