Multistage reaction-diffusion equation network for image super-resolution

Deep learning-based models have progressed considerably in single-image super-resolution. In these models, the high-resolution pattern generation task is performed at the end of convolutional neural networks (CNNs) with some convolution-based operations. However, this process may be difficult because all the work is done through the remarkable learning ability of the CNN without any specific learning target. The reaction-diffusion equation (RDE) is a mechanism involved in natural pattern generation that can serve as a guide for super-resolution. In this study, we propose to embed the RDE into a super-resolution network by designing a reaction-diffusion process block (RDPB). The proposed RDPB uses the Euler method to iteratively solve one particular RDE, which is determined by parameters generated through the CNN. Accordingly, this module guides the CNN in generating patterns for image super-resolution. Moreover, a multistage framework is constructed to further guide each network module. On the basis of these two designs, the multistage reaction-diffusion equation network is proposed for image super-resolution. Experimental results demonstrate that the proposed model obtains results comparable with state-of-the-art methods with a relatively shallow structure and small model size.


INTRODUCTION
Single-image super-resolution aims to reconstruct a high-resolution (HR) image from a single view of its degraded low-resolution (LR) version. Image super-resolution is an ill-posed problem because one LR image can be obtained from several different HR images. Many approaches have been proposed to overcome this difficulty. Traditional interpolation methods [1] run fast but obtain unsatisfactory results. Deep learning (DL) methods have progressed considerably in image classification [2], image segmentation [3], and object detection [4]. DL has also been applied to super-resolution, and DL-based models [5][6][7][8][9][10][11] obtain large performance improvements due to the powerful learning ability of convolutional neural networks (CNNs). Since SRCNN [12] first introduced CNNs for image super-resolution, researchers have proposed many methods adapted from other computer vision areas. VDSR [13] constructs a 20-layer network based on the depth and structure of VGG [16]. SRResNet [14] and EDSR [5] are constructed based on ResNet [17]. Compared with SRResNet, EDSR removes the BN block in the residual block (RB, see Figure 4a) to fit the pixel-wise generation setting of image super-resolution. EDSR consists of 32 RBs with the channel number set to 256. With this simple structure, EDSR established a benchmark result that is not easily surpassed. SRDenseNet [15] and RDN [6] incorporate dense connections and build complicated models but only obtain results similar to EDSR when the downscale factor and mode are set to 4 and bicubic, respectively. Researchers have also begun to incorporate attention mechanisms [8,18,19] into image super-resolution. RCAN [8] builds a wide deep model based on 200 RBs and channel attention [20] and obtains the next benchmark super-resolution result. RNAN [19] introduces spatial attention [21] into this area.
SAN [18] proposes an improved version of channel attention called second-order attention, which replaces the global pooling result (mean value) in channel attention with second-order statistics (covariance). Although new structures are proposed, neither RNAN nor SAN obtains considerably better super-resolution results than RCAN. These methods obtain high-quality SR images whose PSNR/SSIM values reach a new level. However, the depth of these models becomes very large, which makes them hard to follow.
Some models explore lightweight structures for image super-resolution [22][23][24][25]. DRCN [22] utilizes a recurrent structure to reduce the parameter quantity and performs the same operation on the feature map recursively. This recurrent structure allows one convolution layer to function like multiple convolution layers. DRRN [23] constructs a recurrent structure based on residual learning [17] and mainly applies skip connections, which link shallow layers with deep layers to alleviate the difficulty of gradient back propagation in deep CNNs [17]. MemNet [25] proposes a memory block based on the recurrent manner and dense connections [26] and obtains benchmark results among lightweight models. CARN [24] constructs a lightweight model by replacing the standard RB with a light version called the efficient RB, which changes the two plain convolutions in the RB to two group convolutions that mainly focus on spatial information. Moreover, a 1 × 1 convolution [27] is added at the end for the fusion of channel information. The efficient RB and the RB have similar learning abilities, but the former has fewer parameters and less computational cost. Thus, CARN obtains the benchmark super-resolution result among lightweight models with fewer than 1.5 M parameters. The performance of these methods is limited by the number of parameters, so their PSNR/SSIM values are lower than those of the methods mentioned in the previous paragraph.
Generative adversarial network (GAN) [28]-based models [14,29,30] obtain super-resolution results through the powerful distribution generation ability of GANs. GAN-based models aim to match the real image distribution, which improves the visual quality of super-resolution results. SRGAN [14] introduces the GAN structure and proposes the use of perceptual loss as an additional training loss. The perceptual loss, defined as the difference between high-level features extracted by the pretrained VGG [16] model, remarkably improves super-resolution results when evaluated by MOS (human ratings). ESRGAN [29] modifies the model structure of SRGAN in a dense manner [26], and the discriminator is upgraded to a relativistic GAN. Thus, ESRGAN and RCAN obtain visually comparable results. SROBB [30] proposes a targeted perceptual loss to train the model. The targeted loss first separates the image into different parts, namely, boundary, background, and object, using image segmentation methods, and different parts obtain different loss formulations. This novel loss helps the model obtain perceptually pleasing results. Although the SR images obtained by GAN-based methods have good visual quality, their PSNR/SSIM values are lower than even those of SRCNN, because GAN-based methods generate high-frequency details that do not exist in the original HR image.
DL models have been explored extensively for image super-resolution. Although existing methods have improved image super-resolution, they lack a mechanism that maps degraded chaotic patches to definite pattern patches.

FIGURE 1 'img098' from Urban100 [10]: left, the original HR image; right, two typical HR patches taken from 'img098' (40 × 40 pix) with their corresponding bicubic 4× downscaled LR patches (10 × 10 pix, upsampled for display)

The SR task must contend with the difficulty of mapping such chaotic LR patches to definite HR pattern patches (see Figure 1). Existing DL-based methods, to the best of our knowledge, leave this difficult mapping from chaotic patch to clear patch entirely to the CNN's learning ability. The only information the model receives is the gradient of the loss between the super-resolution result and the original image. A mechanism that directly instructs a model on how to generate a natural-world pattern is lacking. The lack of a local pattern generation mechanism makes these models complex and deep, which hinders further improvement and practical application due to the huge time consumption. We incorporate the reaction-diffusion equation into SR methods to address this problem.
We propose a method that uses the reaction-diffusion equation (RDE) to address this issue. RDE refers to a differential equation with one particular form (see Equation 1). The RDE is closely connected with pattern generation on natural-creature surfaces [32,33]. Unlike existing methods, which generate the super-resolution image directly via the CNN, we obtain the super-resolution result by solving an RDE at the end of the network. In our design, the CNN generates the components needed by the RDE. The specific pattern generation part is completed by solving the RDE and is called the reaction-diffusion process in this study. The application of the RDE changes the functionality of the CNN to determining the necessary pattern for specific local patches. However, solving only one RDE fails to match the complexity of image super-resolution. Hence, a multistage framework is proposed to solve several RDEs consecutively, generating several independent components from the intermediate transformed features.
To verify the effectiveness of the proposed method, we build a wide deep network called the multistage reaction-diffusion equation network (MRDENet) consisting of 128 RBs. We group every 4 RBs with an additional skip connection to overcome the difficulty of training long networks (Figure 4b). Experiments show that the proposed method generates better results than the baseline and comparable results with the SOTA (RCAN [8]) with a relatively shallow network (RCAN contains at least 200 RBs) and a small parameter quantity.
Our study has the following contributions: • We are the first to propose the use of the RDE in image super-resolution. A reaction-diffusion process block (RDPB) module is constructed on the basis of the RDE solution to guide pattern generation in image super-resolution through the special mechanism of the RDE. An ablation study demonstrates the effectiveness of the RDPB.
• We propose a multistage framework that generates several independent RDEs and solves them consecutively. This multistage framework helps decompose the entire image super-resolution problem into pieces and alleviates its difficulty. • We build the MRDENet, a wide deep network with an additional short connection for every 4 RBs. The experimental results demonstrate the superiority of MRDENet.

RELATED WORK
The baseline used in this study and a detailed description of RDE are presented in this section.

EDSR
EDSR [5] is a classical network structure in single-image super-resolution that stacks RBs (Figure 4a) and uses pixel shuffle [34] for upsampling at the end of the network. This simple structure obtains satisfactory super-resolution results comparable with those of some complicated networks (RDN [6] or D-DBPN [7]) that introduce dense connections into their structures. EDSR is also superior to some lightweight methods (DRCN [22], DRRN [23], and CARN [24], developed on the basis of the recurrent rule) when its model size is set small. Based on these two observations, EDSR is an appropriate baseline for single-image super-resolution.

Reaction-diffusion equation
RDE refers to one kind of differential equation (Equation 1). In this study, U(x, y, t) and V(x, y, t) denote the functions depicting the spatial distributions of components U and V at point (x, y) as time t progresses. f(U, V) and g(U, V) can be any functions that control the reaction process between U and V. U_t and V_t denote the partial derivatives of U and V with respect to t. Δ denotes the Laplacian operator along the spatial variables, and a_0 ΔU and b_0 ΔV control the respective diffusion processes of components U and V. Many studies [32,33,[35][36][37][38]] have investigated the connection between natural-world patterns and this simple formula (Equation 1). Certain patches can even be generated by precisely controlling each term in f(U, V) and g(U, V). The authors of [33] generate fish- and squirrel-like patterns by changing the parameters in Equation (2). Notably, a previous study [32] showed that different patterns can be obtained from the same simple differential equation (Equation 3) under varying parameter settings (see Figure 2). Figure 2 shows six different patterns, each with its own parameter setting. On the basis of the studies mentioned above, we derive the hypothesis that there is a mapping relation from RDEs, determined by some extra parameters, to patch patterns. Single-image super-resolution could be completely resolved under this hypothesis as long as we knew the entire mapping relation and the exact patterns needed when reconstructing the HR image. Although both are unknown, we can bridge the mapping relation and the necessary patterns through the powerful learning ability of DL methods. As the basic idea of this study, we use a CNN to learn the mapping function and allow the CNN to decide which pattern is needed.
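Collecting the terms just defined (diffusion terms a_0 ΔU and b_0 ΔV, reaction terms f and g), the general form referred to as Equation (1) can be reconstructed as:

```latex
\begin{aligned}
U_t &= a_0 \, \Delta U + f(U, V), \\
V_t &= b_0 \, \Delta V + g(U, V),
\end{aligned}
\qquad \Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}
```

The Laplacian acts on the spatial coordinates (x, y), so the coupled system diffuses each component in space while the reaction terms exchange mass between the two components over time.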

METHODOLOGY
First, the entire network structure of the proposed MRDENet is presented for one stage. As shown in Figure 3, our base model consists of three parts, namely, head, body, and tail, which are highly motivated by EDSR. The abstract shallow feature F_0 in the head is extracted from the original LR image with one convolution layer as follows:

F_0 = Conv_0(I_LR),

where Conv_0 denotes one convolution operation and I_LR is the input LR image. The extracted shallow feature F_0 is then mapped in the body part into the useful deep feature F_L via L stacked specific blocks.

FIGURE 3
Structure of the proposed one-stage MRDENet: first, a convolution layer extracts the shallow feature from the RGB input; then stacked basic blocks (RBs or RDPBs) transform the shallow feature into the deep feature, and a skip connection with one convolution layer fuses the shallow and deep features; finally, the fused feature, along with a rough estimation result, is used to obtain the final SR result through the RDPB

The blocks in the body compute

F_l = H_l(F_{l-1}), l = 1, …, L,

where H_l denotes a feature transform function defined by some combined operations, which can be an RB (see Figure 4a), a short block (see Section 3.3 and Figure 4b), or another block. Then, the shallow feature F_0 and the deep feature F_L are fused using another convolution layer as follows:

F_fuse = Conv_1(F_L) + F_0.

The fused deep feature F_fuse, along with a rough estimation of the high-resolution image SR_0 in the tail part (we apply bicubic upscaling on the original LR image as the initial super-resolution estimation), is used to reconstruct the super-resolution result through the proposed RDPB:

SR = RDPB(F_fuse, SR_0).

We next provide a detailed description of the proposed RDPB and the multistage framework, and introduce the short connection used when building the wide deep MRDENet.
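The head-body-tail dataflow can be sketched in a few lines. The following is a framework-agnostic NumPy toy, where `conv` stands in for a learned convolution (the actual model uses trained PyTorch layers; the block count and the averaging filter here are illustrative assumptions only):

```python
import numpy as np

def conv(x):
    # Stand-in for a learned 3x3 convolution: a fixed average over the
    # 3x3 neighbourhood with periodic borders (illustrative only).
    out = np.zeros_like(x)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += np.roll(np.roll(x, dy, axis=0), dx, axis=1)
    return out / 9.0

def residual_block(x):
    # RB without BN, as in EDSR: x + Conv(ReLU(Conv(x))).
    return x + conv(np.maximum(conv(x), 0.0))

def one_stage_forward(lr, num_blocks=4):
    f0 = conv(lr)                 # head: F_0 = Conv_0(I_LR)
    f = f0
    for _ in range(num_blocks):   # body: F_l = H_l(F_{l-1})
        f = residual_block(f)
    return conv(f) + f0           # fuse deep and shallow features: F_fuse
```

A call such as `one_stage_forward(np.random.rand(10, 10))` returns a feature map of the same spatial size; in the real network, the tail's RDPB then consumes F_fuse together with SR_0.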

Reaction-diffusion process block
As shown in Section 2.2, we insert the process of solving an RDE at the end of the super-resolution network by building an RDPB (see Figure 5). The RDPB takes the following inputs: (1) the fused deep feature F_fuse extracted from the LR image and (2) a rough estimation of the HR image, U_0. The feature is first upscaled using transposed convolution operations and is then used to generate the components and parameters of the RDE (Equation 8). The deep model learns the coefficients of each term in Equation (8). Hence, the model decides which pattern will be generated in each local patch of the super-resolution result by precisely controlling the RDE used here. The pattern is then generated by the reaction-diffusion process once the RDE is determined.
We choose the Euler method to solve the reaction-diffusion process, iteratively integrating the intermediate gradient predictions; the final integration result is taken as the super-resolution result. The Euler method is calculated as Equation (9):

U_{i+1} = U_i + dt · U_i^t,

where dt denotes the time step, which is determined by tuning on the loss; U_i denotes the i-th integration result; and U_i^t denotes the gradient of U_i over t, which is calculated using Equation (8). In this way, we can easily insert the RDP into the CNN, which can then be trained end to end. The RDP procedure is presented in Algorithm 1: it takes the parameters a_0, a_1, a_2, a_3, b_0, b_2, b_3 and the maximum iteration count T as inputs, performs T Euler steps, and returns U_T.
In summary, the proposed RDPB will learn to generate components and parameters from the image data. Components and parameters are used to form the RDE. The RDPB then solves the RDE to generate patterns.
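As a concrete sketch of the reaction-diffusion process, the loop below performs the Euler integration of Equation (9) in NumPy. The reaction terms here are a simple linear stand-in: the exact coefficient layout of the paper's Equation (8) (a_0, a_1, a_2, a_3, b_0, b_2, b_3, produced by the CNN) is learned and not reproduced here, so the coefficient tuple is an assumption; only the structure of the iteration follows the text:

```python
import numpy as np

def laplacian(u):
    # Discrete 5-point Laplacian with periodic boundary conditions.
    return (np.roll(u, 1, 0) + np.roll(u, -1, 0)
            + np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)

def rdp(u0, v0, coeffs, dt=0.1, T=2):
    # Euler integration U_{i+1} = U_i + dt * U_i^t (Equation 9).
    # coeffs = (a0, a1, a2, b0, b1, b2) is an assumed linear reaction
    # parameterisation, not the paper's exact Equation (8).
    a0, a1, a2, b0, b1, b2 = coeffs
    u, v = u0, v0
    for _ in range(T):
        ut = a0 * laplacian(u) + a1 * u + a2 * v   # U_i^t
        vt = b0 * laplacian(v) + b1 * u + b2 * v   # V_i^t
        u, v = u + dt * ut, v + dt * vt
    return u  # U_T is taken as the stage's super-resolution output
```

With all coefficients set to zero, `rdp` returns `u0` unchanged, which makes the block easy to sanity-check; in the network, the same few lines run with CNN-predicted coefficients, so gradients flow through the integration end to end.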

Multistage framework
Although suitable for pattern generation, the RDE has difficulty directly solving the entire super-resolution problem with one single reaction-diffusion process because of the ambiguity in reconstructing HR images from the original LR images. Thus, we propose a multistage framework (Figure 6) that decomposes this serious difficulty into different phases, each solved by one independent RDPB. In this multistage framework, the output of the first RDPB serves as the rough estimation sent to the second RDPB, and so on. The fused features used by the stages are generated from a long deep network at different depths through Equation (10): F^i = body_i(F^{i-1}), where body_i denotes the i-th stacked body part defined in Equation (6).
We propose to obtain the final super-resolution result with several stacked RDPBs using these features.
The multistage framework spreads the pressure of reconstructing large HR details throughout the entire network, whereas most existing methods place all the pressure at the tail of the network. This design clarifies the duty of each part of the network and thus further enhances the super-resolution result. The number of stages becomes a hyperparameter, determined by tuning on the loss.
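The staging described above can be sketched as a chain in which each stage refines the previous stage's output. In the toy below, `upscale` is a nearest-neighbour stand-in for the bicubic SR_0 estimate, and the bodies and RDPBs are passed in as callables (all names are illustrative, not the paper's implementation):

```python
import numpy as np

def upscale(img, s=2):
    # Nearest-neighbour stand-in for the bicubic initial estimate SR_0.
    return np.kron(img, np.ones((s, s)))

def multistage_sr(lr, stages, s=2):
    # stages: list of (body_i, rdpb_i) pairs. Stage i's output SR_i
    # becomes the rough estimate fed to stage i+1.
    sr = upscale(lr, s)          # SR_0
    feat = lr
    for body, rdpb in stages:
        feat = body(feat)        # F^i = body_i(F^{i-1})  (Equation 10)
        sr = rdpb(feat, sr)      # stage-i RDPB refines SR_{i-1}
    return sr
```

With identity stages, e.g. `multistage_sr(x, [(lambda f: f, lambda f, s: s)] * 2)`, the function simply returns the upscaled input, which isolates the contribution each real stage would add.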

Short block
Training deep networks is typically not trivial even when the network is built on a residual-like structure (see [17]). Features lose shallow information along the deep and long feature transformation path. Some methods use dense connections to solve this problem [6]. Other methods introduce the short connection [8], which groups several RBs together; a short connection can help mitigate the loss of information. Unlike RCAN [8], which consolidates 20 RBs in each group, we group 4 RBs to form a short block (see Figure 4b) and add a skip connection with another convolution layer at the end of the short block. The newly added skip connection keeps the deep feature close to the shallow feature, ensures the continuity of features from different layers, and alleviates the difficulty of training the deep network. Thus, the model converges to a better point than it would without short connections.
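The short block's wiring can be sketched as follows; `rbs` is a list of four residual-block callables and `conv` a convolution callable (both assumed placeholders, since only the wiring is taken from the text and Figure 4b):

```python
def short_block(x, rbs, conv):
    # Four residual blocks followed by one convolution, plus a skip
    # connection from the block input (sketch of Figure 4b).
    f = x
    for rb in rbs:        # 4 RBs in sequence
        f = rb(f)
    return x + conv(f)    # skip keeps the deep feature close to the shallow one
```

Stacking 8 such blocks per stage, as in the full model, gives 32 RBs per body part; four body parts reach the 128-RB total.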

Discussion
Some methods have attempted to solve the entire super-resolution problem in stages (DBPN [7], LapSR [39], and SRFBN [40]). However, DBPN and LapSR divide the super-resolution problem into stages according to scale factors, which depend heavily on the required upscaling. This makes them suited only to 2×, 4×, and 8× upscaling, with the stage number decided by the upscale factor. Our method can use any number of stages as long as the computation cost is acceptable. Moreover, our multistage framework is similar to SRFBN, which divides the super-resolution problem into pieces according to the granularity of the obtained results and assigns different supervision methods to different stages. Multisupervision is unnecessary in our method because the pattern generation mechanism of the RDE naturally divides the problem into stages.

FIGURE 6 An example structure of the two-stage MRDENet

Implementing details
Combining the strategies above, we build a 128-RB MRDENet, which consists of one head and four stacked body parts with corresponding tail parts (see Section 3). Each stage contains 8 short blocks, and the convolution kernel size is set to 3 with 64 channels, except for the upscale block in each RDPB, where we use transposed convolutions to upscale the feature map: one two-strided 3 × 3 kernel for scale 2, two stacked two-strided 3 × 3 kernels for scale 4, and one three-strided 5 × 5 kernel for scale 3. The short skip connection is not applied when the network depth is small (e.g., fewer than 100 RBs). We train the network with the following L_1 loss:

L(θ) = (1/N) Σ_{i=1}^{N} ‖MRDENet(I_LR^i) − I_HR^i‖_1,

where N is the number of training patches and θ denotes the network parameters.
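The kernel and stride choices above can be checked against the standard transposed-convolution size formula. The padding and output-padding values below are assumptions (the text states only kernel sizes and strides), chosen so that each setting produces the stated scale:

```python
def deconv_out(n, kernel, stride, pad, out_pad=0):
    # Transposed-convolution output length (PyTorch convention):
    # (n - 1) * stride - 2 * pad + kernel + out_pad
    return (n - 1) * stride - 2 * pad + kernel + out_pad

n = 50                                                     # input side length
x2 = deconv_out(n, kernel=3, stride=2, pad=1, out_pad=1)   # one x2 step -> 100
x4 = deconv_out(x2, kernel=3, stride=2, pad=1, out_pad=1)  # second x2 step -> 200
x3 = deconv_out(n, kernel=5, stride=3, pad=1, out_pad=0)   # one x3 step -> 150
```

With these (assumed) padding settings, a stride-2 3 × 3 kernel exactly doubles the side length, two stacked steps quadruple it, and a stride-3 5 × 5 kernel triples it, matching the scale-2/4/3 configurations in the text.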

Training setting
We use DIV2K [41], which consists of 800 HR images, as our training set. LR images for training and testing are generated using the 'imresize' function in MATLAB with default parameters and the corresponding scales. We randomly crop 50 × 50 patches from the LR images as training input. Each image patch has a half chance of being flipped horizontally and a quarter chance each of being rotated by 90°, 180°, or 270°. The batch size is set to 16. We test on four standard benchmark datasets, namely, Set5 [42], Set14 [43], B100 [44], and Urban100 [10], a total of 219 images with diverse resolutions and contents. The super-resolution results are evaluated using PSNR and SSIM on the Y channel of YCbCr space. Figures 7-10 and Table 1 show the comparison on the grouped benchmark datasets and the mean PSNR and SSIM over the 219 images. We train the model using the Adam optimizer with betas = (0.5, 0.7). The learning rate is initialized to 1e−4 and halved every 2 × 10^5 iterations. Our code is implemented in PyTorch.
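The cropping and augmentation pipeline described above can be sketched as follows (a NumPy toy; the actual loader also pairs each LR crop with the corresponding HR crop, which is omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(lr_img, size=50):
    # Random 50 x 50 patch from an LR training image.
    h, w = lr_img.shape[:2]
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    return lr_img[y:y + size, x:x + size]

def augment(patch):
    # Half chance of a horizontal flip, then one of four rotations
    # (0/90/180/270 degrees) with equal probability.
    if rng.random() < 0.5:
        patch = patch[:, ::-1]
    return np.rot90(patch, rng.integers(0, 4))
```

Because crops are square, every flip/rotation combination preserves the 50 × 50 patch shape, so augmented patches can be batched directly.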

Parameter study
The stage number k and iterations T in RDP both have a significant impact on the performance of the proposed MRDENet.
We use the RB as the basic block in this study (Figure 6) and set the number of RBs to 64. We then vary the stage number k and the iteration count T to observe the performance variation of MRDENet. The PSNR values of the super-resolution results obtained by these models are plotted in Figure 7. Figure 7 shows that, with the same number of RDP iterations (same x-coordinate), the proposed MRDENet consistently obtains better super-resolution results as the number of stages increases (s1 → s2 → s4). Furthermore, with the same number of stages (same line), the proposed MRDENet obtains better super-resolution results as the iteration count increases (x-coordinate increases). These observations show that the pattern generation ability of the proposed RDPB helps solve the super-resolution problem and that the proposed multistage framework (Section 3.2) alleviates the pressure placed on the network tail and improves the super-resolution results. Although additional stages and RDP iterations can improve performance, they become time consuming because the GPU fails to accelerate and optimize the implementation of the RDP. Thus, we set k = 2, T = 2 in IV.C and k = 4, T = 2 in IV.D as our default settings to balance training time and performance.

Compared with the baseline
We compare the performance of the proposed MRDENet and the EDSR baseline in this section. We set k = 2, T = 2 on the basis of the experimental results in the previous section and then gradually increase the number of RBs in the body part. The results are displayed in Figure 8. Both the baseline and the proposed MRDENet obtain better super-resolution results with increasing model size; this alone can be misleading, because deeper models generally perform better. However, the consistently better PSNR of MRDENet compared with the EDSR baseline verifies the superiority of the proposed method. The improved PSNR can be attributed to the pattern generation guidance provided by the proposed RDPB, which tells the model how to generate a pattern and which pattern is needed. This is relatively easy compared with simultaneously deciding which pattern is necessary and generating it.

Compared to SOTA
We compare the proposed MRDENet with some SOTA methods in this section. We choose 11 state-of-the-art methods, namely, SRCNN [12], VDSR [13], LapSR [39], MemNet [25], EDSR [5], DBPN [7], RDN [6], RCAN [8], RNAN [19], SRFBN [40], and CSFM [45]. We build a 128-RB deep network with short connections (Figure 4b). The super-resolution results on the four benchmark datasets are listed in Table 1; the final column shows the mean PSNR and SSIM over all images in the four benchmark datasets. The best results are presented in bold font, and the second-best results are underlined. Table 1 demonstrates that the proposed MRDENet obtains PSNR and SSIM results comparable with RCAN. Notably, MRDENet has only 128 RBs, fewer than RCAN's 200 RBs, each of which additionally contains a channel attention block. CSFM is also built with 128 RBs, but it puts a channel attention mechanism and a spatial attention mechanism in each RB, so its depth is much greater than that of the proposed MRDENet. Therefore, our proposed RDPB, which guides pattern generation, and the multistage framework, which spreads the reconstruction pressure across different parts of the model, help MRDENet obtain results comparable with RCAN using a relatively shallow network structure (we do not build a 200-RB MRDENet because of limited computation power and time). The bottom of Table 1 also shows the results of MRDENet compared with RDN [46] and DRN-L [47] for 4× super-resolution when trained with the recently adopted Flickr2K [41] dataset. The results show that the proposed method gains PSNR/SSIM when trained with more data and obtains results comparable with recently published methods.

Model size analysis
In this section, we analyze the model size in detail and compare the proposed full-sized MRDENet with some state-of-the-art methods, namely, SRCNN [12], VDSR [13], MemNet [25], EDSR [5], RDN [6], D-DBPN [7], CSFM [45], and RCAN [8]. We display the PSNR results on all four benchmark datasets with the scale set to four. Figure 9 shows that the number of parameters influences performance; meanwhile, RCAN obtains the benchmark results with only a third of EDSR's parameters. The proposed MRDENet obtains comparable results with an estimated parameter reduction of 25% relative to RCAN. However, RCAN tends to sacrifice network depth to obtain benchmark results with fewer parameters than EDSR. Figure 10 shows that model performance is highly related to depth: the benchmark results with higher PSNR generally come from deeper networks. RCAN has approximately five times the depth of EDSR, while the proposed MRDENet obtains comparable results with a network structure around 40% shallower. These observations support the effectiveness of the proposed RDPB structure, which incorporates the reaction-diffusion mechanism into super-resolution.

Ablation study
We investigate the effectiveness of the different parts of the proposed MRDENet. The EDSR-like model with 128 RBs and the feature number set to 64 is our baseline. The baseline model with the short block structure is named MRDENet-NRDP, which lacks only the multistage RDPB compared with MRDENet. Symmetrically, the baseline model with the multistage RDPB is named MRDENet-NS. The results are shown in Table 2.
As can be seen, the individual introduction of the multistage RDPB/short block yields a 0.07/0.08 dB increment. When the two are combined, the proposed MRDENet gains 0.14 dB over the baseline. These results verify the effectiveness of the proposed multistage RDPB and short block.

Reaction-diffusion process analysis
We investigate the proposed RDPB by answering the following basic questions: What does the RDPB generate? Is the proposed RDPB simply learning to generate a residual? We take the V_0 component generated by the final (fourth) RDPB and the super-resolution result SR_3 of the previous (third) RDPB; SR_3 serves as the input super-resolution estimation for the final RDPB. We compute the residual between SR_3 and HR and display the results in Figure 11. The generated component V_0 is completely different from the residual. The residual comprises the HR details lost when the image degrades to its LR version; this information is highly correlated with edges and difficult to generate, so we see colourful areas and dashed edges in the residual. The generated component, in contrast, is more interested in the edges of the image: the edges take high values (appear more salient than the rest of the area) in V_0, according to the results shown in Figure 11. This finding further verifies the effectiveness of the proposed RDPB, which is designed to generate patterns, while combinations of edges make up patterns. Therefore, our proposed RDPB learns to generate useful components for pattern generation instead of producing residuals.

FIGURE 11
First column: the original HR image; second column: SR_3, the result of the third RDPB in the proposed MRDENet; third column: the residual between SR_3 and HR, |HR − SR_3| (the absolute value is taken for display); fourth column: the component V_0 generated by the final RDPB based on SR_3. An obvious difference can be seen between the residual and the component

CONCLUSION
In this study, we propose to insert the reaction-diffusion process into single-image super-resolution and build an RDPB. This newly designed RDPB helps with pattern generation. We also design a multistage framework that alleviates the reconstruction difficulty at the network tail by spreading it throughout the entire network at different depths, according to the reaction-diffusion mechanism. These two designs, together with the short connection, are used to build the wide deep MRDENet for single-image super-resolution. Experiments demonstrate that the proposed RDPB and multistage framework improve super-resolution results. Furthermore, the proposed MRDENet obtains results comparable with the benchmark model RCAN with 25% fewer parameters and a 40% shallower network structure. This finding indicates the value of incorporating the RDE into image super-resolution. However, we also find that the proposed RDPB focuses mainly on image edges, a local image feature, which may lead to the loss of global or non-local information.
We will attempt to solve this problem in a future investigation.