The Deep Capsule Prior – advantages through complexity?

In inverse problems, the extensive number of ground truth samples needed for the training of supervised deep learning models is seldom available. Unsupervised approaches, like the Deep Image Prior, offer a valuable alternative in this case. In our work, we combine the idea of the Deep Image Prior with recently proposed capsule networks. The new model is tested against a standard convolutional Deep Image Prior on different image processing tasks and on computed tomography reconstruction.


Introduction
Deep learning approaches have become state of the art in image classification, translation, denoising and a wide range of other applications. In many cases, so-called supervised methods are used, which require pairs of input data and ground truth for the training of the network. Depending on the problem, the number of necessary data pairs can be extensive. Inverse problems are an example where the amount of available ground truth data is often limited. Therefore, unsupervised deep learning models have great relevance for these applications. A prominent example of an unsupervised model is the Deep Image Prior (DIP) [1]. For imaging applications, a convolutional neural network (CNN) is usually used as the DIP, and its parameters are determined individually for each measurement. Recently, capsules [2] have been introduced as an extension of regular convolutional layers. In our work, we investigate the use of a capsule network as the Deep Image Prior. We compare this Deep Capsule Prior (DCP) model against a classical CNN on tasks from image processing and computed tomography reconstruction.

Inverse problems
In inverse problems, one is interested in finding the solution $x^\dagger$ from a noisy measurement $y^\delta$. The relationship $A x^\dagger + \xi = y^\delta$ between solution and measurement can be described by a forward operator $A : X \to Y$. In addition, the measurements are disturbed by noise; the forward mapping is therefore combined with a realization $\xi$ of a noise-generating process, and the noise is assumed to be bounded, $D_Y(A x^\dagger, y^\delta) \leq \delta$, w.r.t. some discrepancy $D_Y : Y \times Y \to \mathbb{R}_+$. At best, there exists a unique solution $x^\dagger$ and the inverse mapping $A^{-1} : Y \to X$ is continuous, i.e. stable against small perturbations. Otherwise, the inverse problem is called ill-posed in the sense of Hadamard. To determine a candidate $\hat{x}$ for the solution, one can solve an optimization problem
$$\hat{x} \in \operatorname{arg\,min}_{x \in X} \; D_Y(A x, y^\delta) + \lambda R(x). \tag{1}$$
Since the majority of inverse problems are ill-posed, it is advisable to stabilize the problem by including prior knowledge in the form of a weighted regularization term $R : X \to \mathbb{R}_+$.
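As a minimal illustration of problem (1), the following NumPy sketch solves a toy underdetermined problem with a squared $L^2$ discrepancy and Tikhonov regularization $R(x) = \|x\|_2^2$ by gradient descent. The operator, the weight $\lambda$ and the step size are illustrative assumptions, not taken from our experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ill-posed problem: fewer measurements (m) than unknowns (n).
n, m = 20, 10
A = rng.normal(size=(m, n))                       # forward operator
x_true = rng.normal(size=n)
y_delta = A @ x_true + 0.01 * rng.normal(size=m)  # noisy measurement

lam, step = 0.1, 1e-3
x = np.zeros(n)
for _ in range(5000):
    # Gradient of ||A x - y||_2^2 + lam * ||x||_2^2
    grad = 2 * A.T @ (A @ x - y_delta) + 2 * lam * x
    x -= step * grad
```

Without the regularization term, the underdetermined problem has infinitely many minimizers; the weighted penalty selects a stable candidate.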

Deep Capsule Prior
The Deep Capsule Prior is based on the idea of the Deep Image Prior, which converts the optimization problem (1) into a parameter identification problem for a function $\varphi_\Theta$:
$$\hat{\Theta} \in \operatorname{arg\,min}_{\Theta} \; D_Y(A \varphi_\Theta(z), y^\delta) + \lambda R(\varphi_\Theta(z)), \qquad \hat{x} = \varphi_{\hat{\Theta}}(z). \tag{2}$$
In general, $\varphi$ is a deep neural network with parameters $\Theta$ and a fixed random input $z$, and the final parameter configuration $\hat{\Theta}$ is determined by gradient descent. Note that the parameter problem (2) is usually non-convex and, therefore, the existence of a unique configuration $\hat{\Theta}$ is not guaranteed. The idea of the DIP is combined with the recently proposed capsule networks. Capsules are multidimensional extensions of regular neurons (cf. Sabour et al. [2]). In addition, the activation function, called routing, acts on all inputs simultaneously and can be understood as a clustering step. This additional complexity has been shown to be beneficial in supervised tasks like classification and semantic segmentation. Its effect on unsupervised problems is investigated in the next section.
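The reparameterization in (2) can be sketched with a deliberately simple "network" $\varphi_\Theta(z) = \Theta z$ with a fixed random input $z$, a stand-in for the deep architectures used in practice; all sizes and the step size are illustrative assumptions. The point is that the image is never optimized directly, only the network parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

m, n, k = 10, 20, 30
A = rng.normal(size=(m, n))        # forward operator
y_delta = rng.normal(size=m)       # stand-in for a noisy measurement
z = rng.normal(size=k)             # fixed random network input

Theta = np.zeros((n, k))           # parameters of phi_Theta(z) = Theta @ z
step = 5e-5
for _ in range(5000):
    residual = A @ (Theta @ z) - y_delta
    # Chain rule: d/dTheta ||A Theta z - y||_2^2 = 2 A^T (A Theta z - y) z^T
    Theta -= step * 2 * np.outer(A.T @ residual, z)

x_hat = Theta @ z                  # reconstruction phi_{Theta_hat}(z)
```

In the actual DIP or DCP, the architecture of $\varphi_\Theta$ (together with early stopping of the descent) acts as an implicit prior; the linear map here only demonstrates the parameter identification viewpoint.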

Numerical experiments
For all DIP and DCP experiments we use a U-Net architecture; in the case of the DCP, the convolutional layers are replaced with their capsule equivalents. For the routing mechanism, the dynamic routing proposed by Sabour et al. [2] is employed. Both networks are regularized by total variation (TV), i.e. $R(x) = \|\nabla x\|_1$. We investigate inpainting as an image processing task as well as computed tomography reconstruction.
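The anisotropic variant of the TV penalty $R(x) = \|\nabla x\|_1$ can be written in a few lines; this is a sketch of the penalty itself, not of our training code:

```python
import numpy as np

def tv_anisotropic(x):
    """Anisotropic total variation: l1 norm of forward finite differences."""
    return np.abs(np.diff(x, axis=0)).sum() + np.abs(np.diff(x, axis=1)).sum()

flat = np.ones((8, 8))                 # constant image: zero TV
edge = np.zeros((8, 8))
edge[:, 4:] = 1.0                      # one vertical edge: TV = number of rows
```

TV penalizes oscillations while still allowing sharp edges, which is why it is a common prior for images.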

Inpainting
In inpainting, we are given as measurement data an image $y = M \odot x$ in which pixels are missing. The mask $M$ is a binary matrix specifying which pixels are missing. The $L^2$ norm is used as the discrepancy. As the discrepancy is independent of the masked pixels, the prior has a large influence in inpainting: if one minimized directly over the pixel values of the image, as in equation (1), the optimization would have no effect on the masked pixels.
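The remark that direct pixel-wise minimization cannot change the masked pixels is easy to verify: the gradient of the discrepancy $\|M \odot x - y\|_2^2$ vanishes wherever $M = 0$. The toy sizes and the step size below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
x_true = rng.normal(size=(4, 4))
M = np.ones((4, 4))
M[1:3, 1:3] = 0.0                  # 0 marks a missing pixel
y = M * x_true                     # masked measurement

x = np.zeros((4, 4))
for _ in range(200):
    grad = 2 * M * (M * x - y)     # gradient of ||M*x - y||_2^2 w.r.t. x
    x -= 0.1 * grad
# Observed pixels converge to y; masked pixels never move from their init.
```

A reparameterization such as the DIP or DCP couples all pixels through the network, so the masked region is filled in by the prior instead of staying at its initialization.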
Fig. 1: Inpainting a large region (shown: masked image, DIP result and DCP result). The white pixels in the masked image correspond to the missing pixels. The results for the DIP were created using the code from the official GitHub repository: https://github.com/DmitryUlyanov/deep-image-prior.

Computed Tomography
In computed tomography (CT), the forward operator $A$ can be modeled by the Radon transform. For the numerical experiments we use two different datasets with 100 test images each. The first consists of synthetic ellipses and simulated sparse-angle, parallel-beam measurements with added Gaussian noise. The measurements are undersampled, such that multiple possible solutions exist. The reconstructed images have a size of 128 px × 128 px. The second dataset contains simulated low-intensity measurements from real, normal-dose thoracic CT images of size 362 px × 362 px. In this case, the measurements are oversampled and the existence of a solution is therefore not guaranteed. The DCP reconstructions are compared against results from Baguer et al. [3], which include a CNN DIP and classical reconstruction methods. We refer the reader to their publication for more information on the two datasets and the included methods. The results are displayed in Table 1.
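For intuition, a parallel-beam Radon transform can be mimicked by rotating the image and summing along one axis; the sketch below simulates a sparse-angle measurement of a toy phantom. The rotation-based discretization and all sizes are illustrative assumptions, not the operator used in our experiments:

```python
import numpy as np
from scipy.ndimage import rotate

def radon_toy(image, angles_deg):
    """Toy parallel-beam Radon transform: rotate, then integrate columns."""
    sino = np.zeros((len(angles_deg), image.shape[1]))
    for i, ang in enumerate(angles_deg):
        rotated = rotate(image, ang, reshape=False, order=1)
        sino[i] = rotated.sum(axis=0)   # line integrals along the beams
    return sino

phantom = np.zeros((32, 32))
phantom[12:20, 12:20] = 1.0             # square object in the center

# Sparse-angle setting: only four projection angles instead of hundreds.
sinogram = radon_toy(phantom, [0, 45, 90, 135])
```

With so few angles the measurement is undersampled and many images share the same sinogram, which is exactly the regime where the prior has to disambiguate between candidate reconstructions.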

Conclusion
The Deep Capsule Prior showed promising results in the numerical experiments. Still, further research is necessary for it to surpass the performance of a classical CNN-based Deep Image Prior and thereby justify the additional computational cost. Potential improvements lie in the choice and design of the routing function and the network architecture.