Edge-Aware Deep Image Deblurring
Image deblurring is a fundamental and challenging low-level vision problem. Previous vision research indicates that edge structure in natural scenes is one of the most important factors in estimating the abilities of human visual perception. In this paper, we build on the human visual demand for sharp edges and propose a two-phase edge-aware deep network to improve deep image deblurring. An edge detection convolutional subnet is designed for the first phase, and a residual fully convolutional deblur subnet is then used to generate the deblurred results. The edge-aware network gives our model the specific capacity to enhance images with sharp edges. We apply our framework to standard benchmarks, where the proposed deblur model achieves promising results.
I Introduction
As a branch of image degradation, image blur is a common phenomenon in real shooting scenes. In general, blur factors are complex and vary across parts of the image. For example, different choices of aperture size and focal length can lead to Gaussian blur, while operating errors (such as shooting out of focus), camera shake, and complex scenes with moving objects may result in both Gaussian blur and motion blur. Because these situations can occur concurrently, it is difficult to identify the cause of the blur. Besides, blur inversion is a severely ill-posed problem, as a blurry image may correspond to multiple possible clear images. Therefore, single-image blind deblurring is a very challenging low-level vision problem.
Early works on image deblurring depend on various strong hypotheses and natural image priors. Under these assumptions, some uncertain parameters in the blur model become fixed, such as the type of blur kernel and additive noise [2, 3]. However, in real-world scenarios and applications, these simplified assumptions about the sampling scene and blur model may lead to poor performance. Furthermore, these methods are computationally expensive and usually require tuning a large number of parameters.
In recent years, applications of deep learning and generative networks to computer vision tasks have produced significant breakthroughs in many research fields. Many regression networks based on convolutional neural networks (CNNs) have been proposed for image restoration tasks, including a few approaches to the image deblurring problem [4, 5, 6, 7]. Compared to traditional methods, deep learning based approaches depend less on a priori knowledge, and the new models can reconstruct images more accurately at both global and local scales. Early networks were applied to replace a single step of traditional methods, e.g., estimating the kernel or deblurring with a fixed and known kernel [4, 8]. More recent works implement end-to-end learning approaches to handle space-variant blur and have achieved state-of-the-art performance [5, 6, 7].
Previous deep neural network architectures for image deblurring still have some issues. First, although neural networks with deeper architectures are usually effective, it is hard to interpret the effect of a single component in these networks. Moreover, the evaluation metrics used in image restoration tasks, such as peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), are generally based on pixel-wise or feature-wise differences between the clear natural image and the processed image, and thus tend to reward mathematical similarity rather than subjective perceptual quality. PSNR measures image quality through the mean squared error (MSE), which leaves a gap to the assessment made by the human visual system. SSIM models human visual quality through several components (such as luminance, contrast, and structure). These components can be used to evaluate visual quality, but they still give an inherently one-sided view of the complexity of human vision.
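To make the dependence of PSNR on the MSE concrete, the following minimal sketch computes both for small grayscale images stored as nested lists. The helper names `mse` and `psnr` and the 8-bit peak value of 255 are illustrative assumptions, not part of the evaluation code used in this paper.

```python
import math

def mse(reference, restored):
    """Mean squared error between two equally sized grayscale images (lists of rows)."""
    total, count = 0.0, 0
    for ref_row, res_row in zip(reference, restored):
        for r, s in zip(ref_row, res_row):
            total += (r - s) ** 2
            count += 1
    return total / count

def psnr(reference, restored, peak=255.0):
    """PSNR in dB: 10 * log10(peak^2 / MSE); infinite when the images are identical."""
    err = mse(reference, restored)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)

clear = [[100, 100], [100, 100]]
shifted = [[110, 110], [110, 110]]        # every pixel off by 10
concentrated = [[120, 100], [100, 100]]   # one pixel off by 20
```

Both distorted images have an MSE of 100 and therefore the same PSNR, although the error is spread evenly in one case and concentrated on a single pixel in the other. This is the kind of gap to perceptual quality discussed above.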
In this paper, we focus not only on the fitting quality but also on perceptual factors to improve the ability of networks. Human visual sensitivity to the various frequencies of visual stimuli is measured by contrast sensitivity functions, which can serve as an estimate of human visual perception abilities. Prior work has shown that contrast sensitivity functions depend on edge as well as high-frequency structure. Therefore, reconstructing edge information in degraded images is key to making objects in restored images more recognizable. As a key component of high-frequency content, edge information should be incorporated into the deblurring task. The clear parts of natural images usually keep their edges sharp and smooth, while blurred regions usually have vignette edges. Based on this observation, we incorporate this factor into the image deblurring model. The proposed edge-aware deblur network (EADNet) has two phases, i.e., extracting high-frequency edge information and edge-aware deblurring. For each phase, we design a dedicated subnet to produce the outputs.
The highlight of our work is that our deblurring model separates high-frequency information from the end-to-end model, forcing the network to be optimized towards specific visual effects, which enhances the interpretability of the entire network. Although objectives that optimize general metrics like PSNR and SSIM are relatively effective for deblurring, they do not fully reflect human perceptual demands. Our model is trained to produce sharp edges in deblurred images, which is more helpful for human sensing and recognition.
The rest of this paper is organized as follows. Section II introduces the background of image deblurring and edge detection. Sections III and IV discuss the network architecture and training of EADNet in detail. In Section V, we present qualitative and quantitative studies on the benchmarks. We conclude our work in Section VI.
II Related Work
II-A Image Deblurring
Image degradation is a severe problem for most computer vision tasks. Especially for tasks like object detection or recognition, degraded images make poor inputs and in turn lead to poor results. Among the types of image degradation, image blur is hard to resolve because of the complex image sampling environment. Motion blur is inevitable under long exposure times, and both camera shake and object motion can lead to blurry images. With mobile cameras widely available and subject to few usage restrictions, blurry captures account for a large share of real-world images. However, the problem is rather challenging when the image degradation model is considered. In general, the blurry image can be modeled as $B = K \ast S + N$, where $\ast$ is the convolution operator, and $B$, $S$, $K$, and $N$ represent the blurry result, the original clear natural image, the blur kernel, and independent additive noise, respectively. To deal with degraded images, we must convert the blurry images into sharp and clear results. However, because the other parameters are unknown, deblurring is an ill-posed problem with an infinite number of solutions. Even when the type of noise and blur kernel can be assumed, the problem may still remain ill-posed.
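The degradation model above (a clear image convolved with a blur kernel, plus additive noise) can be simulated in a few lines. The sketch below is a plain-Python toy under our own assumptions: the function names `convolve2d` and `degrade`, the replicated-border handling, the 3×3 box kernel, and Gaussian noise are all illustrative choices, not details taken from this paper.

```python
import random

def convolve2d(image, kernel):
    """Same-size 2-D convolution with replicated borders, on lists of rows."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # Convolution flips the kernel; clamp indices at the border.
                    yy = min(max(y + ph - i, 0), h - 1)
                    xx = min(max(x + pw - j, 0), w - 1)
                    acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out

def degrade(sharp, kernel, noise_sigma=0.0, seed=0):
    """Blur `sharp` with `kernel`, then add zero-mean Gaussian noise."""
    rng = random.Random(seed)
    blurred = convolve2d(sharp, kernel)
    return [[v + rng.gauss(0.0, noise_sigma) for v in row] for row in blurred]

box = [[1 / 9] * 3 for _ in range(3)]                        # uniform 3x3 kernel
step_edge = [[0, 0, 0, 255, 255, 255] for _ in range(5)]     # sharp vertical edge
blurry = degrade(step_edge, box)                             # noise-free example
```

In the noise-free case the sharp 0-to-255 step is replaced by a ramp through roughly 85 and 170. Recovering the step from the ramp without knowing the kernel is the blind deblurring problem.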
Generally, deblurring methods differ in their hypotheses about the blur kernel and are divided into blind and non-blind deblurring. Traditional methods mostly assume a fixed and known blur kernel, which is called non-blind deblurring. Such non-blind methods often cannot restore images with high quality while also adapting to varied situations. Other algorithms estimate the blur kernel K, which is called blind deblurring [1, 11]. They use image heuristics and blur assumptions to deal with the complex situation in which the blur function differs for each pixel. Following an early successful work, subsequent seminal works addressed more general situations. These methods were designed to suppress artifacts in more complex degradation scenarios, using total variation, sparse image priors [13, 14], and gradient priors [15, 16].
Recently, deep convolutional neural networks have also benefited the image deblurring task. Some methods use deep CNNs to replace certain steps of traditional methods, such as predicting the blur direction, predicting the deconvolution kernel in the frequency domain, or simulating iterative optimization. The latest methods focus more on end-to-end networks and adopt increasingly complex architectures such as encoder-decoder, multi-scale, scale-recurrent, and generative adversarial networks.
II-B Edge Detection
As a fundamental visual task aiming at detecting edges and object boundaries in natural images, edge detection is of great importance to a variety of computer vision and image processing areas such as segmentation, object detection and recognition, and 3D vision. Early research formulates this task as a low-level or mid-level grouping problem. A variety of mathematical methods were developed, with Gestalt laws and perceptual grouping playing considerable roles in algorithm design [1, 21, 22]. These methods aim at identifying points in a digital image at which the image brightness changes sharply or, more formally, has discontinuities. The points at which image brightness changes sharply are typically organized into a set of curved line segments termed edges. The Sobel detector and the Canny detector are typical representatives of these methods. Later works try to solve this problem by analyzing carefully hand-designed features near boundaries. Some of these methods apply information theory on top of these features, and others build data-driven models to learn how to detect edges [25, 26].
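As a small illustration of detecting points where brightness changes sharply, the sketch below applies the 3×3 Sobel operators to a sharp and a blurred step edge. It is a plain-Python toy (the function name `sobel_magnitude` and the test rows are our own), not the detector implementation used in this work.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(image):
    """Gradient magnitude at interior pixels; border pixels are left at 0."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    p = image[y + i - 1][x + j - 1]
                    gx += SOBEL_X[i][j] * p
                    gy += SOBEL_Y[i][j] * p
            out[y][x] = math.hypot(gx, gy)
    return out

sharp = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
blurred = [[0, 0, 85, 170, 255, 255] for _ in range(5)]

sharp_response = sobel_magnitude(sharp)[2]      # middle row
blurred_response = sobel_magnitude(blurred)[2]
```

On these rows the sharp step gives a peak response of 1020 confined to two columns, whereas the blurred step peaks at 680 and spreads over four columns. A weaker and wider response is what makes edges in blurry images harder to localize.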
Recently, research has advanced with deep learning using convolutional neural networks, which has led to significant breakthroughs in other areas of computer vision. These methods are based on fully convolutional networks (FCNs) or FCN-like networks and push state-of-the-art performance to new levels [28, 29, 30]. Additionally, the ideas and methods of edge detection have been extended to wider areas. For instance, precisely localizing edges in natural images involves multi-level visual perception, which leads many neural network based approaches to start their design with multi-category, multi-scale, or multi-level learning [29, 31, 32, 30].
The perceptual quality of restored images is important for evaluating image restoration methods and models. However, a subjective assessment is usually difficult to obtain. First of all, perceptual quality is defined by human evaluation, yet it is a heavy burden for a human to distinguish images of high perceptual quality from low-quality distorted ones in a subjective way. Besides, mainstream objective metrics for deblurring, such as peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), are full-reference: they judge restored images by comparing them with the original natural ones. Some learning based methods may reach high scores on these metrics even though their deblurred results are not sharp but merely similar to the original images. Therefore, more perceptual factors should be considered to improve the substantive capability of the networks. This idea motivates us to build our model and to strengthen its deblurring ability in an interpretable way.
Humans recognize objects by the high-frequency components of images, and edges are a representation of high-frequency information. The goal of our method is to restore blurred images and produce deblurred images with sharper edges. It is designed as a two-phase model. As shown in Figure 2, the EADNet model includes two subnets, namely EdgeNet and DeblurNet. EdgeNet serves Phase I by detecting edges in the blurry image. The edge mapping is then concatenated to the original blurry image as an extra input channel for the next phase. In Phase II, DeblurNet uses this 4-channel image to deblur the fuzzy parts and reinforce the edges with the aid of the edge mapping, and finally outputs the deblurred image. We start by introducing the details of both networks in this section, followed by their training strategy in Section IV.
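The two-phase data flow described above can be sketched as follows. This is a structural illustration only: `concat_edge_channel`, `eadnet_forward`, and the two stand-in "networks" are hypothetical names, and images are plain nested lists in channel x height x width layout rather than tensors.

```python
def concat_edge_channel(rgb, edge):
    """Stack a 3 x H x W image and an H x W edge map into a 4 x H x W input."""
    if len(rgb) != 3:
        raise ValueError("expected 3 colour channels")
    h, w = len(edge), len(edge[0])
    for channel in rgb:
        if len(channel) != h or any(len(row) != w for row in channel):
            raise ValueError("edge map and image must share height and width")
    return [*rgb, edge]

def eadnet_forward(blurry_rgb, edge_net, deblur_net):
    """Phase I detects edges; Phase II deblurs the 4-channel input."""
    edge = edge_net(blurry_rgb)                           # Phase I
    four_channel = concat_edge_channel(blurry_rgb, edge)
    return deblur_net(four_channel)                       # Phase II

# Stand-in callables (not real networks) so the sketch runs end to end.
def toy_edge_net(rgb):
    h, w = len(rgb[0]), len(rgb[0][0])
    return [[0.0] * w for _ in range(h)]

def toy_deblur_net(four_channel):
    return four_channel[:3]

blurry = [[[0.5] * 4 for _ in range(2)] for _ in range(3)]   # 3 x 2 x 4
restored = eadnet_forward(blurry, toy_edge_net, toy_deblur_net)
```

With batched N x C x H x W tensors in PyTorch, the concatenation step would typically be written as `torch.cat([blurry, edge], dim=1)`.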
For the edge detection network in the deblurring pipeline, we initially intended to employ a function or simple module rather than a neural network, so that we could strengthen the network capacity without massive extra computation. However, we found that traditional methods like the Canny detector are limited by hand-set constraints or thresholds and do not adapt well. We therefore switched to network-based models and chose a trimmed VGG-based network, i.e., Holistically-nested Edge Detection (HED), as our basic structure. As shown in Figure 3, side-output layers are inserted after the convolutional layers, providing multi-scale and multi-level outputs. Finally, one additional weighted-fusion layer combines the outputs from multiple scales.
Note that although the original edge detection network has very high detection performance, it performs worse on blurry images than on clear ones. We observe that the multi-scale side-output layers of EdgeNet have an interesting characteristic: the first side-output layers preserve detailed, local edges as well as the last side-output layers do (see Figure 4). Inspired by this, we use different strategies during the training and inference stages. The whole network is used for training, so as to acquire multi-scale edge detection capacity. In the testing stage, we use a reduced subnet that retains only the layers from the input layer to side-output layer 1. This choice is a trade-off between performance and model efficiency: it keeps enough edge information while requiring fewer computational resources during inference.
As shown in Figure 5, a generative CNN architecture is employed as the DeblurNet. The network consists of three convolution blocks, nine residual blocks, two upsampling convolution blocks, and a global skip-connection convolutional layer. Besides, weight normalization is applied to convolutional layers for easier training.
In the first convolution block, we use a large convolution kernel with kernel size 9 and stride 4 to extract low-level feature mappings with the same width and height as the original images. Then two convolution blocks work as downsampling blocks, each generating half-size feature mappings using a small kernel (kernel size 3, stride 2). All these convolution blocks apply ReLU activation after the convolutional layers.
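The spatial sizes produced by these blocks follow from standard convolution arithmetic. The sketch below traces them for a 256-pixel side under assumptions of our own: padding 4 for the 9×9 kernel, padding 1 for the 3×3 kernels, stride 1 in the first block, size-preserving residual blocks, and a factor of 2 per upsampling block. None of these padding values is stated in the text, and `conv_out` and `trace_deblurnet_sizes` are hypothetical helper names.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one axis (floor convention)."""
    return (size + 2 * padding - kernel) // stride + 1

def trace_deblurnet_sizes(size=256):
    """Feature-map side length after each block, under the stated assumptions."""
    sizes = {"input": size}
    size = conv_out(size, kernel=9, stride=1, padding=4)   # first block
    sizes["conv1"] = size
    for name in ("down1", "down2"):                        # two downsampling blocks
        size = conv_out(size, kernel=3, stride=2, padding=1)
        sizes[name] = size
    for name in ("up1", "up2"):                            # two upsampling blocks
        size *= 2
        sizes[name] = size
    return sizes

sizes = trace_deblurnet_sizes(256)
```

Under these assumptions the trace is 256, 256, 128, 64, 128, 256, so the output matches the input size. Note that a 9×9 kernel keeps the input size only at stride 1: `conv_out(256, 9, stride=4, padding=4)` gives 64, so a stride of 4 would not by itself preserve width and height.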
For the residual blocks, we adopt an existing residual block design and set the number of residual blocks to 9. In each residual block, the first convolutional layer expands the number of channels and is followed by ReLU. After that comes an efficient linear low-rank convolution, which is a stack of one convolution that reduces the number of channels and one convolution that performs spatial feature extraction.
Each upsampling convolutional block consists of a pixel-shuffle layer and a convolutional layer. These upsampling blocks enlarge the spatial shape (width and height) of the 2-D feature mappings and compress the channel numbers.
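The trade of channels for spatial resolution performed by a pixel-shuffle layer can be seen in a few lines. The sketch below uses the index convention of common sub-pixel convolution layers (such as PyTorch's `nn.PixelShuffle`); that this exact convention is used here is our assumption, and `pixel_shuffle` is an illustrative pure-Python helper on nested lists.

```python
def pixel_shuffle(features, r):
    """Rearrange (C*r*r) x H x W nested lists into C x (H*r) x (W*r)."""
    channels = len(features)
    if channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    h, w = len(features[0]), len(features[0][0])
    c_out = channels // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                # Input channel c*r*r + i*r + j fills offset (i, j) of each r x r cell.
                src = features[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out

feats = [[[k]] for k in range(4)]   # 4 channels, each 1 x 1
up = pixel_shuffle(feats, 2)        # 1 channel, 2 x 2
```

Four 1×1 channels become one 2×2 channel, so the width and height double while the channel count drops by a factor of four.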
Finally, a global skip-connection structure generates the final output. Two convolutional layers take in low-level features from the first convolution block and high-level features from the residual body, respectively. We then apply element-wise summation to the two 3-channel outputs before the final Tanh activation layer.
IV Network Training
IV-A Blurry Images and Edge Map Generation
Generally speaking, when training an edge detection network, clear images are employed as inputs and edge maps as the ground truth. However, our EdgeNet is designed to extract edges from blurry images, so the input training data should be blurry images instead of clear ones. It is hard to find an off-the-shelf dataset offering blurry images with clear edge maps. Therefore, we build the training dataset by generating blurry images and edge maps from clear images in the MS COCO dataset and use them to train both EdgeNet and DeblurNet.
To keep the network adaptable and robust in complex image scenarios, we randomly add Gaussian blur or motion blur to generate the blurry input images. Specifically, for the motion blur, we adopt an existing method that first generates a random trajectory vector with a Markov process and then applies sub-pixel interpolation to the trajectory vector to generate the blur kernel.
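The two steps of this motion-blur generation (a Markov random trajectory followed by sub-pixel interpolation onto a kernel grid) can be sketched as below. This is a simplified illustration under our own assumptions, not the exact procedure or parameters of the cited method: the velocity update rule, `inertia`, `max_step`, the kernel size of 15, and both function names are hypothetical.

```python
import math
import random

def random_trajectory(steps=64, max_step=0.6, inertia=0.7, seed=0):
    """2-D random walk whose velocity depends only on the previous velocity."""
    rng = random.Random(seed)
    x = y = 0.0
    angle = rng.uniform(0.0, 2.0 * math.pi)
    vx, vy = math.cos(angle) * max_step, math.sin(angle) * max_step
    points = [(x, y)]
    for _ in range(steps - 1):
        # Markov update: keep part of the old velocity, add a random perturbation.
        vx = inertia * vx + (1.0 - inertia) * rng.gauss(0.0, max_step)
        vy = inertia * vy + (1.0 - inertia) * rng.gauss(0.0, max_step)
        x, y = x + vx, y + vy
        points.append((x, y))
    return points

def trajectory_to_kernel(points, size=15):
    """Splat trajectory points onto a size x size grid with bilinear weights."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    half = (size - 1) / 2.0
    kernel = [[0.0] * size for _ in range(size)]
    for px, py in points:
        gx, gy = px - cx + half, py - cy + half      # centre the path on the grid
        x0, y0 = math.floor(gx), math.floor(gy)
        fx, fy = gx - x0, gy - y0                    # sub-pixel offsets
        for dy, wy in ((0, 1.0 - fy), (1, fy)):
            for dx, wx in ((0, 1.0 - fx), (1, fx)):
                xi, yi = x0 + dx, y0 + dy
                if 0 <= xi < size and 0 <= yi < size:
                    kernel[yi][xi] += wx * wy
    total = sum(map(sum, kernel))
    if total == 0:
        raise ValueError("trajectory falls entirely outside the kernel grid")
    return [[v / total for v in row] for row in kernel]

kernel = trajectory_to_kernel(random_trajectory())
```

The resulting kernel is non-negative and sums to one, so convolving an image with it preserves overall brightness while smearing it along the sampled path.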
To generate the ground truth edge map, we choose a classic method, i.e., the Canny edge detector. Compared to human annotation, Canny edges depend on strong hand-set thresholds, are usually not directly connected, and exhibit spatial shifts and inconsistencies. However, the Canny detector is less time-consuming, and experiments validate that Canny edges are still useful for training EdgeNet.
IV-B Phase I: EdgeNet Training
Compared with the original HED model, we make some changes to the training process. The modified training framework is shown in Figure 6(a). We introduce a discriminator network to enable adversarial training. As the discriminator in a generative adversarial network has strong abilities to fit image distributions, it can also help EdgeNet learn how to extract edges from blurry images. The architecture of this discriminator is similar to PatchGAN, and each of its convolutional blocks consists of a convolutional layer followed by a Spectral Normalization layer and LeakyReLU. We formulate the loss function as the sum of an edge loss term and an adversarial loss term with trade-off $\lambda$: $\mathcal{L}_{\mathrm{EdgeNet}} = \mathcal{L}_{\mathrm{edge}} + \lambda\,\mathcal{L}_{\mathrm{adv}}$.
The edge loss is based on the class-balanced cross-entropy loss used by HED. It is computed between the ground truth edge mapping and the edge mappings that EdgeNet predicts from the input blurry image, namely the outputs of each side-output layer and of the final fusion layer.
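A minimal sketch of a class-balanced cross-entropy for edge maps is given below, following the HED-style weighting in which edge pixels are weighted by the fraction of non-edge pixels and vice versa. The function names, the clamping constant `eps`, the default unit side-output weights, and the use of a sum over pixels are our own assumptions for illustration.

```python
import math

def class_balanced_bce(pred, target, eps=1e-7):
    """Class-balanced cross-entropy between a predicted edge-probability map
    and a binary ground-truth edge map (both lists of rows)."""
    flat_p = [p for row in pred for p in row]
    flat_t = [t for row in target for t in row]
    n = len(flat_t)
    n_pos = sum(flat_t)
    beta = (n - n_pos) / n                    # fraction of non-edge pixels
    loss = 0.0
    for p, t in zip(flat_p, flat_t):
        p = min(max(p, eps), 1.0 - eps)       # avoid log(0)
        if t == 1:
            loss -= beta * math.log(p)                  # rare edge pixels weigh more
        else:
            loss -= (1.0 - beta) * math.log(1.0 - p)
    return loss

def edge_loss(side_preds, fused_pred, target, side_weights=None):
    """Weighted side-output losses plus the loss of the fused output."""
    if side_weights is None:
        side_weights = [1.0] * len(side_preds)
    total = class_balanced_bce(fused_pred, target)
    for weight, pred in zip(side_weights, side_preds):
        total += weight * class_balanced_bce(pred, target)
    return total

target = [[1, 0, 0, 0]]             # one edge pixel out of four
uniform = [[0.5, 0.5, 0.5, 0.5]]    # uninformative prediction
```

For this target, the single edge pixel contributes as much to the loss as the three non-edge pixels combined, which prevents the sparse edge class from being ignored.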
The adversarial loss term follows the vanilla GAN objective, with separate network parameters for the discriminator and the generator (i.e., EdgeNet).
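For reference, the vanilla GAN objective can be written as follows. Treating the ground truth edge mapping $E$ as the real sample and the EdgeNet output $G_{\theta_G}(B)$ for a blurry image $B$ as the generated sample is our reading of the setup; the symbols $\theta_D$, $\theta_G$, $E$, and $B$ are assumed notation.

```latex
\min_{\theta_G}\,\max_{\theta_D}\;
\mathbb{E}_{E}\big[\log D_{\theta_D}(E)\big]
+ \mathbb{E}_{B}\big[\log\big(1 - D_{\theta_D}(G_{\theta_G}(B))\big)\big]
```

The discriminator parameters $\theta_D$ are updated to increase this value, while the generator parameters $\theta_G$ are updated to decrease it.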
In our implementation, we use the discriminator network as a training accelerator. At the beginning of EdgeNet training, we set the trade-off weight $\lambda$ to a small value of 0.05, avoiding overfitting to the training data. After 50 epochs, $\lambda$ is set to 0, which means the discriminator is used only in the pretraining stage.
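This schedule amounts to a step function of the epoch index. The sketch below assumes 0-indexed epochs and uses hypothetical helper names (`adversarial_weight`, `edgenet_loss`); it shows how the adversarial term drops out of the combined loss after the warm-up.

```python
def adversarial_weight(epoch, warmup_epochs=50, value=0.05):
    """Trade-off weight: `value` during the first `warmup_epochs`, then 0."""
    return value if epoch < warmup_epochs else 0.0

def edgenet_loss(edge_term, adv_term, epoch):
    """Edge loss plus the epoch-dependent share of the adversarial loss."""
    return edge_term + adversarial_weight(epoch) * adv_term

early = edgenet_loss(edge_term=1.0, adv_term=2.0, epoch=0)    # adversarial term active
late = edgenet_loss(edge_term=1.0, adv_term=2.0, epoch=50)    # edge loss only
```

From epoch 50 onward the loss equals the edge term alone, so the discriminator no longer needs to be evaluated or updated.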
IV-C Phase II: Edge-Aware Training of DeblurNet
The training of DeblurNet is directly connected to that of EdgeNet. Not only does the edge map channel of the input come from EdgeNet, but EdgeNet also guides DeblurNet to learn to build sharp edges in the deblurred images. As illustrated in Figure 6(b), the loss function has three terms: a pixel loss, a perceptual loss, and an edge loss.
The first term is the pixel loss, which is the MSE computed by pixel-wise comparison. The perceptual loss is based on the difference between the feature maps of the generated and target images, extracted from a VGG19 network pre-trained on ImageNet. Formally, the perceptual loss is $\mathcal{L}_{\mathrm{perc}} = \mathrm{MSE}\big(\phi_j(\hat{S}), \phi_j(S)\big)$, where $\hat{S}$ and $S$ are the generated and target images and $\phi_j$ represents the feature mapping of the $j$-th convolutional layer. Both the pixel and perceptual terms are considered content losses, and here we use the $\ell_2$ metric to compute the MSE.
The third term is the edge loss, which is similar to the one used for EdgeNet training and is also based on the class-balanced cross-entropy loss. The edge maps used to calculate this loss are extracted from the blurry inputs and from the deblurred outputs of DeblurNet, respectively. The loss is a weighted sum over the side-output layers, where $\alpha_m$ is the weight for side-output layer $m$. In our experiments, these settings follow the tailored version of HED that is used.
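One plausible way to write the three-term objective and the edge term is shown below. The weights $\lambda_p$ and $\lambda_e$, the number of side-output layers $M$, and the symbols $\hat{E}^{(m)}$ and $E^{(m)}$ for the two edge maps compared at side-output layer $m$ are assumed notation; the text does not state how the three terms are weighted, so this is a sketch rather than the exact formulation.

```latex
\mathcal{L}_{\mathrm{Deblur}}
  = \mathcal{L}_{\mathrm{pixel}}
  + \lambda_{p}\,\mathcal{L}_{\mathrm{perc}}
  + \lambda_{e}\,\mathcal{L}_{\mathrm{edge}},
\qquad
\mathcal{L}_{\mathrm{edge}}
  = \sum_{m=1}^{M} \alpha_m\,
    \ell_{\mathrm{cb}}\big(\hat{E}^{(m)}, E^{(m)}\big)
```

Here $\ell_{\mathrm{cb}}$ denotes the class-balanced cross-entropy. With the reduced EdgeNet that keeps only side-output layer 1, the sum would contain a single term.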
We evaluate the edge-aware deblur network framework on two image deblurring benchmarks: GOPRO and Kohler. The GOPRO dataset consists of 3214 pairs of blurred and sharp images in 720p quality, taken from various scenes (2103 pairs for training and 1111 for testing). The blurry images are generated from clear video frames. The Kohler dataset is also a standard benchmark for evaluating blind deblurring algorithms. It includes 4 clear images, each of which is blurred with 12 different blur kernels generated from recordings and analysis of real camera motion. The recorded motion is played back on a robot platform such that a sequence of sharp images is recorded, sampling the 6D camera motion trajectory.
To preserve the generic deblurring capacity of our model, we use a mixed dataset during training. The final mixed dataset has three parts, i.e., clear images, blurry images, and edge images extracted from the clear images. It has two sources, i.e., the MS COCO dataset and the GOPRO training set. The MS COCO dataset consists only of clear, sharp images, so we randomly choose 2000 images, use the Canny edge detector to extract edge images from them, and use the method described in Section IV-A to generate blurry images. The GOPRO training set already has 2103 pairs of clear and blurry images, so we only use Canny to obtain the clear edge images.
Table I
|Method|GOPRO PSNR (dB)|GOPRO SSIM|Kohler PSNR (dB)|Kohler SSIM|Time|
|Kim et al.|23.64|0.8239|24.68|0.7937|1 hr|
|Sun et al.|24.64|0.8429|25.22|0.7735|20 min|
|Nah et al.|29.08|0.9135|26.48|0.8079|2.51 s|
|Tao et al.|30.10|0.9323|26.80|0.8375|0.67 s|
V-B Model Training
We use the Adam optimizer to train both subnets. The initial learning rate is 0.0005 and decays to one-tenth every 20 epochs. Limited by memory, we sample batches of 4 blurry images and randomly crop 256×256 patches as training inputs. To save training time, a co-training method of EdgeNet and DeblurNet is applied to our two-phase deblur model: we first train EdgeNet for 50 epochs as described in Section IV-B and then train DeblurNet. We implement our model with the PyTorch deep learning library. The experiments are conducted on an Intel Xeon E5 CPU and an NVIDIA Titan X GPU.
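The step decay and the random cropping described above can be sketched as below. The helper names `learning_rate` and `random_crop_origin`, the 0-indexed epochs, and the uniform sampling of the crop position are assumptions made for illustration.

```python
import random

def learning_rate(epoch, base_lr=5e-4, decay=0.1, step=20):
    """Step decay: multiply the base rate by `decay` every `step` epochs."""
    return base_lr * decay ** (epoch // step)

def random_crop_origin(height, width, patch=256, rng=random):
    """Top-left corner of a random patch x patch crop inside the image."""
    if height < patch or width < patch:
        raise ValueError("image is smaller than the requested patch")
    top = rng.randint(0, height - patch)     # randint is inclusive at both ends
    left = rng.randint(0, width - patch)
    return top, left

schedule = [learning_rate(e) for e in (0, 19, 20, 40)]
top, left = random_crop_origin(720, 1280, rng=random.Random(0))
```

In PyTorch, the same decay is typically obtained with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)`.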
V-C Comparison with the State of the Art
We first compare EADNet with previous work and recent state-of-the-art image deblurring approaches on standard metrics, namely PSNR, SSIM, and running time. The experiments are run in the same environment, and the results are shown in Table I.
We observe that our model achieves results comparable with the state of the art on the different metrics, and our approach is much faster. As mentioned earlier, full-reference metrics for image restoration tasks, like PSNR and SSIM, may not be perfect. Therefore, we also evaluate the visual quality of the deblurred images; some comparisons of visual details are shown in Figures 7 and 8. Here we choose one of the recent works, i.e., Tao et al., which recovers sharp images from blurry ones using a scale-recurrent network. From these figures, we find that this network eliminates most blurry structures and artifacts, while the deblurred images from our model have sharper and smoother edges. Our model is able to handle Gaussian blur and motion blur at the same time, so the edges in our deblurred images are sharper. The DeblurNet in our model has indeed learned the specific capacity to generate sharper images than Tao et al.
Table II
|Method|GOPRO PSNR (dB)|GOPRO SSIM|Kohler PSNR (dB)|Kohler SSIM|
|EADNet w/o. EdgeNet|29.53|0.9014|25.97|0.8189|
|EADNet w. EdgeNet|30.78|0.9137|26.61|0.8297|
|EADNet w. Reduced EdgeNet|31.02|0.9123|26.91|0.8325|
V-D Ablation Study
In this section, we focus on evaluating the impact of the edge information. We first compare the edge maps obtained from the blurry, deblurred, and clear images; some patch-level results are shown in Figure 9. The pixel brightness in the deblurred edge maps is clearly higher than in the blurry edge maps, and the deblurred edge maps have clearer lines. Moreover, the edge maps from the deblurred results are similar to those from the clear images. This shows that our network is able to generate deblurred images with sharp and clear edges.
We also conduct a set of experiments without the edge information (including both the input edge mapping and the edge loss); the results are given in the first row of Table II. We then test two settings with the edge information, i.e., using the full EdgeNet and using the reduced version (as shown in the left part of Fig. 3). Using only the edge maps from side-output layer 1 (i.e., Reduced EdgeNet), we already obtain higher or comparable PSNR/SSIM, while the computational cost is greatly reduced. The substantial performance gains over the results without edge information, together with the deblurred results (as illustrated in Fig. 1), confirm the effectiveness of using EdgeNet as a basic element for the deep image deblurring task.
VI Conclusion
We explored the real demands of human vision in the deblurring task and validated that edges matter in a deep image deblurring system. We proposed a two-phase edge-aware deblur network composed of an edge detection subnet and a deblur subnet. One important goal of this work is to build a novel deblurring model with the capacity to produce images with sharp edges; the deblurring results from our model have sharp edges, which make objects in images easy to recognize. We conduct experiments using the EADNet framework on benchmark images and demonstrate its effectiveness and efficiency relative to previous approaches.
References
- [1] R. Szeliski, Computer Vision: Algorithms and Applications. Springer-Verlag New York, Inc., 2010.
- [2] T. F. Chan and C. K. Wong, Total variation blind deconvolution. IEEE Press, 1998.
- [3] A. Goldstein and R. Fattal, “Blur-kernel estimation from spectral irregularities,” European Conference on Computer Vision, vol. 7576, no. 1, pp. 622–635, 2012.
- [4] J. Sun, W. Cao, Z. Xu, and J. Ponce, “Learning a convolutional neural network for non-uniform motion blur removal,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 769–777, 2015.
- [5] S. Nah, T. H. Kim, and K. M. Lee, “Deep multi-scale convolutional neural network for dynamic scene deblurring,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 257–265, 2017.
- [6] M. Noroozi, P. Chandramouli, and P. Favaro, “Motion deblurring in the wild,” in German Conference on Pattern Recognition, 2017, pp. 65–77.
- [7] X. Tao, H. Gao, Y. Wang, X. Shen, J. Wang, and J. Jia, “Scale-recurrent network for deep image deblurring,” IEEE Conference on Computer Vision and Pattern Recognition, 2018.
- [8] D. Gong, J. Yang, L. Liu, Y. Zhang, I. Reid, C. Shen, A. V. D. Hengel, and Q. Shi, “From motion blur to motion flow: A deep learning solution for removing heterogeneous motion blur,” IEEE Conference on Computer Vision and Pattern Recognition, 2017.
- [9] S. E. Palmer, “Vision science: Photons to phenomenology,” Quarterly Review of Biology, vol. 77, no. 4, pp. 233–234, 1999.
- [10] P. J. Bex, S. G. Solomon, and S. C. Dakin, “Contrast sensitivity in natural scenes depends on edge as well as spatial frequency structure,” Journal of Vision, vol. 9, no. 10, 2009.
- [11] Y. Xin, X. Feng, S. Zhang, and Z. Li, “Efficient patch-wise non-uniform deblurring for a single image,” IEEE Transactions on Multimedia, vol. 16, no. 6, pp. 1510–1524, 2014.
- [12] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. T. Freeman, “Removing camera shake from a single photograph,” ACM Transactions on Graphics, vol. 25, no. 3, pp. 787–794, 2006.
- [13] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Understanding and evaluating blind deconvolution algorithms,” IEEE Conference on Computer Vision and Pattern Recognition, vol. 8, no. 1, pp. 1964–1971, 2009.
- [14] L. Li, D. Wu, J. Wu, H. Li, W. Lin, and A. C. Kot, “Image sharpness assessment by sparse representation,” IEEE Transactions on Multimedia, vol. 18, no. 6, pp. 1085–1097, June 2016.
- [15] L. Xu, S. Zheng, and J. Jia, “Unnatural l0 sparse representation for natural image deblurring,” IEEE Conference on Computer Vision and Pattern Recognition, vol. 9, no. 4, pp. 1107–1114, 2013.
- [16] Y. Zhan and R. Zhang, “No-reference image sharpness assessment based on maximum gradient and variability of gradients,” IEEE Transactions on Multimedia, vol. 20, pp. 1796–1808, 2017.
- [17] A. Chakrabarti, “A neural approach to blind motion deblurring,” European Conference on Computer Vision, pp. 221–235, 2016.
- [18] C. Schuler, M. Hirsch, S. Harmeling, and B. Schölkopf, “Learning to deblur,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 7, pp. 1439–1451, 2016.
- [19] S. Su, M. Delbracio, J. Wang, G. Sapiro, W. Heidrich, and O. Wang, “Deep video deblurring,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 1279–1288, 2017.
- [20] O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, and J. Matas, “DeblurGAN: Blind motion deblurring using conditional adversarial networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8183–8192.
- [21] D. Perrone and P. Favaro, “Total variation blind deconvolution: The devil is in the details,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2909–2916.
- [22] J. Pan, Z. Hu, Z. Su, and M. H. Yang, “Deblurring text images via l0-regularized intensity and gradient prior,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2901–2908.
- [23] J. Kittler, “On the accuracy of the Sobel edge detector,” Image & Vision Computing, vol. 1, no. 1, pp. 37–42, 1983.
- [24] J. Canny, “A computational approach to edge detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 6, pp. 679–698, 1986.
- [25] S. Konishi, A. L. Yuille, J. M. Coughlan, and S. C. Zhu, “Statistical edge detection: learning and evaluating edge cues,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 1, pp. 57–74, 2003.
- [26] J. J. Lim, C. L. Zitnick, and P. Dollár, “Sketch tokens: A learned mid-level representation for contour and object detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 3158–3165.
- [27] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
- [28] I. Kokkinos, “Pushing the boundaries of boundary detection using deep learning,” arXiv:1511.07386, 2015.
- [29] S. Xie and Z. Tu, “Holistically-nested edge detection,” International Journal of Computer Vision, vol. 125, no. 1-3, pp. 1–16, 2015.
- [30] Z. Yu, C. Feng, M. Y. Liu, and S. Ramalingam, “CASENet: Deep category-aware semantic edge detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1761–1770.
- [31] G. Bertasius, J. Shi, and L. Torresani, “DeepEdge: A multi-scale bifurcated deep network for top-down contour detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 4380–4389.
- [32] N. Neverova, C. Wolf, G. W. Taylor, and F. Nebout, “Multi-scale deep learning for gesture detection and localization,” in ECCV Workshop on Looking at People, 2014, pp. 474–490.
- [33] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations, 2015.
- [34] J. Yu, Y. Fan, J. Yang, N. Xu, Z. Wang, X. Wang, and T. Huang, “Wide activation for efficient and accurate image super-resolution,” 2018.
- [35] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in European Conference on Computer Vision, 2014, pp. 740–755.
- [36] G. Boracchi and A. Foi, “Modeling the performance of image restoration from motion blur,” IEEE Transactions on Image Processing, vol. 21, no. 8, pp. 3502–3517, 2012.
- [37] P. Isola, J. Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 5967–5976, 2016.
- [38] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks,” International Conference on Learning Representations, 2018.
- [39] B. Xu, N. Wang, T. Chen, and M. Li, “Empirical evaluation of rectified activations in convolutional network,” arXiv:1505.00853, 2015.
- [40] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
- [41] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European Conference on Computer Vision, 2016, pp. 694–711.
- [42] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet large scale visual recognition challenge,” International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
- [43] S. Nah, T. H. Kim, and K. M. Lee, “Deep multi-scale convolutional neural network for dynamic scene deblurring,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017.
- [44] R. Köhler, M. Hirsch, B. Mohler, B. Schölkopf, and S. Harmeling, “Recording and playback of camera shake: Benchmarking blind deconvolution with a real-world database,” in European Conference on Computer Vision, 2012.
- [45] T. H. Kim, B. Ahn, and K. M. Lee, “Dynamic scene deblurring,” International Conference on Computer Vision, pp. 3160–3167, 2013.
- [46] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representations, 2015.
- [47] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in NIPS Workshop, 2017.