Kyoung Mu Lee

Deep Meta Learning for Real-Time Visual Tracking based on Target-Specific Feature Space

Dec 26, 2017
Janghoon Choi, Junseok Kwon, Kyoung Mu Lee

In this paper, we propose a novel on-line visual tracking framework, based on a Siamese matching network and a meta-learner network, that runs at real-time speed. Conventional visual tracking algorithms based on deep convolutional features require continuous re-training of classifiers or correlation filters, which involves solving complex optimization tasks to adapt to the new appearance of a target object. To remove this process, our proposed algorithm incorporates a meta-learner network that provides the matching network with new appearance information of the target object by adding a target-aware feature space. The parameters for this target-specific feature space are provided instantly by a single forward pass of the meta-learner network. By eliminating the need to continuously solve complex optimization tasks during tracking, our algorithm runs at a real-time speed of 62 fps; experimental results demonstrate that it achieves performance competitive with other state-of-the-art tracking algorithms.
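
To make the mechanism concrete, here is a minimal sketch, assuming PyTorch and batch size 1; the module names, dimensions, and the way the target descriptor is formed are hypothetical illustrations rather than the authors' implementation. The point it shows is that adapting to a new target appearance is a single feed-forward weight prediction instead of an iterative optimization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaLearner(nn.Module):
    """Maps a target-appearance descriptor to target-specific conv weights (hypothetical sketch)."""
    def __init__(self, desc_dim=256, out_ch=32, in_ch=256, k=1):
        super().__init__()
        self.out_ch, self.in_ch, self.k = out_ch, in_ch, k
        self.fc = nn.Linear(desc_dim, out_ch * in_ch * k * k)

    def forward(self, target_desc):                        # (1, desc_dim): single target assumed
        w = self.fc(target_desc)                           # one forward pass -> new weights
        return w.view(self.out_ch, self.in_ch, self.k, self.k)

class TargetSpecificHead(nn.Module):
    """Augments shared Siamese features with a target-aware feature space."""
    def __init__(self, meta_learner):
        super().__init__()
        self.meta = meta_learner

    def forward(self, shared_feat, target_desc):           # shared_feat: (1, in_ch, H, W)
        w = self.meta(target_desc)                         # generated, never optimized online
        target_feat = F.conv2d(shared_feat, w)             # target-specific feature maps
        return torch.cat([shared_feat, target_feat], dim=1)
```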

Joint Weakly and Semi-Supervised Deep Learning for Localization and Classification of Masses in Breast Ultrasound Images

Oct 10, 2017
Seung Yeon Shin, Soochahn Lee, Il Dong Yun, Kyoung Mu Lee

We propose a framework for localization and classification of masses in breast ultrasound (BUS) images. In particular, we simultaneously use a weakly annotated dataset and a relatively small strongly annotated dataset to train a convolutional neural network detector. We have experimentally found that mass detectors trained with small, strongly annotated datasets are easily overfitted, whereas training with only large, weakly annotated datasets poses a non-trivial problem. To overcome these problems, we jointly use datasets with different characteristics in a hybrid manner. Consequently, a sophisticated weakly and semi-supervised training scenario is introduced with appropriate training loss selection. Experimental results show that the proposed method successfully localizes and classifies masses while requiring less annotation effort. The influence of each component in the proposed framework is also validated through an ablative analysis. Although the proposed method is intended for masses in BUS images, it can also be applied as a general framework to train computer-aided detection and diagnosis systems for a wide variety of image modalities, target organs, and diseases.
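
A hedged sketch of the loss-selection idea follows, assuming PyTorch; the dictionary keys and tensor shapes are hypothetical, not the released code. Strongly annotated samples contribute both localization and classification terms, while weakly annotated samples (image-level labels only) contribute only a classification term.

```python
import torch.nn.functional as F

def joint_loss(pred_logits, pred_boxes, sample):
    """Select the training loss according to the annotation type (hypothetical sketch)."""
    cls_loss = F.cross_entropy(pred_logits, sample["image_label"])  # available for every sample
    if sample["is_strong"]:                                         # box-level annotation present
        loc_loss = F.smooth_l1_loss(pred_boxes, sample["gt_boxes"])
        return cls_loss + loc_loss                                  # strong sample: both terms
    return cls_loss                                                 # weak sample: image-level term only
```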

Look Wider to Match Image Patches with Convolutional Neural Networks

Sep 19, 2017
Haesol Park, Kyoung Mu Lee

When humans match two images, they naturally look at a wide area around the target pixel to obtain clues about the correct correspondence. However, designing a matching cost function that works on a large window in the same way is difficult. The cost function is typically not intelligent enough to discard information irrelevant to the target pixel, resulting in undesirable artifacts. In this paper, we propose a novel convolutional neural network (CNN) module to learn a stereo matching cost with a large-sized window. Unlike conventional pooling layers with strides, the proposed per-pixel pyramid-pooling layer can cover a large area without a loss of resolution and detail. Therefore, the learned matching cost function can successfully utilize the information from a large area without introducing the fattening effect. The proposed method is robust against weak textures, depth discontinuities, and illumination and exposure differences. It achieves near-peak performance on the Middlebury benchmark.
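
The following is a minimal sketch of a per-pixel pyramid-pooling layer in PyTorch; the pooling type and window sizes are illustrative assumptions, not the paper's exact configuration. The essential property is stride-1 pooling at several window sizes, so the receptive field grows while spatial resolution is preserved.

```python
import torch
import torch.nn.functional as F

def per_pixel_pyramid_pool(feat, sizes=(1, 3, 5, 9)):
    """Stride-1 pooling at several window sizes; resolution is preserved (sizes are illustrative)."""
    pooled = [
        F.max_pool2d(feat, kernel_size=s, stride=1, padding=s // 2)
        for s in sizes
    ]
    return torch.cat(pooled, dim=1)   # (B, C * len(sizes), H, W): wide context, no downsampling
```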

* H. Park and K. M. Lee, "Look Wider to Match Image Patches with Convolutional Neural Networks," in IEEE Signal Processing Letters, vol. PP, no. 99, pp. 1-1, 2016  
* published in SPL 

Joint Estimation of Camera Pose, Depth, Deblurring, and Super-Resolution from a Blurred Image Sequence

Sep 18, 2017
Haesol Park, Kyoung Mu Lee

Conventional methods for estimating camera poses and scene structures from severely blurry or low-resolution images often fail. Off-the-shelf deblurring or super-resolution methods may show visually pleasing results on their own; however, applying each technique independently before matching is generally unprofitable because this naive series of procedures ignores the consistency between images. In this paper, we propose a pioneering unified framework that solves four problems simultaneously: dense depth reconstruction, camera pose estimation, super-resolution, and deblurring. By reflecting the physical imaging process, we formulate a cost minimization problem and solve it using an alternating optimization technique. Experimental results on both synthetic and real videos show high-quality depth maps derived from severely degraded images, in contrast to the failures of naive multi-view stereo methods. Our proposed method also produces outstanding deblurred and super-resolved images, unlike the independent application or combination of conventional video deblurring and super-resolution methods.
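
As a hedged illustration of the alternating optimization strategy only (not the paper's actual cost, which couples blur, downsampling, and warping through the imaging model), the toy PyTorch snippet below minimizes a joint cost over two parameter blocks by updating one block at a time while the other is held fixed.

```python
import torch

# Toy joint cost over two parameter blocks (stand-ins for, e.g., depth vs. pose / latent frames).
x = torch.zeros(3, requires_grad=True)
y = torch.zeros(3, requires_grad=True)
target = torch.tensor([1.0, 2.0, 3.0])

def cost():
    return ((x + y - target) ** 2).sum() + 0.1 * (x ** 2).sum() + 0.1 * (y ** 2).sum()

opt_x = torch.optim.Adam([x], lr=0.1)
opt_y = torch.optim.Adam([y], lr=0.1)
for _ in range(200):
    opt_x.zero_grad(); cost().backward(); opt_x.step()   # update block 1, block 2 held fixed
    opt_y.zero_grad(); cost().backward(); opt_y.step()   # update block 2, block 1 held fixed
print(cost().item())  # the joint cost decreases as the blocks are alternately refined
```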

* accepted to ICCV 2017 

Enhanced Deep Residual Networks for Single Image Super-Resolution

Jul 10, 2017
Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, Kyoung Mu Lee

Recent research on super-resolution has progressed with the development of deep convolutional neural networks (DCNN). In particular, residual learning techniques exhibit improved performance. In this paper, we develop an enhanced deep super-resolution network (EDSR) whose performance exceeds that of current state-of-the-art SR methods. The significant performance improvement of our model is due to optimization by removing unnecessary modules from conventional residual networks. The performance is further improved by expanding the model size while stabilizing the training procedure. We also propose a new multi-scale deep super-resolution system (MDSR) and training method, which can reconstruct high-resolution images at different upscaling factors in a single model. The proposed methods show superior performance over the state-of-the-art methods on benchmark datasets and prove their excellence by winning the NTIRE2017 Super-Resolution Challenge.
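
A minimal sketch of the simplified residual block EDSR is known for, assuming PyTorch; the channel count is illustrative. Batch-normalization layers are removed from the residual branch, and the residual output is scaled by a small constant (the residual scaling used for the larger EDSR models) to keep training of wide models stable.

```python
import torch.nn as nn

class EDSRResBlock(nn.Module):
    """Residual block without batch normalization (sketch of the EDSR-style block)."""
    def __init__(self, channels=64, res_scale=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.res_scale = res_scale

    def forward(self, x):
        # No BN layers; the scaled residual keeps training stable when the model is widened.
        return x + self.res_scale * self.body(x)
```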

* To appear in CVPR 2017 workshop. Best paper award of the NTIRE2017 workshop, and the winners of the NTIRE2017 Challenge on Single Image Super-Resolution 

Holistic Planimetric prediction to Local Volumetric prediction for 3D Human Pose Estimation

Jul 08, 2017
Gyeongsik Moon, Ju Yong Chang, Yumin Suh, Kyoung Mu Lee

We propose a novel approach to 3D human pose estimation from a single depth map. Recently, the convolutional neural network (CNN) has become a powerful paradigm in computer vision, and many computer vision tasks have benefited from CNNs; however, the conventional approach of directly regressing 3D body joint locations from an image does not yield noticeably improved performance. In contrast, we formulate the problem as estimating the per-voxel likelihood of key body joints from a 3D occupancy grid. We argue that learning a mapping from volumetric input to volumetric output with 3D convolution consistently improves accuracy compared with learning a regression from a depth map to 3D joint coordinates. We propose a two-stage approach to reduce the computational overhead caused by the volumetric representation and 3D convolution: holistic 2D prediction and local 3D prediction. In the first stage, the Planimetric Network (P-Net) estimates the per-pixel likelihood of each body joint in the holistic 2D space. In the second stage, the Volumetric Network (V-Net) estimates the per-voxel likelihood of each body joint in the local 3D space around the 2D estimates of the first stage, effectively reducing the computational cost. Our model outperforms existing methods by a large margin on publicly available datasets.
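
A hedged sketch of the second-stage cropping step, assuming PyTorch; the function name, grid size, and the use of the 2D heatmap peak are illustrative simplifications (and the sketch omits cropping along the depth axis). It shows why the expensive 3D convolutions only need to run on a small local volume.

```python
import torch

def local_grid_around(occupancy, heatmap_2d, size=32):
    """Crop a local sub-volume around the 2D heatmap peak for one joint (hypothetical sketch).

    occupancy  : (D, H, W) voxelized depth map
    heatmap_2d : (H, W)    per-pixel likelihood from the 2D stage
    """
    flat = torch.argmax(heatmap_2d)
    y, x = flat // heatmap_2d.shape[1], flat % heatmap_2d.shape[1]
    h = size // 2
    y0 = int(y.clamp(h, occupancy.shape[1] - h))   # keep the crop inside the volume
    x0 = int(x.clamp(h, occupancy.shape[2] - h))
    # Only this small volume is fed to the 3D network, never the full grid.
    return occupancy[:, y0 - h:y0 + h, x0 - h:x0 + h]
```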

Online Video Deblurring via Dynamic Temporal Blending Network

Apr 11, 2017
Tae Hyun Kim, Kyoung Mu Lee, Bernhard Schölkopf, Michael Hirsch

State-of-the-art video deblurring methods are capable of removing non-uniform blur caused by unwanted camera shake and/or object motion in dynamic scenes. However, most existing methods are based on batch processing and thus need access to all recorded frames, rendering them computationally demanding and time-consuming and limiting their practical use. In contrast, we propose an online (sequential) video deblurring method based on a spatio-temporal recurrent network that allows for real-time performance. In particular, we introduce a novel architecture which extends the receptive field while keeping the overall size of the network small to enable fast execution. In doing so, our network is able to remove even large blur caused by strong camera shake and/or fast-moving objects. Furthermore, we propose a novel network layer that enforces temporal consistency between consecutive frames by dynamic temporal blending, which compares and adaptively (at test time) shares features obtained at different time steps. We show the superiority of the proposed method in an extensive experimental evaluation.
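
A simplified sketch of the dynamic temporal blending idea, assuming PyTorch; the module name and the way the per-pixel weight is computed are hypothetical. The key behaviour is that the blending weight is computed from the two feature maps at test time rather than being a fixed ratio.

```python
import torch
import torch.nn as nn

class DynamicTemporalBlend(nn.Module):
    """Blend current and previous-step features with data-dependent weights (simplified sketch)."""
    def __init__(self, channels=64):
        super().__init__()
        self.compare = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, feat_t, feat_prev):
        # Per-pixel weight map computed from both feature maps, so blending adapts at test time.
        w = torch.sigmoid(self.compare(torch.cat([feat_t, feat_prev], dim=1)))
        return w * feat_t + (1.0 - w) * feat_prev   # shares features across time steps
```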

* 10 pages 

Occlusion-Aware Video Deblurring with a New Layered Blur Model

Nov 29, 2016
Byeongjoo Ahn, Tae Hyun Kim, Wonsik Kim, Kyoung Mu Lee

We present a deblurring method for scenes with occluding objects using a carefully designed layered blur model. The layered blur model is frequently used in motion deblurring to handle locally varying blurs, which are caused by object motions or depth variations in a scene. However, conventional models have a limitation in representing the layer interactions occurring at occlusion boundaries. In this paper, we address this limitation both theoretically and experimentally, and propose a new layered blur model reflecting the actual blur generation process. Based on this model, we develop an occlusion-aware deblurring method that can estimate not only the clear foreground and background but also the object motion more accurately. We also provide a novel analysis of the blur kernel at object boundaries, which shows distinctive characteristics of the blur kernel that cannot be captured by conventional blur models. Experimental results on synthetic and real blurred videos demonstrate that the proposed method yields superior results, especially at object boundaries.
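
As a rough illustration only (not the paper's exact formulation), the sketch below composites a blurred foreground and background using a blurred alpha matte, so the mixing at occlusion boundaries depends on the foreground motion. PyTorch is assumed and all names are hypothetical; the kernels are taken to be small, square, and odd-sized.

```python
import torch
import torch.nn.functional as F

def blur(img, kernel):
    """Convolve every channel of img with the same 2-D kernel (depthwise convolution)."""
    c = img.shape[1]
    weight = kernel.repeat(c, 1, 1, 1)                      # (C, 1, k, k)
    return F.conv2d(img, weight, padding=kernel.shape[-1] // 2, groups=c)

def layered_blur(fg, bg, alpha, k_fg, k_bg):
    """Simplified layered blur composition (illustrative only, not the paper's exact model).

    fg, bg : (1, C, H, W) sharp foreground / background layers
    alpha  : (1, 1, H, W) foreground matte; k_fg, k_bg : (1, 1, k, k) motion-blur kernels
    """
    blurred_alpha = blur(alpha, k_fg)
    # The background is attenuated by the *blurred* matte, which is what couples
    # the two layers at occlusion boundaries.
    return blur(fg * alpha, k_fg) + (1.0 - blurred_alpha) * blur(bg, k_bg)
```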

Deeply-Recursive Convolutional Network for Image Super-Resolution

Nov 11, 2016
Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee

We propose an image super-resolution (SR) method using a deeply-recursive convolutional network (DRCN). Our network has a very deep recursive layer (up to 16 recursions). Increasing the recursion depth can improve performance without introducing new parameters for additional convolutions. Despite these advantages, learning a DRCN is very hard with a standard gradient descent method due to exploding/vanishing gradients. To ease the difficulty of training, we propose two extensions: recursive supervision and skip connections. Our method outperforms previous methods by a large margin.
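
A minimal PyTorch sketch of the two extensions: the same convolutional weights are applied recursively, every recursion's output is passed through a shared reconstruction layer and supervised (recursive supervision), and a skip connection adds the input back. The simple average at the end stands in for the paper's learned ensemble weights; channel counts and the single-channel input are illustrative.

```python
import torch
import torch.nn as nn

class DRCNSketch(nn.Module):
    """Deeply-recursive network with recursive supervision and a skip connection (sketch)."""
    def __init__(self, channels=64, recursions=16):
        super().__init__()
        self.embed = nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.recursive = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                       nn.ReLU(inplace=True))    # one set of weights, reused
        self.reconstruct = nn.Conv2d(channels, 1, 3, padding=1)  # shared across recursions
        self.recursions = recursions

    def forward(self, x):
        h = self.embed(x)
        preds = []
        for _ in range(self.recursions):
            h = self.recursive(h)                  # same weights each step: depth, no new parameters
            preds.append(self.reconstruct(h) + x)  # skip connection; each prediction is supervised
        return torch.stack(preds).mean(dim=0), preds   # simple ensemble + per-recursion outputs
```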

* CVPR 2016 Oral 

Accurate Image Super-Resolution Using Very Deep Convolutional Networks

Nov 11, 2016
Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee

We present a highly accurate single-image super-resolution (SR) method. Our method uses a very deep convolutional network inspired by VGG-net used for ImageNet classification \cite{simonyan2015very}. We find that increasing network depth significantly improves accuracy; our final model uses 20 weight layers. By cascading small filters many times in a deep network structure, contextual information over large image regions is exploited in an efficient way. With very deep networks, however, convergence speed becomes a critical issue during training. We propose a simple yet effective training procedure: we learn residuals only and use extremely high learning rates ($10^4$ times higher than SRCNN \cite{dong2015image}), enabled by adjustable gradient clipping. Our proposed method performs better than existing methods in accuracy, and visual improvements in our results are easily noticeable.
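
A hedged training sketch, assuming PyTorch: the network predicts only the residual image, the interpolated input is added back before the loss, and gradients are clipped to a range scaled by the current learning rate (adjustable gradient clipping) so very high learning rates remain stable. The depth, channel count, single-channel input, and clipping threshold are illustrative.

```python
import torch
import torch.nn as nn

def make_vdsr(depth=20, channels=64):
    """Stack of 3x3 conv layers that predicts the residual image (sketch)."""
    layers = [nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True)]
    for _ in range(depth - 2):
        layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True)]
    layers += [nn.Conv2d(channels, 1, 3, padding=1)]
    return nn.Sequential(*layers)

model, loss_fn = make_vdsr(), nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

def train_step(lr_upscaled, hr, theta=0.01):
    optimizer.zero_grad()
    residual = model(lr_upscaled)
    loss = loss_fn(lr_upscaled + residual, hr)   # residual learning: add the input back
    loss.backward()
    # Adjustable gradient clipping: clip to [-theta/lr, theta/lr] so high learning rates stay stable.
    lr = optimizer.param_groups[0]["lr"]
    nn.utils.clip_grad_value_(model.parameters(), theta / lr)
    optimizer.step()
    return loss.item()
```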

* CVPR 2016 Oral 