In this paper, we propose a novel online visual tracking framework, based on a Siamese matching network and a meta-learner network, that runs at real-time speed. Conventional visual tracking algorithms based on discriminative deep convolutional features require continuous re-training of classifiers or correlation filters, which involves solving complex optimization tasks, to adapt to the new appearance of a target object. To remove this process, our algorithm incorporates a meta-learner network that provides the matching network with new appearance information of the target object by adding a target-aware feature space. The parameters for this target-specific feature space are provided instantly by a single forward pass of the meta-learner network. Because the need to continuously solve complex optimization tasks during tracking is eliminated, our algorithm runs at a real-time speed of $62$ fps; experimental results demonstrate that it nevertheless remains competitive with other state-of-the-art tracking algorithms.
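As an illustration of how a meta-learner can deliver feature-space parameters in a single forward pass, the following PyTorch-style sketch maps a pooled target descriptor to the weights of an extra target-specific convolution; all module names and sizes are our own illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaLearner(nn.Module):
    """Emits target-specific conv weights from one forward pass."""
    def __init__(self, feat_ch=256, out_ch=32, k=3):
        super().__init__()
        self.out_ch, self.k = out_ch, k
        # maps a pooled target descriptor to convolution kernel weights
        self.fc = nn.Linear(feat_ch, out_ch * feat_ch * k * k)

    def forward(self, target_feat):                       # (1, C, H, W)
        z = F.adaptive_avg_pool2d(target_feat, 1).flatten(1)  # (1, C)
        w = self.fc(z)                 # single forward pass -> parameters
        return w.view(self.out_ch, -1, self.k, self.k)

def target_aware_features(search_feat, target_feat, meta):
    """Augments the shared features with a target-aware feature space."""
    w = meta(target_feat)              # no iterative optimization needed
    extra = F.conv2d(search_feat, w, padding=meta.k // 2)
    return torch.cat([search_feat, extra], dim=1)
```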
We propose a framework for the localization and classification of masses in breast ultrasound (BUS) images. In particular, we simultaneously use a large, weakly annotated dataset and a relatively small, strongly annotated dataset to train a convolutional neural network detector. We experimentally found that mass detectors trained with small, strongly annotated datasets easily overfit, whereas training detectors with large, weakly annotated datasets alone is itself a non-trivial problem. To overcome these problems, we jointly use the two kinds of datasets in a hybrid manner. Consequently, a weakly and semi-supervised training scenario is introduced, with an appropriate training loss selected for each kind of annotation. Experimental results show that the proposed method successfully localizes and classifies masses while requiring less annotation effort. The influence of each component of the proposed framework is also validated through an ablation study. Although the proposed method is designed for masses in BUS images, it can also be applied as a general framework for training computer-aided detection and diagnosis systems for a wide variety of image modalities, target organs, and diseases.
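The hybrid use of the two datasets can be pictured as a per-sample loss switch: strongly annotated samples supervise both localization and classification, while weakly annotated ones supervise only an image-level label. The sketch below is a minimal illustration under that assumption; the detector interface and field names are hypothetical.

```python
import torch.nn.functional as F

def hybrid_loss(pred_boxes, pred_cls, image_cls_logit, sample):
    """Selects the training loss according to the annotation type."""
    if sample["has_box"]:                       # strongly annotated sample
        loc = F.smooth_l1_loss(pred_boxes, sample["gt_boxes"])
        cls = F.cross_entropy(pred_cls, sample["gt_labels"])
        return loc + cls
    # weakly annotated sample: supervise only the image-level label
    return F.binary_cross_entropy_with_logits(
        image_cls_logit, sample["image_label"])
```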
When a human matches two images, the viewer naturally examines a wide area around the target pixel to obtain clues for the correct correspondence. However, designing a matching cost function that works on a large window in the same way is difficult: the cost function is typically not intelligent enough to discard information irrelevant to the target pixel, resulting in undesirable artifacts. In this paper, we propose a novel method for learning a stereo matching cost with a large-sized window. Unlike conventional pooling layers with strides, the proposed per-pixel pyramid-pooling layer can cover a large area without a loss of resolution and detail. Therefore, the learned matching cost function can successfully utilize information from a large area without introducing the fattening effect. The proposed method is robust in the presence of weak textures, depth discontinuities, and illumination and exposure differences, and achieves near-peak performance on the Middlebury benchmark.
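To make the idea concrete, a per-pixel pyramid-pooling layer can be sketched as stride-1 pooling at several window sizes whose outputs are concatenated, so every pixel sees progressively larger context at full resolution. The window sizes below are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerPixelPyramidPool(nn.Module):
    """Stride-1 pooling pyramid: large receptive field, no downsampling."""
    def __init__(self, sizes=(1, 4, 8, 16)):
        super().__init__()
        self.sizes = sizes

    def forward(self, x):                                 # (N, C, H, W)
        n, c, h, w = x.shape
        outs = []
        for s in self.sizes:
            if s == 1:
                outs.append(x)                            # identity level
            else:
                # stride 1 keeps resolution; crop any padding overshoot
                p = F.max_pool2d(x, kernel_size=s, stride=1, padding=s // 2)
                outs.append(p[..., :h, :w])
        return torch.cat(outs, dim=1)                     # (N, C*levels, H, W)
```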
Conventional methods for estimating camera poses and scene structures from severely blurry or low-resolution images often fail. Off-the-shelf deblurring or super-resolution methods may show visually pleasing results, but applying each technique independently before matching is generally unhelpful because this naive series of procedures ignores the consistency between images. In this paper, we propose a unified framework that solves four problems simultaneously: dense depth reconstruction, camera pose estimation, super-resolution, and deblurring. By reflecting the physical imaging process, we formulate a cost-minimization problem and solve it using an alternating optimization technique. Experimental results on both synthetic and real videos show high-quality depth maps derived from severely degraded images, in contrast to the failures of naive multi-view stereo methods. Our proposed method also produces outstanding deblurred and super-resolved images, unlike the independent application or naive combination of conventional video deblurring and super-resolution methods.
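The alternating optimization can be summarized as a loop that refines each unknown (depth, poses, latent sharp high-resolution frames) in turn while fixing the others, until the joint cost stops decreasing. The skeleton below is schematic; the update callables stand in for the paper's actual per-variable solvers.

```python
def alternating_minimization(depth, poses, latent, cost, updates,
                             max_iters=20, tol=1e-4):
    """Generic block-coordinate descent over the three unknowns."""
    prev = cost(depth, poses, latent)
    for _ in range(max_iters):
        depth = updates["depth"](depth, poses, latent)    # poses, latent fixed
        poses = updates["pose"](depth, poses, latent)     # depth, latent fixed
        latent = updates["latent"](depth, poses, latent)  # depth, poses fixed
        cur = cost(depth, poses, latent)
        if prev - cur < tol:                              # converged
            break
        prev = cur
    return depth, poses, latent
```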
Recent research on super-resolution (SR) has progressed with the development of deep convolutional neural networks (DCNNs). In particular, residual learning techniques exhibit improved performance. In this paper, we develop an enhanced deep super-resolution network (EDSR) whose performance exceeds that of current state-of-the-art SR methods. The significant performance improvement of our model comes from optimizing the architecture by removing unnecessary modules from conventional residual networks; performance is further improved by expanding the model size while stabilizing the training procedure. We also propose a new multi-scale deep super-resolution system (MDSR) and an accompanying training method, which can reconstruct high-resolution images at different upscaling factors within a single model. The proposed methods show superior performance over the state-of-the-art methods on benchmark datasets and proved their merit by winning the NTIRE 2017 Super-Resolution Challenge.
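A minimal sketch of the simplified residual block this abstract alludes to is shown below: the batch-normalization layers of the original ResNet block are removed, and a small constant scaling of the residual branch stabilizes training of the widened model. The channel width and $0.1$ scale follow commonly reported EDSR settings and are shown here only for illustration.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """EDSR-style residual block: no batch normalization, scaled residual."""
    def __init__(self, ch=256, res_scale=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),   # no BN anywhere in the block
        )
        self.res_scale = res_scale

    def forward(self, x):
        # scaling the residual branch stabilizes training of wide models
        return x + self.body(x) * self.res_scale
```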
We propose a novel approach to 3D human pose estimation from a single depth map. Recently, the convolutional neural network (CNN) has become a powerful paradigm in computer vision, and many computer vision tasks have benefited from CNNs; however, the conventional approach of directly regressing 3D body joint locations from an image does not yield noticeably improved performance. In contrast, we formulate the problem as estimating the per-voxel likelihood of key body joints from a 3D occupancy grid. We argue that learning a mapping from volumetric input to volumetric output with 3D convolutions consistently improves accuracy compared to learning a regression from a depth map to 3D joint coordinates. We propose a two-stage approach to reduce the computational overhead caused by the volumetric representation and 3D convolutions: holistic 2D prediction followed by local 3D prediction. In the first stage, a Planimetric Network (P-Net) estimates the per-pixel likelihood of each body joint in the holistic 2D space. In the second stage, a Volumetric Network (V-Net) estimates the per-voxel likelihood of each body joint in the local 3D space around the 2D estimates of the first stage, effectively reducing the computational cost. Our model outperforms existing methods by a large margin on publicly available datasets.
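The two-stage inference can be sketched as follows, assuming `p_net` returns per-pixel 2D heatmaps and `v_net` returns per-voxel likelihoods for a local crop; both network interfaces and the crop size are illustrative placeholders for the paper's P-Net and V-Net.

```python
import torch

def estimate_pose(depth_map, occupancy_grid, p_net, v_net, crop=32):
    heat2d = p_net(depth_map)                  # (J, H, W) per-pixel scores
    joints3d = []
    for j in range(heat2d.shape[0]):
        # stage 1: holistic 2D estimate for joint j
        idx = torch.argmax(heat2d[j]).item()
        y, x = divmod(idx, heat2d.shape[2])
        # stage 2: per-voxel likelihood only in a local 3D region,
        # which keeps the 3D-convolution cost tractable
        local = occupancy_grid[..., y:y + crop, x:x + crop]
        heat3d = v_net(local)                  # (D, h, w) per-voxel scores
        z = torch.argmax(heat3d.amax(dim=(1, 2))).item()
        joints3d.append((x, y, z))
    return joints3d
```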
State-of-the-art video deblurring methods are capable of removing non-uniform blur caused by unwanted camera shake and/or object motion in dynamic scenes. However, most existing methods are based on batch processing and thus need access to all recorded frames, rendering them computationally demanding and time-consuming, which limits their practical use. In contrast, we propose an online (sequential) video deblurring method based on a spatio-temporal recurrent network that allows for real-time performance. In particular, we introduce a novel architecture that extends the receptive field while keeping the overall size of the network small enough for fast execution. In doing so, our network is able to remove even large blur caused by strong camera shake and/or fast-moving objects. Furthermore, we propose a novel network layer that enforces temporal consistency between consecutive frames by dynamic temporal blending, which compares and adaptively shares (at test time) features obtained at different time steps. We show the superiority of the proposed method in an extensive experimental evaluation.
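Dynamic temporal blending can be pictured as a test-time, per-position interpolation between current and previous features, weighted by how much the two time steps agree. The cosine-similarity weighting below is our illustrative choice, not the paper's exact blending function.

```python
import torch
import torch.nn.functional as F

def temporal_blend(feat_t, feat_prev):
    """Adaptively shares features across consecutive time steps."""
    # per-position similarity in [-1, 1]; high where the two steps agree
    sim = F.cosine_similarity(feat_t, feat_prev, dim=1, eps=1e-6)
    w = sim.clamp(min=0).unsqueeze(1)          # (N, 1, H, W) blend weight
    # agreeing regions reuse past features (temporal consistency);
    # disagreeing regions fall back to the current frame
    return w * feat_prev + (1 - w) * feat_t
```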
We present a deblurring method for scenes with occluding objects using a carefully designed layered blur model. Layered blur models are frequently used in motion deblurring to handle locally varying blur caused by object motions or depth variations in a scene. However, conventional models have a limited ability to represent the layer interactions occurring at occlusion boundaries. In this paper, we address this limitation both theoretically and experimentally, and propose a new layered blur model that reflects the actual blur generation process. Based on this model, we develop an occlusion-aware deblurring method that can estimate not only the sharp foreground and background but also the object motion more accurately. We also provide a novel analysis of the blur kernel at object boundaries, which reveals distinctive characteristics of the kernel that cannot be captured by conventional blur models. Experimental results on synthetic and real blurred videos demonstrate that the proposed method yields superior results, especially at object boundaries.
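For reference, a common two-layer form of a layered blur model (notation ours; the paper's occlusion-aware model refines how the layers interact at boundaries) is

$$ B \;=\; k_f * (\alpha F) \;+\; \bigl(1 - k_f * \alpha\bigr) \odot \bigl(k_b * L\bigr), $$

where $F$ and $L$ are the sharp foreground and background layers, $\alpha$ is the foreground alpha mask, $k_f$ and $k_b$ are the per-layer blur kernels, $*$ denotes convolution, and $\odot$ denotes pixel-wise multiplication. Blurring the alpha mask together with the foreground lets the background show through near motion boundaries, an effect that simpler composite-then-blur models cannot represent.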
We propose an image super-resolution (SR) method using a deeply-recursive convolutional network (DRCN). Our network has a very deep recursive layer (up to 16 recursions). Increasing the recursion depth can improve performance without introducing new parameters for additional convolutions. Despite these advantages, a DRCN is very hard to learn with a standard gradient descent method due to exploding/vanishing gradients. To ease the difficulty of training, we propose two extensions: recursive supervision and skip connections. Our method outperforms previous methods by a large margin.
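The recursion and its two extensions can be sketched compactly: one shared convolution is applied repeatedly (adding depth without new parameters), every recursion emits a prediction through a skip connection to the input, and all predictions are supervised. The layer sizes below are illustrative, and a plain average stands in for the paper's learned ensemble weighting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DRCNSketch(nn.Module):
    def __init__(self, ch=64, recursions=16):
        super().__init__()
        self.embed = nn.Conv2d(1, ch, 3, padding=1)
        self.shared = nn.Conv2d(ch, ch, 3, padding=1)  # reused every step
        self.recon = nn.Conv2d(ch, 1, 3, padding=1)
        self.recursions = recursions

    def forward(self, x):
        h = F.relu(self.embed(x))
        preds = []
        for _ in range(self.recursions):
            h = F.relu(self.shared(h))         # same weights each recursion
            preds.append(self.recon(h) + x)    # skip connection to the input
        # recursive supervision: every intermediate prediction is trained;
        # the final output aggregates them (a learned weighting in the paper)
        return torch.stack(preds).mean(dim=0), preds
```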
We present a highly accurate single-image super-resolution (SR) method. Our method uses a very deep convolutional network inspired by the VGG-net used for ImageNet classification \cite{simonyan2015very}. We find that increasing the network depth significantly improves accuracy; our final model uses 20 weight layers. By cascading small filters many times in a deep network structure, contextual information over large image regions is exploited in an efficient way. With very deep networks, however, convergence speed becomes a critical issue during training. We therefore propose a simple yet effective training procedure: we learn residuals only and use extremely high learning rates ($10^4$ times higher than in SRCNN \cite{dong2015image}), enabled by adjustable gradient clipping. Our proposed method outperforms existing methods in accuracy, and the visual improvements in our results are easily noticeable.
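The two training devices named above (residual learning and adjustable gradient clipping) fit in a few lines; the sketch below is a schematic training step, with `theta` playing the role of the clipping constant and the optimizer wiring assumed rather than taken from the paper.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, lr, x_upscaled, y_target, theta=0.01):
    """One VDSR-style step: residual-only learning + adjustable clipping."""
    optimizer.zero_grad()
    residual = model(x_upscaled)               # predict the residual only
    loss = F.mse_loss(x_upscaled + residual, y_target)
    loss.backward()
    # clip to [-theta/lr, theta/lr]: the effective update stays bounded,
    # which is what permits the very high learning rates
    torch.nn.utils.clip_grad_value_(model.parameters(), theta / lr)
    optimizer.step()
    return loss.item()
```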