Tae-Kyun Kim

RGB-based 3D Hand Pose Estimation via Privileged Learning with Depth Images

Nov 18, 2018

Shanxin Yuan, Bjorn Stenger, Tae-Kyun Kim

Abstract: This paper proposes a method for hand pose estimation from RGB images that uses both external large-scale depth image datasets and paired depth and RGB images as privileged information at training time. We show that providing depth information during training significantly improves the performance of pose estimation from RGB images during testing. We explore different ways of using this privileged information: (1) using depth data to initially train a depth-based network, (2) using the features from the depth-based network of the paired depth images to constrain mid-level RGB network weights, and (3) using the foreground mask, obtained from the depth data, to suppress the responses from the background area. By using paired RGB and depth images, we are able to supervise the RGB-based network to learn middle layer features that mimic those of the corresponding depth-based network, which is trained on large-scale, accurately annotated depth data. During testing, when only an RGB image is available, our method produces accurate 3D hand pose predictions. Our method is also tested on 2D hand pose estimation. Experiments on three public datasets show that the method outperforms the state-of-the-art methods for hand pose estimation using RGB image input.
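
As a rough illustration of the feature-mimicking idea in the abstract above, the sketch below adds a term that pulls mid-level RGB features towards the paired depth features, after masking out background responses. It is not the authors' implementation: the function names, the plain mean-squared-error form, the choice to mask both feature vectors, and the `mimic_weight` parameter are all assumptions made for this example.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def apply_mask(features: Sequence[float], mask: Sequence[float]) -> list:
    """Zero out responses where the (hypothetical) foreground mask is 0."""
    return [f * m for f, m in zip(features, mask)]


def privileged_loss(
    pred_pose: Sequence[float],
    gt_pose: Sequence[float],
    rgb_feat: Sequence[float],
    depth_feat: Sequence[float],
    fg_mask: Sequence[float],
    mimic_weight: float = 0.5,  # assumed trade-off, not from the paper
) -> float:
    """Pose regression loss plus a feature-mimicking penalty (training only)."""
    pose_term = mse(pred_pose, gt_pose)
    mimic_term = mse(apply_mask(rgb_feat, fg_mask), apply_mask(depth_feat, fg_mask))
    return pose_term + mimic_weight * mimic_term
```

In this toy form the depth features and the mask appear only in the training loss, which matches the abstract's point that only an RGB image is needed at test time.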

HANDS18: Methods, Techniques and Applications for Hand Observation

Oct 25, 2018

Iason Oikonomidis, Guillermo Garcia-Hernando, Angela Yao, Antonis Argyros, Vincent Lepetit, Tae-Kyun Kim

Abstract: This report outlines the proceedings of the Fourth International Workshop on Observing and Understanding Hands in Action (HANDS 2018). The fourth instantiation of this workshop attracted significant interest from both academia and industry. The program of the workshop included regular papers that are published as the workshop's proceedings, extended abstracts, invited posters, and invited talks. Topics of the submitted works and invited talks and posters included novel methods for hand pose estimation from RGB, depth, or skeletal data, datasets for special cases and real-world applications, and techniques for hand motion re-targeting and hand gesture recognition. The invited speakers are leaders in their respective areas of specialization, coming from both industry and academia. The main conclusions that can be drawn are the turn of the community towards RGB data and the maturation of some methods and techniques, which in turn has led to increasing interest in real-world applications.

* 11 pages, 1 figure, Discussion of the HANDS 2018 workshop held in conjunction with ECCV 2018

A Summary of the 4th International Workshop on Recovering 6D Object Pose

Oct 09, 2018

Tomas Hodan, Rigas Kouskouridas, Tae-Kyun Kim, Federico Tombari, Kostas Bekris, Bertram Drost, Thibault Groueix, Krzysztof Walas, Vincent Lepetit, Ales Leonardis (+5 more)

Abstract: This document summarizes the 4th International Workshop on Recovering 6D Object Pose, which was organized in conjunction with ECCV 2018 in Munich. The workshop featured four invited talks, oral and poster presentations of accepted workshop papers, and an introduction of the BOP benchmark for 6D object pose estimation. The workshop was attended by 100+ people working on relevant topics in both academia and industry who shared up-to-date advances and discussed open problems.

* In: Computer Vision - ECCV 2018 Workshops - Munich, Germany, September 8-9 and 14, 2018, Proceedings

Task-Oriented Hand Motion Retargeting for Dexterous Manipulation Imitation

Oct 03, 2018

Dafni Antotsiou, Guillermo Garcia-Hernando, Tae-Kyun Kim

Abstract: Human hand actions are quite complex, especially when they involve object manipulation, mainly due to the high dimensionality of the hand and the vast action space that this entails. Imitating those actions with dexterous hand models involves different important and challenging steps: acquiring human hand information, retargeting it to a hand model, and learning a policy from acquired data. In this work, we capture the hand information by using a state-of-the-art hand pose estimator. We tackle the retargeting problem from the hand pose to a 29 DoF hand model by combining inverse kinematics and particle swarm optimisation (PSO) with a task objective optimisation. This objective encourages the virtual hand to accomplish the manipulation task, mitigating the effect of the estimator's noise and the domain gap. Our approach leads to a better success rate in the grasping task compared to our inverse kinematics baseline, allowing us to record successful human demonstrations. Furthermore, we used these demonstrations to learn a policy network using generative adversarial imitation learning (GAIL) that is able to autonomously grasp an object in the virtual space.

* ECCV 2018 workshop paper

BOP: Benchmark for 6D Object Pose Estimation

Aug 24, 2018

Tomas Hodan, Frank Michel, Eric Brachmann, Wadim Kehl, Anders Glent Buch, Dirk Kraft, Bertram Drost, Joel Vidal, Stephan Ihrke, Xenophon Zabulis (+6 more)

Abstract: We propose a benchmark for 6D pose estimation of a rigid object from a single RGB-D input image. The training data consists of a texture-mapped 3D object model or images of the object in known 6D poses. The benchmark comprises: i) eight datasets in a unified format that cover different practical scenarios, including two new datasets focusing on varying lighting conditions, ii) an evaluation methodology with a pose-error function that deals with pose ambiguities, iii) a comprehensive evaluation of 15 diverse recent methods that captures the status quo of the field, and iv) an online evaluation system that is open for continuous submission of new results. The evaluation shows that methods based on point-pair features currently perform best, outperforming template matching methods, learning-based methods and methods based on 3D local features. The project website is available at bop.felk.cvut.cz.

* ECCV 2018

Recovering 6D Object Pose: A Review and Multi-modal Analysis

Aug 15, 2018

Caner Sahin, Tae-Kyun Kim

Abstract: A large number of studies analyse object detection and pose estimation at the visual level in 2D, discussing the effects of challenges such as occlusion, clutter, and texture on the performance of methods that work in the context of the RGB modality. Interpreting the depth data, the study in this paper presents thorough multi-modal analyses. It discusses the above-mentioned challenges for full 6D object pose estimation in RGB-D images, comparing the performance of several 6D detectors in order to answer the following questions: What is the current position of the computer vision community for maintaining "automation" in robotic manipulation? What next steps should the community take for improving "autonomy" in robotics while handling objects? Our findings include: (i) Reasonably accurate results are obtained on textured objects at varying viewpoints with cluttered backgrounds. (ii) Heavy occlusion and clutter severely affect the detectors, and similar-looking distractors are the biggest challenge in recovering instances' 6D poses. (iii) Template-based methods and random forest-based learning algorithms underlie object detection and 6D pose estimation. The recent paradigm is to learn deep discriminative feature representations and to adopt CNNs taking RGB images as input. (iv) Depending on the availability of large-scale 6D annotated depth datasets, feature representations can be learnt on these datasets, and then the learnt representations can be customized for the 6D problem.

Category-level 6D Object Pose Recovery in Depth Images

Aug 01, 2018

Caner Sahin, Tae-Kyun Kim

Abstract: Intra-class variations and distribution shifts between source and target domains are the major challenges of category-level tasks. In this study, we address category-level full 6D object pose estimation in the context of the depth modality, introducing a novel part-based architecture that can tackle the above-mentioned challenges. In particular, our architecture adapts to the distribution shifts arising from shape discrepancies and naturally removes the variations of texture, illumination, pose, etc., so we call it the "Intrinsic Structure Adaptor (ISA)". We engineer ISA based on the following: i) "Semantically Selected Centers (SSC)" are proposed in order to define the "6D pose" at the level of categories. ii) 3D skeleton structures, which we derive as shape-invariant features, are used to represent the parts extracted from the instances of given categories, and privileged one-class learning is employed based on these parts. iii) Graph matching is performed during training in such a way that the adaptation/generalization capability of the proposed architecture is improved across unseen instances. Experiments validate the promising performance of the proposed architecture on both synthetic and real datasets.

Multi-Task Deep Networks for Depth-Based 6D Object Pose and Joint Registration in Crowd Scenarios

Jun 11, 2018

Juil Sock, Kwang In Kim, Caner Sahin, Tae-Kyun Kim

Abstract: In bin-picking scenarios, multiple instances of an object of interest are stacked randomly in a pile, and hence the instances are inherently subject to the challenges of severe occlusion, clutter, and similar-looking distractors. Most existing methods, however, are designed for single isolated object instances, while some recent methods tackle crowd scenarios as a post-refinement step that accounts for multiple object relations. In this paper, we address recovering 6D poses of multiple instances in bin-picking scenarios in the depth modality by multi-task learning in deep neural networks. Our architecture jointly learns multiple sub-tasks: 2D detection, depth, and 3D pose estimation of individual objects; and joint registration of multiple objects. For training data generation, depth images of physically plausible object pose configurations are generated by a 3D object model in a physics simulation, which yields diverse occlusion patterns to learn. We adopt a state-of-the-art object detector, and 2D offsets are further estimated via a network to refine misaligned 2D detections. The depth and 3D pose estimator is designed to generate multiple hypotheses per detection. This allows the joint registration network to learn occlusion patterns and remove physically implausible pose hypotheses. We apply our architecture to both synthetic (our own and Sileane dataset) and real (a public Bin-Picking dataset) data, showing that it significantly outperforms state-of-the-art methods by 15-31% in average precision.

Occlusion-aware Hand Pose Estimation Using Hierarchical Mixture Density Network

May 21, 2018

Qi Ye, Tae-Kyun Kim

Abstract: Learning and predicting the pose parameters of a 3D hand model given an image, such as locations of hand joints, is challenging due to large viewpoint changes and articulations, and severe self-occlusions exhibited particularly in egocentric views. Both feature learning and prediction modeling have been investigated to tackle the problem. Though effective, most existing discriminative methods yield a single deterministic estimation of target poses. Because their mapping is intrinsically single-valued, they fail to adequately handle self-occlusion problems, where occluded joints present multiple modes. In this paper, we tackle the self-occlusion issue and provide a complete description of observed poses given an input depth image by a novel method called hierarchical mixture density networks (HMDN). The proposed method leverages the state-of-the-art hand pose estimators based on Convolutional Neural Networks to facilitate feature learning, while it models the multiple modes in a two-level hierarchy to reconcile single-valued and multi-valued mapping in its output. The whole framework with a mixture of two differentiable density functions is naturally end-to-end trainable. In the experiments, HMDN produces interpretable and diverse candidate samples, and significantly outperforms the state-of-the-art methods on two benchmarks with occlusions, and performs comparably on another benchmark free of occlusions.
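
The contrast between single-valued and multi-valued outputs can be sketched with a one-dimensional toy likelihood. The code below is an assumption-laden illustration, not the HMDN formulation from the paper: it blends a single Gaussian (for a joint treated as visible) with a Gaussian mixture (for a joint treated as occluded) using a hypothetical `visible_prob`, and all names and parameters are invented for the example.

```python
import math
from typing import Sequence, Tuple


def gaussian_pdf(y: float, mu: float, sigma: float) -> float:
    """Density of a 1-D Gaussian at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(
    y: float,
    weights: Sequence[float],
    mus: Sequence[float],
    sigmas: Sequence[float],
) -> float:
    """Density of a 1-D Gaussian mixture at y (weights assumed to sum to 1)."""
    return sum(w * gaussian_pdf(y, m, s) for w, m, s in zip(weights, mus, sigmas))


def two_level_nll(
    y: float,
    visible_prob: float,
    single: Tuple[float, float],
    mixture: Tuple[Sequence[float], Sequence[float], Sequence[float]],
) -> float:
    """Negative log-likelihood of y under a visibility-weighted blend.

    visible_prob = 1 reduces to the single Gaussian (single-valued case);
    visible_prob = 0 reduces to the mixture (multi-valued case).
    """
    mu, sigma = single
    weights, mus, sigmas = mixture
    density = (visible_prob * gaussian_pdf(y, mu, sigma)
               + (1.0 - visible_prob) * mixture_pdf(y, weights, mus, sigmas))
    return -math.log(density)
```

With a two-mode mixture centred at -1 and +1, a target at either mode scores far better than the midpoint, which a single deterministic regressor would tend to predict as the average of the two.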

Deep Convolutional Decision Jungle for Image Classification

May 18, 2018

Seungryul Baek, Kwang In Kim, Tae-Kyun Kim

Abstract: We propose a novel method called deep convolutional decision jungle (CDJ) and its learning algorithm for image classification. The CDJ maintains the structure of standard convolutional neural networks (CNNs), i.e. multiple layers of multiple response maps fully connected. Each response map, or node, in both the convolutional and fully-connected layers selectively responds to class labels such that each data sample travels via a specific soft route of those activated nodes. The proposed method CDJ automatically learns features, whereas decision forests and jungles require pre-defined feature sets. Compared to CNNs, the method embeds the benefits of using data-dependent discriminative functions, which better handle multi-modal/heterogeneous data; further, the method offers more diverse sparse network responses, which in turn can be used for cost-effective learning/classification. The network is learnt by combining conventional softmax and proposed entropy losses in each layer. The entropy loss, as used in decision tree growing, measures the purity of data activation according to the class label distribution. The back-propagation rule for the proposed loss function is derived from stochastic gradient descent (SGD) optimization of CNNs. We show that our proposed method outperforms state-of-the-art methods on three public image classification benchmarks and one face verification dataset. We also demonstrate the use of auxiliary data labels, when available, which helps our method to learn more discriminative routing and representations and leads to improved classification.
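
The notion of an entropy loss that measures the purity of data activation can be illustrated with a small sketch. This is one plausible reading under stated assumptions, not the loss defined in the paper: the function `activation_entropy` is hypothetical, and it simply accumulates each sample's non-negative activation at one node into a per-class histogram and takes the Shannon entropy of that histogram.

```python
import math
from typing import Sequence


def activation_entropy(
    activations: Sequence[float],
    labels: Sequence[int],
    num_classes: int,
) -> float:
    """Entropy of the class distribution of activation mass at one node.

    activations[i] is the (non-negative) response of the node to sample i,
    labels[i] is that sample's class index. A node that fires for a single
    class has entropy 0; a node that fires equally for all classes has
    entropy log(num_classes).
    """
    mass = [0.0] * num_classes
    for a, y in zip(activations, labels):
        mass[y] += a
    total = sum(mass)
    if total == 0.0:
        return 0.0  # node never fires; treated as pure by convention here
    return -sum((m / total) * math.log(m / total) for m in mass if m > 0.0)
```

A node that responds only to samples of class 0 gets entropy 0, while a node that responds equally to two classes gets log 2, so minimising this quantity pushes nodes towards class-selective responses, in the spirit of split purity in decision tree growing.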
