Hengkai Guo

Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation

Jun 24, 2018

Xinghao Chen, Guijin Wang, Hengkai Guo, Cairong Zhang

Abstract: Hand pose estimation from a single depth image is an essential topic in computer vision and human-computer interaction. Despite recent advances driven by convolutional neural networks, accurate hand pose estimation remains a challenging problem. In this paper we propose a Pose guided structured Region Ensemble Network (Pose-REN) to boost the performance of hand pose estimation. The proposed method extracts regions from the feature maps of a convolutional neural network under the guidance of an initially estimated pose, generating more representative features for hand pose estimation. The extracted feature regions are then integrated hierarchically according to the topology of hand joints using tree-structured fully connected layers. A refined hand pose is regressed directly by the proposed network, and the final hand pose is obtained through an iterative cascaded method. Comprehensive experiments on public hand pose datasets demonstrate that the proposed method outperforms state-of-the-art algorithms.

* Accepted by Neurocomputing
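
The pose-guided region extraction described in the Pose-REN abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the normalized `(u, v)` joint coordinates, the single-channel feature map, and the fixed square window are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

FeatureMap = List[List[float]]  # one channel, indexed as fmap[row][col]


def crop_region(fmap: FeatureMap, center: Tuple[float, float], size: int) -> FeatureMap:
    """Crop a size x size window around a joint given in normalized (u, v) coordinates.

    Hypothetical helper: u runs along the width and v along the height, both in [0, 1].
    """
    height, width = len(fmap), len(fmap[0])
    if size > height or size > width:
        raise ValueError("region size must fit inside the feature map")
    u, v = center
    # Map the normalized joint position to a feature-map cell.
    col = int(round(u * (width - 1)))
    row = int(round(v * (height - 1)))
    # Clamp the top-left corner so the window stays inside the map.
    top = min(max(row - size // 2, 0), height - size)
    left = min(max(col - size // 2, 0), width - size)
    return [r[left:left + size] for r in fmap[top:top + size]]


def pose_guided_regions(
    fmap: FeatureMap, joints: Sequence[Tuple[float, float]], size: int
) -> List[FeatureMap]:
    """Return one cropped region per joint of the initial pose estimate."""
    return [crop_region(fmap, joint, size) for joint in joints]
```

In a cascade, the regions would be fed to the regressor, and the refined pose would replace `joints` for the next iteration.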

Two-Stream Binocular Network: Accurate Near Field Finger Detection Based On Binocular Images

Apr 26, 2018

Yi Wei, Guijin Wang, Cairong Zhang, Hengkai Guo, Xinghao Chen, Huazhong Yang

Abstract: Fingertip detection plays an important role in human-computer interaction. Previous works transform binocular images into depth images and then apply depth-based hand pose estimation methods to predict the 3D positions of fingertips. In contrast, we propose a new framework, named Two-Stream Binocular Network (TSBnet), to detect fingertips directly from binocular images. TSBnet first shares convolutional layers for the low-level features of the right and left images, then extracts high-level features separately in two-stream convolutional networks. We further add a new layer, the binocular distance measurement layer, to improve the performance of our model. To verify our scheme, we build a binocular hand image dataset containing about 117k image pairs in the training set and 10k image pairs in the test set. Our method achieves an average error of 10.9mm on our test set, outperforming previous work by 5.9mm (a relative improvement of 35.1%).

* Visual Communications and Image Processing (VCIP), 2017 IEEE (2017) 1-4
* Published in: Visual Communications and Image Processing (VCIP), 2017 IEEE. Original IEEE publication available on https://ieeexplore.ieee.org/abstract/document/8305146/. Dataset available on https://sites.google.com/view/thuhand17
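
The relative figure quoted in the TSBnet abstract can be checked with a short calculation. The sketch below assumes the 5.9mm gain is measured against a previous-work error of 10.9mm + 5.9mm = 16.8mm; that baseline is implied by the abstract rather than stated in it, and the function name is hypothetical.

```python
def relative_improvement(new_error_mm: float, absolute_gain_mm: float) -> float:
    """Fraction of the baseline error removed, given the new error and the absolute gain."""
    baseline_mm = new_error_mm + absolute_gain_mm  # implied previous-work error
    return absolute_gain_mm / baseline_mm


# Values taken from the abstract: 10.9mm average error, 5.9mm better than previous work.
print(f"{relative_improvement(10.9, 5.9):.1%}")
```

Under that assumption the ratio is 5.9 / 16.8, which rounds to the 35.1% reported.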

Motion Feature Augmented Recurrent Neural Network for Skeleton-based Dynamic Hand Gesture Recognition

Aug 10, 2017

Xinghao Chen, Hengkai Guo, Guijin Wang, Li Zhang

Abstract: Dynamic hand gesture recognition has attracted increasing interest because of its importance for human-computer interaction. In this paper, we propose a new motion feature augmented recurrent neural network for skeleton-based dynamic hand gesture recognition. Finger motion features are extracted to describe finger movements, and global motion features are used to represent the global movement of the hand skeleton. These motion features are then fed into a bidirectional recurrent neural network (RNN) along with the skeleton sequence, which augments the RNN input with motion information and improves classification performance. Experiments demonstrate that the proposed method is effective and outperforms state-of-the-art methods.

* Accepted by ICIP 2017
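
The abstract does not define the two kinds of motion features, so the following is only one plausible reading, not the paper's formulation. It assumes global motion is the frame-to-frame displacement of the skeleton centroid, and finger motion is each joint's displacement with that global shift removed. All names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
Frame = List[Point]  # one 3D point per skeleton joint


def centroid(frame: Frame) -> Point:
    """Mean position of all joints in one frame."""
    n = len(frame)
    return tuple(sum(p[i] for p in frame) / n for i in range(3))


def global_motion(seq: List[Frame]) -> List[Point]:
    """Frame-to-frame displacement of the skeleton centroid (assumed definition)."""
    centers = [centroid(f) for f in seq]
    return [tuple(b[i] - a[i] for i in range(3)) for a, b in zip(centers, centers[1:])]


def finger_motion(seq: List[Frame]) -> List[List[Point]]:
    """Per-joint displacement with the global shift subtracted (assumed definition)."""
    shifts = global_motion(seq)
    result = []
    for prev, curr, shift in zip(seq, seq[1:], shifts):
        result.append([
            tuple(c[i] - p[i] - shift[i] for i in range(3))
            for p, c in zip(prev, curr)
        ])
    return result
```

Both feature sequences have one entry fewer than the skeleton sequence, so a real pipeline would need to pad or trim before concatenating them with the joint coordinates as RNN input.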

Towards Good Practices for Deep 3D Hand Pose Estimation

Jul 23, 2017

Hengkai Guo, Guijin Wang, Xinghao Chen, Cairong Zhang

Abstract: 3D hand pose estimation from a single depth image is an important and challenging problem for human-computer interaction. Recently, deep convolutional networks (ConvNet) with sophisticated designs have been employed to address it, but the improvement over traditional random forest based methods is not so apparent. To explore good practices and improve the performance of hand pose estimation, we propose a tree-structured Region Ensemble Network (REN) for direct 3D coordinate regression. It first partitions the last convolution outputs of the ConvNet into several grid regions. The results from separate fully-connected (FC) regressors on each region are then integrated by another FC layer to perform the estimation. By exploiting several training strategies, including data augmentation and smooth $L_1$ loss, the proposed REN can significantly improve the performance of ConvNet in localizing hand joints. The experimental results demonstrate that our approach achieves the best performance among state-of-the-art algorithms on three public hand pose datasets. We also evaluate our method on fingertip detection and human pose datasets and obtain state-of-the-art accuracy.

* Extended version of arXiv:1702.02447
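
For readers unfamiliar with the smooth $L_1$ loss named in the abstract, here is a minimal sketch of its commonly used form: quadratic for residuals below 1 in magnitude and linear beyond that. The threshold of 1 and the averaging over coordinates are assumptions; the abstract does not say which variant or reduction the paper uses.

```python
from typing import Sequence


def smooth_l1(residual: float) -> float:
    """Common smooth L1 penalty: 0.5 * x^2 if |x| < 1, otherwise |x| - 0.5."""
    magnitude = abs(residual)
    if magnitude < 1.0:
        return 0.5 * magnitude * magnitude
    return magnitude - 0.5


def smooth_l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean smooth L1 penalty over all predicted coordinates (assumed reduction)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum(smooth_l1(p - t) for p, t in zip(pred, target)) / len(pred)
```

Compared with a squared error, the linear tail limits the influence of joints whose annotations or predictions are far off.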

Region Ensemble Network: Improving Convolutional Network for Hand Pose Estimation

May 09, 2017

Hengkai Guo, Guijin Wang, Xinghao Chen, Cairong Zhang, Fei Qiao, Huazhong Yang

Abstract: Hand pose estimation from monocular depth images is an important and challenging problem for human-computer interaction. Recently, deep convolutional networks (ConvNet) with sophisticated designs have been employed to address it, but the improvement over traditional methods is not so apparent. To improve the performance of direct 3D coordinate regression, we propose a tree-structured Region Ensemble Network (REN), which partitions the convolution outputs into regions and integrates the results from multiple regressors, one on each region. Compared with multi-model ensembles, our model is trained completely end-to-end. The experimental results demonstrate that our approach achieves the best performance among state-of-the-art methods on two public datasets.

* Accepted to ICIP 2017. Project: https://github.com/guohengkai/region-ensemble-network
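
The region ensemble idea in the abstract (partition the convolution outputs, regress on each region, then integrate) can be sketched as below. This is a simplified illustration and not the code from the linked project: it assumes a single-channel map, non-overlapping grid cells, one linear regressor per region producing a single scalar, and a linear fusion step. All names and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple

FeatureMap = List[List[float]]


def partition_grid(fmap: FeatureMap, grid_rows: int, grid_cols: int) -> List[List[float]]:
    """Split the map into grid_rows x grid_cols cells and flatten each cell."""
    height, width = len(fmap), len(fmap[0])
    if height % grid_rows or width % grid_cols:
        raise ValueError("feature map must divide evenly into the grid")
    cell_h, cell_w = height // grid_rows, width // grid_cols
    regions = []
    for gr in range(grid_rows):
        for gc in range(grid_cols):
            rows = fmap[gr * cell_h:(gr + 1) * cell_h]
            regions.append([v for row in rows for v in row[gc * cell_w:(gc + 1) * cell_w]])
    return regions


def linear(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """A single linear unit, standing in for a fully-connected regressor."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def region_ensemble(
    fmap: FeatureMap,
    region_params: Sequence[Tuple[Sequence[float], float]],
    fusion_weights: Sequence[float],
    fusion_bias: float,
    grid: Tuple[int, int] = (2, 2),
) -> float:
    """Regress one value per region, then fuse the branch outputs linearly."""
    regions = partition_grid(fmap, *grid)
    branch_outputs = [linear(w, b, r) for (w, b), r in zip(region_params, regions)]
    return linear(fusion_weights, fusion_bias, branch_outputs)
```

Because the branches and the fusion step are ordinary differentiable layers, the whole structure can be trained jointly, which matches the abstract's contrast with multi-model ensembles.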

Two-stream convolutional neural network for accurate RGB-D fingertip detection using depth and edge information

Dec 23, 2016

Hengkai Guo, Guijin Wang, Xinghao Chen

Abstract: Accurate detection of fingertips in depth images is critical for human-computer interaction. In this paper, we present a novel two-stream convolutional neural network (CNN) for RGB-D fingertip detection. First, an edge image is extracted from the raw depth image using a random forest. The edge information is then combined with the depth information in our CNN structure. We study several fusion approaches and suggest a slow fusion strategy as a promising approach to fingertip detection. As shown in our experiments, our real-time algorithm outperforms state-of-the-art fingertip detection methods on the public dataset HandNet with an average 3D error of 9.9mm, and shows comparable accuracy of fingertip estimation on the NYU hand dataset.

* Accepted by ICIP 2016
