Liang Lin

Instance-level Human Parsing via Part Grouping Network

Aug 01, 2018

Ke Gong, Xiaodan Liang, Yicheng Li, Yimin Chen, Ming Yang, Liang Lin

Abstract:Instance-level human parsing towards real-world human analysis scenarios is still under-explored due to the absence of sufficient data resources and technical difficulty in parsing multiple instances in a single pass. Several related works all follow the "parsing-by-detection" pipeline that heavily relies on separately trained detection models to localize instances and then performs human parsing for each instance sequentially. Nonetheless, two discrepant optimization targets of detection and parsing lead to suboptimal representation learning and error accumulation for final results. In this work, we make the first attempt to explore a detection-free Part Grouping Network (PGN) for efficiently parsing multiple people in an image in a single pass. Our PGN reformulates instance-level human parsing as two twinned sub-tasks that can be jointly learned and mutually refined via a unified network: 1) semantic part segmentation for assigning each pixel as a human part (e.g., face, arms); 2) instance-aware edge detection to group semantic parts into distinct person instances. Thus the shared intermediate representation would be endowed with capabilities in both characterizing fine-grained parts and inferring instance belongings of each part. Finally, a simple instance partition process is employed to get final results during inference. We conducted experiments on PASCAL-Person-Part dataset and our PGN outperforms all state-of-the-art methods. Furthermore, we show its superiority on a newly collected multi-person parsing dataset (CIHP) including 38,280 diverse images, which is the largest dataset so far and can facilitate more advanced human analysis. The CIHP benchmark and our source code are available at http://sysu-hcp.net/lip/.

* Accepted by ECCV 2018 (Oral)
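
The grouping idea in the abstract above can be illustrated with a small sketch. It is not the authors' instance partition process: it assumes a tiny hand-made part-label grid and a binary instance-edge map, and groups part pixels into instances by flood-filling over foreground pixels that are not on an edge. The function name `group_instances` and the toy inputs are hypothetical.

```python
from collections import deque


def group_instances(parts, edges):
    """Label connected foreground regions that are not separated by edge pixels.

    parts: 2D list of part labels (0 = background).
    edges: 2D list of 0/1 flags marking instance-boundary pixels (assumed input).
    Returns a 2D list of instance ids (0 = unassigned).
    """
    h, w = len(parts), len(parts[0])
    inst = [[0] * w for _ in range(h)]
    next_id = 0
    for sy in range(h):
        for sx in range(w):
            # Skip background, edge pixels, and pixels that already have an id.
            if parts[sy][sx] == 0 or edges[sy][sx] or inst[sy][sx]:
                continue
            next_id += 1
            inst[sy][sx] = next_id
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and parts[ny][nx] != 0
                            and not edges[ny][nx]
                            and inst[ny][nx] == 0):
                        inst[ny][nx] = next_id
                        queue.append((ny, nx))
    return inst


# Toy example: two people side by side, separated by a vertical edge at column 2.
parts = [
    [1, 1, 2, 2, 1, 1],
    [3, 3, 3, 3, 3, 3],
]
edges = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
]
instances = group_instances(parts, edges)
```

In this toy case the part labels alone cannot separate the two people, because label 3 spans the whole bottom row; the edge map is what splits the pixels into two instance ids.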

Learning to Segment Object Candidates via Recursive Neural Networks

Jul 29, 2018

Tianshui Chen, Liang Lin, Xian Wu, Nong Xiao, Xiaonan Luo

Abstract:To avoid the exhaustive search over locations and scales, current state-of-the-art object detection systems usually involve a crucial component generating a batch of candidate object proposals from images. In this paper, we present a simple yet effective approach for segmenting object proposals via a deep architecture of recursive neural networks (ReNNs), which hierarchically groups regions for detecting object candidates over scales. Unlike traditional methods that mainly adopt fixed similarity measures for merging regions or finding object proposals, our approach adaptively learns the region merging similarity and the objectness measure during the process of hierarchical region grouping. Specifically, guided by a structured loss, the ReNN model jointly optimizes the cross-region similarity metric with the region merging process as well as the objectness prediction. During inference of the object proposal generation, we introduce randomness into the greedy search to cope with the ambiguity of grouping regions. Extensive experiments on standard benchmarks, e.g., PASCAL VOC and ImageNet, suggest that our approach is capable of producing object proposals with high recall while well preserving the object boundaries and outperforms other existing methods in both accuracy and efficiency.

* Accepted at TIP

SCAN: Self-and-Collaborative Attention Network for Video Person Re-identification

Jul 20, 2018

Ruimao Zhang, Hongbin Sun, Jingyu Li, Yuying Ge, Liang Lin, Ping Luo, Xiaogang Wang

Abstract:Video person re-identification has attracted much attention in recent years. It aims to match image sequences of pedestrians from different camera views. Previous approaches usually improve this task from three aspects, including a) selecting more discriminative frames, b) generating more informative temporal representations, and c) developing more effective distance metrics. To address the above issues, we present a novel and practical deep architecture for video person re-identification termed Self-and-Collaborative Attention Network (SCAN). It has several appealing properties. First, SCAN adopts a non-parametric attention mechanism to refine the intra-sequence and inter-sequence feature representations of videos, and outputs a self-and-collaborative feature representation for each video, aligning the discriminative frames between the probe and gallery sequences. Second, beyond existing models, a generalized pairwise similarity measurement is proposed to calculate the similarity feature representations of video pairs, enabling a binary classifier to compute the matching scores. Third, a dense clip segmentation strategy is also introduced to generate rich probe-gallery pairs to optimize the model. Extensive experiments demonstrate the effectiveness of SCAN, which outperforms the top-1 accuracies of the best-performing baselines by 7.8%, 2.1% and 4.9% on the iLIDS-VID, PRID2011 and MARS datasets, respectively.

* 10 pages, 3 figures
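
To make the phrase "non-parametric attention" more concrete, here is a minimal sketch of attention pooling with no learned weights. It is an assumption-laden illustration, not SCAN itself: it assumes each frame is a small feature vector, uses the mean feature of a sequence as the query, and weights frames by a softmax over dot products. The helpers `mean_vector` and `attend` are hypothetical names.

```python
import math


def mean_vector(frames):
    """Average a list of equal-length feature vectors."""
    n = len(frames)
    return [sum(f[i] for f in frames) / n for i in range(len(frames[0]))]


def attend(frames, query):
    """Softmax-weight frames by their dot product with the query, then pool.

    No learned parameters are involved: the weights depend only on the features.
    """
    scores = [sum(a * b for a, b in zip(f, query)) for f in frames]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * f[i] for w, f in zip(weights, frames))
              for i in range(len(query))]
    return weights, pooled


probe = [[1.0, 0.0], [0.0, 1.0]]    # two probe frames (toy features)
gallery = [[1.0, 0.0], [1.0, 0.0]]  # two gallery frames (toy features)

# "Self" attention: the probe sequence attends with its own mean as the query.
self_w, self_feat = attend(probe, mean_vector(probe))
# "Collaborative" attention: the probe attends with the gallery mean as the query.
collab_w, collab_feat = attend(probe, mean_vector(gallery))
```

With the gallery mean as the query, the first probe frame receives the larger weight because it resembles the gallery frames, which is one way to read the abstract's description of aligning discriminative frames across sequences.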

Crowd Counting using Deep Recurrent Spatial-Aware Network

Jul 02, 2018

Lingbo Liu, Hongjun Wang, Guanbin Li, Wanli Ouyang, Liang Lin

Abstract:Crowd counting from unconstrained scene images is a crucial task in many real-world applications like urban surveillance and management, but it is greatly challenged by the camera's perspective that causes huge appearance variations in people's scales and rotations. Conventional methods address such challenges by resorting to fixed multi-scale architectures that are often unable to cover the largely varied scales while ignoring the rotation variations. In this paper, we propose a unified neural network framework, named Deep Recurrent Spatial-Aware Network, which adaptively addresses the two issues in a learnable spatial transform module with a region-wise refinement process. Specifically, our framework incorporates a Recurrent Spatial-Aware Refinement (RSAR) module iteratively conducting two components: i) a Spatial Transformer Network that dynamically locates an attentional region from the crowd density map and transforms it to the suitable scale and rotation for optimal crowd estimation; ii) a Local Refinement Network that refines the density map of the attended region with residual learning. Extensive experiments on four challenging benchmarks show the effectiveness of our approach. Specifically, compared with the existing best-performing methods, we achieve an improvement of 12% on the largest dataset WorldExpo'10 and 22.8% on the most challenging dataset UCF_CC_50.

* Accepted to IJCAI 2018
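
Two ideas from the abstract above lend themselves to a short sketch: a crowd count is the sum of a density map, and residual refinement adds a correction to an attended region rather than replacing it. The code below is a toy illustration under those assumptions only. The residual is hand-written here, whereas the paper predicts it with a network, and the spatial transformer step is omitted. The function names are hypothetical.

```python
def crowd_count(density):
    """The estimated count is the sum of all density values."""
    return sum(sum(row) for row in density)


def refine_region(density, box, residual):
    """Return a copy of the density map with a residual added inside the box.

    box = (top, left, bottom, right) in pixel coordinates, bottom/right exclusive.
    residual has the same shape as the boxed region.
    """
    top, left, bottom, right = box
    refined = [row[:] for row in density]  # leave the input untouched
    for dy, y in enumerate(range(top, bottom)):
        for dx, x in enumerate(range(left, right)):
            refined[y][x] += residual[dy][dx]
    return refined


density = [
    [0.25, 0.25, 0.0],
    [0.5, 0.0, 0.0],
]
before = crowd_count(density)

# Pretend a refinement step decided the top-left 1x2 region was under-counted.
refined = refine_region(density, (0, 0, 1, 2), [[0.25, 0.5]])
after = crowd_count(refined)
```

Because only the attended region changes, repeated refinement steps can each correct a different part of the map while the rest of the estimate is preserved.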

Knowledge-Embedded Representation Learning for Fine-Grained Image Recognition

Jul 02, 2018

Tianshui Chen, Liang Lin, Riquan Chen, Yang Wu, Xiaonan Luo

Abstract:Humans can naturally understand an image in depth with the aid of rich knowledge accumulated from daily lives or professions. For example, achieving fine-grained image recognition (e.g., categorizing hundreds of subordinate categories of birds) usually requires a comprehensive visual concept organization including category labels and part-level attributes. In this work, we investigate how to unify rich professional knowledge with deep neural network architectures and propose a Knowledge-Embedded Representation Learning (KERL) framework for handling the problem of fine-grained image recognition. Specifically, we organize the rich visual concepts in the form of a knowledge graph and employ a Gated Graph Neural Network to propagate node messages through the graph for generating the knowledge representation. By introducing a novel gated mechanism, our KERL framework incorporates this knowledge representation into the discriminative image feature learning, i.e., implicitly associating the specific attributes with the feature maps. Compared with existing methods of fine-grained image classification, our KERL framework has several appealing properties: i) The embedded high-level knowledge enhances the feature representation, thus facilitating distinguishing the subtle differences among subordinate categories. ii) Our framework can learn feature maps with a meaningful configuration in which the highlighted regions finely accord with the nodes (specific attributes) of the knowledge graph. Extensive experiments on the widely used Caltech-UCSD bird dataset demonstrate the superiority of our KERL framework over existing state-of-the-art methods.

* Accepted at IJCAI 2018. The first work that introduces high-level knowledge to enhance representation learning for fine-grained image classification

Deep Reasoning with Knowledge Graph for Social Relationship Understanding

Jul 02, 2018

Zhouxia Wang, Tianshui Chen, Jimmy Ren, Weihao Yu, Hui Cheng, Liang Lin

Abstract:Social relationships (e.g., friends, couple etc.) form the basis of the social network in our daily life. Automatically interpreting such relationships bears great potential for intelligent systems to understand human behavior in depth and to better interact with people at a social level. Human beings interpret the social relationships within a group based not only on the people alone; the interplay between such social relationships and the contextual information around the people also plays a significant role. However, these additional cues are largely overlooked by the previous studies. We found that the interplay between these two factors can be effectively modeled by a novel structured knowledge graph with proper message propagation and attention. This structured knowledge can be efficiently integrated into the deep neural network architecture to promote social relationship understanding by an end-to-end trainable Graph Reasoning Model (GRM), in which a propagation mechanism is learned to propagate node messages through the graph to explore the interaction between persons of interest and the contextual objects. Meanwhile, a graph attentional mechanism is introduced to explicitly reason about the discriminative objects to promote recognition. Extensive experiments on the public benchmarks demonstrate the superiority of our method over the existing leading competitors.

* Accepted at IJCAI 2018. The first work that integrates a high-level knowledge graph to reason about social relationships between a person pair of interest in a still image

Towards Human-Machine Cooperation: Self-supervised Sample Mining for Object Detection

May 24, 2018

Keze Wang, Xiaopeng Yan, Dongyu Zhang, Lei Zhang, Liang Lin

Abstract:Though quite challenging, leveraging large-scale unlabeled or partially labeled images in a cost-effective way has increasingly attracted interest for its great importance to computer vision. To tackle this problem, many Active Learning (AL) methods have been developed. However, these methods mainly define their sample selection criteria within a single image context, leading to suboptimal robustness and impractical solutions for large-scale object detection. In this paper, aiming to remedy the drawbacks of existing AL methods, we present a principled Self-supervised Sample Mining (SSM) process accounting for the real challenges in object detection. Specifically, our SSM process concentrates on automatically discovering and pseudo-labeling reliable region proposals for enhancing the object detector via the introduced cross image validation, i.e., pasting these proposals into different labeled images to comprehensively measure their values under different image contexts. By resorting to the SSM process, we propose a new AL framework for gradually incorporating unlabeled or partially labeled data into the model learning while minimizing the annotating effort of users. Extensive experiments on two public benchmarks clearly demonstrate that our proposed framework can achieve performance comparable to the state-of-the-art methods with significantly fewer annotations.

* We enable mining from unlabeled or partially labeled data to boost object detection (accepted by CVPR 2018). The source code is available at http://kezewang.com/codes/SSM_CVPR.zip

DRPose3D: Depth Ranking in 3D Human Pose Estimation

May 24, 2018

Min Wang, Xipeng Chen, Wentao Liu, Chen Qian, Liang Lin, Lizhuang Ma

Abstract:In this paper, we propose a two-stage depth ranking based method (DRPose3D) to tackle the problem of 3D human pose estimation. Instead of accurate 3D positions, the depth ranking can be identified intuitively by humans and learned more easily by a deep neural network by solving classification problems. Moreover, depth ranking contains rich 3D information. It prevents the 2D-to-3D pose regression in two-stage methods from being ill-posed. In our method, firstly, we design a Pairwise Ranking Convolutional Neural Network (PRCNN) to extract depth rankings of human joints from images. Secondly, a coarse-to-fine 3D Pose Network (DPNet) is proposed to estimate 3D poses from both depth rankings and 2D human joint locations. Additionally, to improve the generality of our model, we introduce a statistical method to augment depth rankings. Our approach outperforms the state-of-the-art methods on the Human3.6M benchmark for all three testing protocols, indicating that depth ranking is an essential geometric feature which can be learned to improve the 3D pose estimation.

* Accepted by the 27th International Joint Conference on Artificial Intelligence (IJCAI 2018)
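
The notion of a pairwise depth ranking can be sketched in a few lines. The example below assumes ground-truth joint depths are available and encodes every ordered joint pair as one of three classes, which is the kind of target a classifier could be trained on. The encoding (1, 0, -1) and the function name `depth_ranking_matrix` are illustrative assumptions, not the representation used by PRCNN.

```python
def depth_ranking_matrix(depths):
    """Build an n x n matrix of pairwise depth relations.

    Entry [i][j] is 1 if joint i is closer to the camera than joint j
    (smaller depth), -1 if it is farther, and 0 if the depths are equal.
    """
    n = len(depths)
    return [
        [(depths[j] > depths[i]) - (depths[j] < depths[i]) for j in range(n)]
        for i in range(n)
    ]


# Toy depths (in metres) for three joints; joints 0 and 2 are at the same depth.
depths = [2.0, 3.5, 2.0]
ranking = depth_ranking_matrix(depths)
```

The matrix is antisymmetric, so only the upper triangle carries independent information. It also discards absolute depth, which is why a second stage is still needed to recover metric 3D positions.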

Multi-level Wavelet-CNN for Image Restoration

May 22, 2018

Pengju Liu, Hongzhi Zhang, Kai Zhang, Liang Lin, Wangmeng Zuo

Abstract:The tradeoff between receptive field size and efficiency is a crucial issue in low level vision. Plain convolutional networks (CNNs) generally enlarge the receptive field at the expense of computational cost. Recently, dilated filtering has been adopted to address this issue. However, it suffers from the gridding effect, and the resulting receptive field is only a sparse sampling of the input image with checkerboard patterns. In this paper, we present a novel multi-level wavelet CNN (MWCNN) model for a better tradeoff between receptive field size and computational efficiency. With the modified U-Net architecture, wavelet transform is introduced to reduce the size of feature maps in the contracting subnetwork. Furthermore, another convolutional layer is used to decrease the channels of feature maps. In the expanding subnetwork, inverse wavelet transform is then deployed to reconstruct the high resolution feature maps. Our MWCNN can also be explained as the generalization of dilated filtering and subsampling, and can be applied to many image restoration tasks. The experimental results clearly show the effectiveness of MWCNN for image denoising, single image super-resolution, and JPEG image artifacts removal.

* Accepted for publication at CVPR NTIRE Workshop, 2018
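
The reason a wavelet transform can stand in for pooling is that it halves the spatial size without discarding information. The sketch below shows this with a one-level 2D Haar transform written in plain Python. The choice of the Haar wavelet, the sub-band names, and the function names are assumptions made for illustration; the abstract does not say which wavelet MWCNN uses, and sub-band naming conventions vary between libraries.

```python
def haar_dwt2(img):
    """One level of the orthonormal 2D Haar transform on an even-sized 2D list.

    Returns four sub-bands, each half the height and width of the input.
    """
    bands = {"ll": [], "hl": [], "lh": [], "hh": []}
    for y in range(0, len(img), 2):
        rows = {name: [] for name in bands}
        for x in range(0, len(img[0]), 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            rows["ll"].append((a + b + c + d) / 2)   # local average
            rows["hl"].append((-a - b + c + d) / 2)  # top vs bottom difference
            rows["lh"].append((-a + b - c + d) / 2)  # left vs right difference
            rows["hh"].append((a - b - c + d) / 2)   # diagonal difference
        for name in bands:
            bands[name].append(rows[name])
    return bands


def haar_idwt2(bands):
    """Invert haar_dwt2 and recover the full-resolution 2D list."""
    out = []
    for y in range(len(bands["ll"])):
        top, bottom = [], []
        for x in range(len(bands["ll"][0])):
            ll, hl = bands["ll"][y][x], bands["hl"][y][x]
            lh, hh = bands["lh"][y][x], bands["hh"][y][x]
            top += [(ll - hl - lh + hh) / 2, (ll - hl + lh - hh) / 2]
            bottom += [(ll + hl - lh - hh) / 2, (ll + hl + lh + hh) / 2]
        out += [top, bottom]
    return out


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
bands = haar_dwt2(image)      # four 2x2 sub-bands
restored = haar_idwt2(bands)  # back to 4x4
```

Stacking the four sub-bands as channels gives a feature map with half the resolution and four times the channels, which is the sense in which the transform reduces feature-map size while remaining invertible.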

Visual Tracking via Dynamic Graph Learning

Apr 30, 2018

Chenglong Li, Liang Lin, Wangmeng Zuo, Jin Tang, Ming-Hsuan Yang

Abstract:Existing visual tracking methods usually localize a target object with a bounding box, in which the performance of the foreground object trackers or detectors is often affected by the inclusion of background clutter. To handle this problem, we learn a patch-based graph representation for visual tracking. The tracked object is modeled with a graph by taking a set of non-overlapping image patches as nodes, in which the weight of each node indicates how likely it belongs to the foreground and edges are weighted to indicate the appearance compatibility of two neighboring nodes. This graph is dynamically learned and applied in object tracking and model updating. During the tracking process, the proposed algorithm performs three main steps in each frame. First, the graph is initialized by assigning binary weights of some image patches to indicate the object and background patches according to the predicted bounding box. Second, the graph is optimized to refine the patch weights by using a novel alternating direction method of multipliers. Third, the object feature representation is updated by imposing the weights of patches on the extracted image features. The object location is predicted by maximizing the classification score in the structured support vector machine. Extensive experiments show that the proposed tracking algorithm performs well against the state-of-the-art methods on large-scale benchmark datasets.

* Submitted to TPAMI 2017
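
The first and third steps described in the abstract above (binary initialization of patch weights from a bounding box, then imposing the weights on patch features) can be sketched as follows. This is a simplified illustration under stated assumptions: every patch gets an initial weight, the bounding box is given in patch-grid coordinates, and the optimization step that refines the weights is skipped entirely. Both function names are hypothetical.

```python
def init_patch_weights(grid_h, grid_w, box):
    """Assign weight 1.0 to patches inside the box and 0.0 to those outside.

    box = (top, left, bottom, right) in patch-grid coordinates,
    with bottom and right exclusive.
    """
    top, left, bottom, right = box
    return [
        [1.0 if top <= y < bottom and left <= x < right else 0.0
         for x in range(grid_w)]
        for y in range(grid_h)
    ]


def weighted_features(patch_features, weights):
    """Scale each patch feature vector by its weight and concatenate them."""
    out = []
    for feature_row, weight_row in zip(patch_features, weights):
        for feature, weight in zip(feature_row, weight_row):
            out.extend(weight * value for value in feature)
    return out


# A 2x2 grid of patches, each with a 2-dimensional toy feature.
patch_features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
weights = init_patch_weights(2, 2, (0, 0, 2, 1))  # object covers the left column
descriptor = weighted_features(patch_features, weights)
```

With binary weights the background patches are zeroed out completely; after the refinement step described in the abstract, the weights would be expected to take intermediate values, so background clutter is suppressed rather than simply removed.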
