Tianshui Chen

Learning to Segment Object Candidates via Recursive Neural Networks

Jul 29, 2018
Tianshui Chen, Liang Lin, Xian Wu, Nong Xiao, Xiaonan Luo

To avoid an exhaustive search over locations and scales, current state-of-the-art object detection systems usually involve a crucial component that generates a batch of candidate object proposals from images. In this paper, we present a simple yet effective approach for segmenting object proposals via a deep architecture of recursive neural networks (ReNNs), which hierarchically groups regions to detect object candidates across scales. Unlike traditional methods that mainly adopt fixed similarity measures for merging regions or finding object proposals, our approach adaptively learns the region merging similarity and the objectness measure during the process of hierarchical region grouping. Specifically, guided by a structured loss, the ReNN model jointly optimizes the cross-region similarity metric, the region merging process, and the objectness prediction. During inference for object proposal generation, we introduce randomness into the greedy search to cope with the ambiguity of grouping regions. Extensive experiments on standard benchmarks, e.g., PASCAL VOC and ImageNet, suggest that our approach produces object proposals with high recall while preserving object boundaries well, and that it outperforms existing methods in both accuracy and efficiency.

* Accepted at TIP 
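A minimal PyTorch-style sketch of the recursive grouping idea described above: a small network scores the merge similarity of region pairs and the objectness of merged regions, and a greedy loop with sampling (rather than a deterministic argmax) merges regions bottom-up. The region features, scoring heads, and merge rule are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: learned merge-similarity + objectness with randomized greedy grouping.
import torch
import torch.nn as nn

class MergeScorer(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.combine = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())  # fuse two child regions
        self.similarity = nn.Linear(dim, 1)   # how likely the two regions belong together
        self.objectness = nn.Linear(dim, 1)   # how likely the merged region is an object

    def forward(self, feat_a, feat_b):
        merged = self.combine(torch.cat([feat_a, feat_b], dim=-1))
        return merged, self.similarity(merged), self.objectness(merged)

def randomized_grouping(region_feats, scorer, temperature=1.0):
    """Greedily merge regions, sampling among high-similarity pairs for diversity."""
    regions = [f for f in region_feats]
    proposals = []
    while len(regions) > 1:
        pairs, scores = [], []
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                merged, sim, obj = scorer(regions[i], regions[j])
                pairs.append((i, j, merged, obj))
                scores.append(sim)
        # randomness in the greedy search: sample a pair proportionally to its similarity
        probs = torch.softmax(torch.stack(scores).squeeze(-1) / temperature, dim=0)
        i, j, merged, obj = pairs[torch.multinomial(probs, 1).item()]
        proposals.append((merged, obj.item()))
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
    return proposals

scorer = MergeScorer()
feats = torch.randn(6, 64)          # stand-in features for 6 initial regions/superpixels
print(len(randomized_grouping(feats, scorer)))
```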

Knowledge-Embedded Representation Learning for Fine-Grained Image Recognition

Jul 02, 2018
Tianshui Chen, Liang Lin, Riquan Chen, Yang Wu, Xiaonan Luo

Humans can naturally understand an image in depth with the aid of rich knowledge accumulated from daily life or their professions. For example, fine-grained image recognition (e.g., categorizing hundreds of subordinate categories of birds) usually requires a comprehensive organization of visual concepts, including category labels and part-level attributes. In this work, we investigate how to unify rich professional knowledge with deep neural network architectures and propose a Knowledge-Embedded Representation Learning (KERL) framework for fine-grained image recognition. Specifically, we organize the rich visual concepts in the form of a knowledge graph and employ a Gated Graph Neural Network to propagate node messages through the graph to generate the knowledge representation. By introducing a novel gated mechanism, our KERL framework incorporates this knowledge representation into discriminative image feature learning, i.e., implicitly associating the specific attributes with the feature maps. Compared with existing methods for fine-grained image classification, our KERL framework has several appealing properties: i) the embedded high-level knowledge enhances the feature representation, thus facilitating the distinction of subtle differences among subordinate categories; ii) our framework can learn feature maps with a meaningful configuration in which the highlighted regions accord well with the nodes (specific attributes) of the knowledge graph. Extensive experiments on the widely used Caltech-UCSD bird dataset demonstrate the superiority of our KERL framework over existing state-of-the-art methods.

* Accepted at IJCAI 2018. The first work that introduces high-level knowledge to enhance representation learning for fine-grained image classification 
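A minimal sketch of the two ingredients the abstract describes: GGNN-style message propagation over a concept graph, and a gate that injects the resulting knowledge vector into a CNN feature map. Graph size, feature dimensions, and the fusion rule are assumptions for illustration.

```python
# Sketch only: graph message passing with a GRU update, then gated fusion into features.
import torch
import torch.nn as nn

class KnowledgeGatedFusion(nn.Module):
    def __init__(self, num_nodes=10, node_dim=32, feat_dim=256, steps=3):
        super().__init__()
        self.steps = steps
        self.node_emb = nn.Embedding(num_nodes, node_dim)   # one node per category/attribute
        self.gru = nn.GRUCell(node_dim, node_dim)            # GGNN-style node update
        self.to_gate = nn.Linear(num_nodes * node_dim, feat_dim)

    def forward(self, adjacency, feature_map):
        h = self.node_emb.weight                              # (N, node_dim)
        for _ in range(self.steps):
            msg = adjacency @ h                               # aggregate neighbor messages
            h = self.gru(msg, h)
        knowledge = h.reshape(1, -1)                          # flatten the graph state
        gate = torch.sigmoid(self.to_gate(knowledge))         # (1, feat_dim)
        # gate each channel of the backbone feature map with the knowledge representation
        return feature_map * gate.view(1, -1, 1, 1)

model = KnowledgeGatedFusion()
adj = (torch.rand(10, 10) > 0.5).float()                      # toy adjacency matrix
feats = torch.randn(2, 256, 7, 7)                             # backbone feature map
print(model(adj, feats).shape)                                 # torch.Size([2, 256, 7, 7])
```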

Deep Reasoning with Knowledge Graph for Social Relationship Understanding

Jul 02, 2018
Zhouxia Wang, Tianshui Chen, Jimmy Ren, Weihao Yu, Hui Cheng, Liang Lin

Social relationships (e.g., friends, couples) form the basis of the social network in our daily life. Automatically interpreting such relationships holds great potential for intelligent systems to understand human behavior in depth and to interact with people at a social level. Humans interpret the social relationships within a group based not only on the people themselves; the interplay between such relationships and the contextual information around the people also plays a significant role. However, these additional cues are largely overlooked in previous studies. We found that this interplay can be effectively modeled by a novel structured knowledge graph with proper message propagation and attention, and that this structured knowledge can be efficiently integrated into a deep neural network architecture to promote social relationship understanding via an end-to-end trainable Graph Reasoning Model (GRM). In the GRM, a propagation mechanism is learned to pass node messages through the graph and explore the interaction between the persons of interest and the contextual objects, while a graph attention mechanism is introduced to explicitly reason about the most discriminative objects and promote recognition. Extensive experiments on public benchmarks demonstrate the superiority of our method over existing leading competitors.

* Accepted at IJCAI 2018. The first work that integrates a high-level knowledge graph to reason about the social relationship between a pair of people of interest in a still image 
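A minimal, assumption-laden sketch of graph reasoning with attention over contextual objects: messages pass between a person-pair node and object nodes, and an attention score picks out the objects that inform the relationship prediction. Dimensions, the message functions, and the classifier head are illustrative, not the GRM implementation.

```python
# Sketch only: message propagation plus graph attention over contextual object nodes.
import torch
import torch.nn as nn

class GraphAttentionReasoner(nn.Module):
    def __init__(self, dim=64, num_relations=6):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.update = nn.GRUCell(dim, dim)
        self.attn = nn.Linear(2 * dim, 1)            # scores each object w.r.t. the pair node
        self.classify = nn.Linear(2 * dim, num_relations)

    def forward(self, pair_feat, object_feats, steps=2):
        h_pair, h_obj = pair_feat, object_feats
        for _ in range(steps):
            # objects send messages to the pair node; the pair node broadcasts back
            h_pair = self.update(self.msg(h_obj).mean(0, keepdim=True), h_pair)
            h_obj = self.update(self.msg(h_pair).expand_as(h_obj), h_obj)
        # attend to the most discriminative contextual objects
        scores = self.attn(torch.cat([h_obj, h_pair.expand_as(h_obj)], dim=-1))
        alpha = torch.softmax(scores, dim=0)
        context = (alpha * h_obj).sum(0, keepdim=True)
        return self.classify(torch.cat([h_pair, context], dim=-1))

model = GraphAttentionReasoner()
pair = torch.randn(1, 64)          # feature of the person pair of interest
objects = torch.randn(5, 64)       # features of detected contextual objects
print(model(pair, objects).shape)  # torch.Size([1, 6])
```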

Learning a Wavelet-like Auto-Encoder to Accelerate Deep Neural Networks

Dec 20, 2017
Tianshui Chen, Liang Lin, Wangmeng Zuo, Xiaonan Luo, Lei Zhang

Accelerating deep neural networks (DNNs) has attracted increasing attention, as it can benefit a wide range of applications, e.g., enabling mobile systems with limited computing resources to acquire powerful visual recognition abilities. A practical strategy toward this goal usually relies on a two-stage process: operating on the trained DNNs (e.g., approximating the convolutional filters with tensor decomposition) and fine-tuning the amended network, which makes it difficult to balance acceleration against recognition performance. In this work, aiming at a general and comprehensive approach to neural network acceleration, we develop a Wavelet-like Auto-Encoder (WAE) that decomposes the original input image into two low-resolution channels (sub-images) and incorporate the WAE into the classification neural network for joint training. The two decomposed channels are encoded to carry the low-frequency information (e.g., image profiles) and the high-frequency information (e.g., image details or noise), respectively, and allow the original input image to be reconstructed through the decoding process. We then feed the low-frequency channel into a standard classification network such as VGG or ResNet and employ a very lightweight network that fuses in the high-frequency channel to obtain the classification result. Compared to existing DNN acceleration solutions, our framework has the following advantages: i) it is compatible with any existing convolutional neural network for classification without amending its structure; ii) the WAE provides an interpretable way to preserve the main components of the input image for classification.

* Accepted at AAAI 2018 
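A minimal sketch of the acceleration idea described above: encode the input into two half-resolution channels, classify the low-frequency channel with a standard backbone, and fuse a lightweight branch on the high-frequency channel; a decoder reconstructs the input so a reconstruction loss can be applied. The layer sizes, the ResNet-18 backbone choice, and the fusion head are assumptions, not the paper's exact architecture.

```python
# Sketch only: wavelet-like decomposition into two sub-images + joint classification.
import torch
import torch.nn as nn
import torchvision.models as models   # assumes torchvision >= 0.13 for weights=None

class WaveletLikeAE(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        # encoder: two strided conv branches producing half-resolution sub-images
        self.enc_low = nn.Conv2d(3, 3, kernel_size=4, stride=2, padding=1)
        self.enc_high = nn.Conv2d(3, 3, kernel_size=4, stride=2, padding=1)
        # decoder reconstructs the input from the two channels (used by a reconstruction loss)
        self.dec = nn.ConvTranspose2d(6, 3, kernel_size=4, stride=2, padding=1)
        self.backbone = models.resnet18(weights=None)        # stand-in for VGG/ResNet
        self.backbone.fc = nn.Identity()                      # expose 512-d features
        self.light = nn.Sequential(nn.Conv2d(3, 16, 3, stride=4, padding=1), nn.ReLU(),
                                   nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 64))
        self.classifier = nn.Linear(512 + 64, num_classes)

    def forward(self, x):
        low, high = self.enc_low(x), self.enc_high(x)
        recon = self.dec(torch.cat([low, high], dim=1))        # for the reconstruction loss
        logits = self.classifier(torch.cat([self.backbone(low), self.light(high)], dim=1))
        return logits, recon

model = WaveletLikeAE(num_classes=10)
logits, recon = model(torch.randn(2, 3, 224, 224))
print(logits.shape, recon.shape)   # torch.Size([2, 10]) torch.Size([2, 3, 224, 224])
```

The speed-up comes from the backbone only ever seeing the half-resolution low-frequency channel, while the high-frequency branch stays deliberately tiny.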

Recurrent Attentional Reinforcement Learning for Multi-label Image Recognition

Dec 20, 2017
Tianshui Chen, Zhouxia Wang, Guanbin Li, Liang Lin

Recognizing multiple labels of images is a fundamental but challenging task in computer vision, and remarkable progress has been attained by localizing semantic-aware image regions and predicting their labels with deep convolutional neural networks. However, the step of localizing hypothesis regions (region proposals) in these existing multi-label image recognition pipelines usually incurs redundant computation, e.g., generating hundreds of meaningless proposals with non-discriminative information and extracting their features, and the modeling of spatial contextual dependencies among the localized regions is often ignored or over-simplified. To resolve these issues, this paper proposes a recurrent attentional reinforcement learning framework that iteratively discovers a sequence of attentional and informative regions related to different semantic objects and further predicts label scores conditioned on these regions. Moreover, our method explicitly models long-term dependencies among these attentional regions, which helps capture semantic label co-occurrence and thus facilitates multi-label recognition. Extensive experiments and comparisons on two large-scale benchmarks (i.e., PASCAL VOC and MS-COCO) show that our model surpasses existing state-of-the-art methods in both accuracy and efficiency while explicitly associating image-level semantic labels with specific object regions.

* Accepted at AAAI 2018 
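A minimal sketch of the recurrent attentional loop described above: an LSTM state proposes the next region, a score vector is predicted per step, and per-step scores are max-pooled into image-level labels. The glimpse is reduced to a single bilinear sample, and the reinforcement-learning reward is omitted; dimensions and wiring are assumptions.

```python
# Sketch only: recurrent region attention with per-step label scores, max-pooled.
import torch
import torch.nn as nn

class RecurrentAttention(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, num_labels=20, steps=5):
        super().__init__()
        self.steps, self.hidden = steps, hidden
        self.cell = nn.LSTMCell(feat_dim, hidden)
        self.locate = nn.Linear(hidden, 2)              # (x, y) center of the next region
        self.score = nn.Linear(hidden, num_labels)

    def glimpse(self, feat_map, loc):
        # simplified "crop": bilinearly sample the feature map at the proposed center
        grid = loc.view(-1, 1, 1, 2).clamp(-1, 1)
        return nn.functional.grid_sample(feat_map, grid, align_corners=False).flatten(1)

    def forward(self, feat_map):
        b = feat_map.size(0)
        h = feat_map.new_zeros(b, self.hidden)
        c = feat_map.new_zeros(b, self.hidden)
        loc = feat_map.new_zeros(b, 2)
        step_scores = []
        for _ in range(self.steps):
            region = self.glimpse(feat_map, loc)
            h, c = self.cell(region, (h, c))
            loc = torch.tanh(self.locate(h))             # propose the next attentional region
            step_scores.append(self.score(h))
        # the shared recurrent state lets later steps exploit label co-occurrence
        return torch.stack(step_scores, dim=1).max(dim=1).values

model = RecurrentAttention()
feats = torch.randn(2, 512, 14, 14)                      # backbone feature map
print(model(feats).shape)                                 # torch.Size([2, 20])
```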

Multi-label Image Recognition by Recurrently Discovering Attentional Regions

Nov 08, 2017
Zhouxia Wang, Tianshui Chen, Guanbin Li, Ruijia Xu, Liang Lin

This paper proposes a novel deep architecture for multi-label image recognition, a fundamental and practical task toward general visual understanding. Current solutions for this task usually rely on an extra step of extracting hypothesis regions (i.e., region proposals), resulting in redundant computation and sub-optimal performance. In this work, we achieve interpretable and contextualized multi-label image classification by developing a recurrent memorized-attention module. This module consists of two alternately performed components: i) a spatial transformer layer that locates attentional regions from the convolutional feature maps in a region-proposal-free way, and ii) an LSTM (Long Short-Term Memory) sub-network that sequentially predicts semantic labeling scores on the located regions while capturing their global dependencies. The LSTM also outputs the parameters for computing the spatial transformer. On large-scale benchmarks for multi-label image classification (e.g., MS-COCO and PASCAL VOC 07), our approach demonstrates superior performance over existing state-of-the-art methods in both accuracy and efficiency.

* Accepted at ICCV 2017 
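A minimal sketch of alternating a spatial transformer with an LSTM, as the abstract describes: the LSTM state predicts affine parameters, the transformer crops the feature map accordingly, and a label score is produced for each located region. The parameterization, pooling, and head design are assumptions for illustration.

```python
# Sketch only: LSTM-driven spatial transformer attention for multi-label scores.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentSTAttention(nn.Module):
    def __init__(self, channels=512, hidden=256, num_labels=20, steps=4, out_size=7):
        super().__init__()
        self.steps, self.out_size, self.hidden = steps, out_size, hidden
        self.cell = nn.LSTMCell(channels, hidden)
        self.theta = nn.Linear(hidden, 6)                # affine transformation parameters
        self.score = nn.Linear(channels, num_labels)
        nn.init.zeros_(self.theta.weight)                 # start from the identity transform
        self.theta.bias.data = torch.tensor([1., 0., 0., 0., 1., 0.])

    def forward(self, feat_map):
        b, c = feat_map.size(0), feat_map.size(1)
        h = feat_map.new_zeros(b, self.hidden)
        cstate = feat_map.new_zeros(b, self.hidden)
        scores = []
        for _ in range(self.steps):
            theta = self.theta(h).view(b, 2, 3)           # LSTM state -> transformer params
            grid = F.affine_grid(theta, (b, c, self.out_size, self.out_size), align_corners=False)
            region = F.grid_sample(feat_map, grid, align_corners=False).mean(dim=(2, 3))
            h, cstate = self.cell(region, (h, cstate))    # memorize the located regions
            scores.append(self.score(region))
        return torch.stack(scores, dim=1).max(dim=1).values

model = RecurrentSTAttention()
print(model(torch.randn(2, 512, 14, 14)).shape)           # torch.Size([2, 20])
```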

Content-Adaptive Sketch Portrait Generation by Decompositional Representation Learning

Oct 04, 2017
Dongyu Zhang, Liang Lin, Tianshui Chen, Xian Wu, Wenwei Tan, Ebroul Izquierdo

Sketch portrait generation benefits a wide range of applications such as digital entertainment and law enforcement. Although plenty of effort has been dedicated to this task, several issues remain unsolved for generating vivid and detail-preserving personal sketch portraits. For example, quite a few artifacts may appear when synthesizing hairpins and glasses, and textural details may be lost in regions of hair or mustache. Moreover, the generalization ability of current systems is somewhat limited, since they usually require elaborately collecting a dictionary of examples or carefully tuning features and components. In this paper, we present a novel representation learning framework that learns an end-to-end photo-sketch mapping through structure and texture decomposition. In the training stage, we first decompose the input face photo into different components according to their representational contents (i.e., structural and textural parts) using a pre-trained Convolutional Neural Network (CNN). Then, we utilize a Branched Fully Convolutional Neural Network (BFCN) to learn structural and textural representations, respectively. In addition, we design a Sorted Matching Mean Square Error (SM-MSE) metric to measure texture patterns in the loss function. In the sketch rendering stage, our approach automatically generates structural and textural representations for the input photo and produces the final result via a probabilistic fusion scheme. Extensive experiments on several challenging benchmarks suggest that our approach outperforms example-based synthesis algorithms in terms of both perceptual and objective metrics. In addition, the proposed method generalizes better across datasets without additional training.

* Published in TIP 2017 
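One way to read the "sorted matching" loss mentioned above is as an MSE computed on sorted values, so that texture statistics rather than exact pixel alignment are compared. The sketch below follows that reading under stated assumptions; the patch size, the non-overlapping unfolding, and the per-patch sorting are guesses for illustration, not the paper's SM-MSE definition.

```python
# Sketch only: a sorted-matching MSE over local patches (assumed interpretation).
import torch
import torch.nn.functional as F

def sorted_matching_mse(pred, target, patch=8):
    """pred, target: (B, C, H, W) texture maps; compare sorted values per patch."""
    def sort_patches(x):
        # unfold into non-overlapping patches, then sort the values inside each patch
        patches = F.unfold(x, kernel_size=patch, stride=patch)    # (B, C*patch*patch, L)
        return patches.sort(dim=1).values
    return F.mse_loss(sort_patches(pred), sort_patches(target))

pred = torch.rand(2, 1, 64, 64)
target = torch.rand(2, 1, 64, 64)
print(sorted_matching_mse(pred, target).item())
```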

Knowledge-Guided Recurrent Neural Network Learning for Task-Oriented Action Prediction

Jul 15, 2017
Liang Lin, Lili Huang, Tianshui Chen, Yukang Gan, Hui Cheng

This paper addresses task-oriented action prediction, i.e., predicting a sequence of actions toward accomplishing a specific task in a given scene, which is a new problem in computer vision research. The main challenges lie in how to model task-specific knowledge and integrate it into the learning procedure. In this work, we propose to train a recurrent long short-term memory (LSTM) network to handle this problem, i.e., taking a scene image (with pre-located objects) and a specified task as input and recurrently predicting action sequences. However, training such a network usually requires large amounts of annotated samples to cover the semantic space (e.g., diverse action decompositions and orderings). To alleviate this issue, we introduce a temporal And-Or graph (AOG) for task description, which hierarchically decomposes a task into atomic actions. With this AOG representation, we can produce many valid samples (i.e., action sequences consistent with common sense) by training an auxiliary LSTM network on a small set of annotated samples, and these generated task-oriented action sequences effectively facilitate training the model for task-oriented action prediction. In the experiments, we create a new dataset containing diverse daily tasks and extensively evaluate the effectiveness of our approach. A minimal sketch of the prediction model is shown below.
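The sketch conditions an LSTM on a scene feature and a task embedding and greedily decodes atomic-action tokens until an end token. Vocabulary sizes, the fusion of scene and task, and greedy decoding are assumptions; the AOG-based sample augmentation from the paper is not shown.

```python
# Sketch only: task-conditioned LSTM that recurrently emits an action sequence.
import torch
import torch.nn as nn

class TaskActionLSTM(nn.Module):
    def __init__(self, scene_dim=512, num_tasks=10, num_actions=30, hidden=256):
        super().__init__()
        self.task_emb = nn.Embedding(num_tasks, hidden)
        self.init_h = nn.Linear(scene_dim + hidden, hidden)   # fuse scene + task into the state
        self.action_emb = nn.Embedding(num_actions, hidden)
        self.cell = nn.LSTMCell(hidden, hidden)
        self.out = nn.Linear(hidden, num_actions)

    def forward(self, scene_feat, task_id, max_len=8, start_token=0, end_token=1):
        h = torch.tanh(self.init_h(torch.cat([scene_feat, self.task_emb(task_id)], dim=-1)))
        c = torch.zeros_like(h)
        token = torch.full((scene_feat.size(0),), start_token, dtype=torch.long)
        sequence = []
        for _ in range(max_len):
            h, c = self.cell(self.action_emb(token), (h, c))
            token = self.out(h).argmax(dim=-1)           # greedy decoding of the next action
            sequence.append(token)
            if (token == end_token).all():
                break
        return torch.stack(sequence, dim=1)

model = TaskActionLSTM()
scene = torch.randn(2, 512)                               # pre-extracted scene feature
task = torch.tensor([3, 7])                                # task indices
print(model(scene, task).shape)                             # (2, sequence_length)
```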


Character Proposal Network for Robust Text Extraction

Feb 13, 2016
Shuye Zhang, Mude Lin, Tianshui Chen, Lianwen Jin, Liang Lin

Maximally stable extremal regions (MSER), a popular method for generating character proposals/candidates, has shown superior performance in scene text detection. However, its pixel-level operation limits its capability to handle some challenging cases (e.g., multiple connected characters, separated parts of one character, and non-uniform illumination). To better tackle these cases, we design a character proposal network (CPN) that takes advantage of the high capacity and fast computation of fully convolutional networks (FCNs). Specifically, the network simultaneously predicts characterness scores and refines the corresponding locations. The characterness scores can be used for proposal ranking to reject non-character proposals, and the refinement process aims to obtain more accurate locations. Furthermore, considering that different characters have different aspect ratios, we propose a multi-template strategy, designing a refiner for each aspect ratio; a minimal sketch of such a head follows. Extensive experiments show that our method achieves recall rates of 93.88%, 93.60% and 96.46% on the ICDAR 2013, SVT and Chinese2k datasets respectively using fewer than 1000 proposals, demonstrating the promising performance of our character proposal network.
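The sketch shows an FCN-style proposal head that outputs, for each of K aspect-ratio templates, a per-location characterness score and a 4-d location refinement. Channel counts and head design are assumptions, not the paper's exact network.

```python
# Sketch only: per-template characterness scores + location refinements from an FCN head.
import torch
import torch.nn as nn

class CharacterProposalHead(nn.Module):
    def __init__(self, in_channels=256, num_templates=3):
        super().__init__()
        self.k = num_templates
        self.shared = nn.Sequential(nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU())
        self.cls = nn.Conv2d(256, num_templates, 1)        # characterness score per template
        self.reg = nn.Conv2d(256, 4 * num_templates, 1)    # (dx, dy, dw, dh) per template

    def forward(self, feat_map):
        x = self.shared(feat_map)
        scores = torch.sigmoid(self.cls(x))                # (B, K, H, W)
        deltas = self.reg(x)                                # (B, 4K, H, W)
        b, _, h, w = deltas.shape
        return scores, deltas.view(b, self.k, 4, h, w)

head = CharacterProposalHead()
feats = torch.randn(2, 256, 32, 32)                        # conv feature map of the image
scores, deltas = head(feats)
print(scores.shape, deltas.shape)   # (2, 3, 32, 32) (2, 3, 4, 32, 32)
```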


DISC: Deep Image Saliency Computing via Progressive Representation Learning

Dec 10, 2015
Tianshui Chen, Liang Lin, Lingbo Liu, Xiaonan Luo, Xuelong Li

Salient object detection is receiving increasing attention as an important component or step in several pattern recognition and image processing tasks. Although a variety of powerful saliency models have been proposed, they usually involve heavy feature (or model) engineering based on priors (or assumptions) about the properties of objects and backgrounds. Inspired by the effectiveness of recently developed feature learning, we propose a novel Deep Image Saliency Computing (DISC) framework for fine-grained image saliency computing. In particular, we model image saliency from both coarse- and fine-level observations and utilize deep convolutional neural networks (CNNs) to learn the saliency representation in a progressive manner. Specifically, our saliency model is built upon two stacked CNNs. The first CNN generates a coarse-level saliency map by taking the overall image as input, roughly identifying salient regions in the global context; we further integrate superpixel-based local context information into the first CNN to refine this coarse-level map. Guided by the coarse saliency map, the second CNN focuses on the local context to produce a fine-grained and accurate saliency map while preserving object details. For a test image, the two CNNs collaboratively conduct the saliency computation in one shot. Our DISC framework is capable of uniformly highlighting the objects of interest in complex backgrounds while preserving object details well. Extensive experiments on several standard benchmarks suggest that DISC outperforms other state-of-the-art methods and also generalizes well across datasets without additional training. The executable version of DISC is available online: http://vision.sysu.edu.cn/projects/DISC.

* This manuscript is the accepted version for IEEE Transactions on Neural Networks and Learning Systems (T-NNLS), 2015 
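A minimal sketch of the coarse-to-fine stacking described above: a first network predicts a global, low-resolution saliency map from the whole image, and a second network refines it by taking the image concatenated with the upsampled coarse map as input. The layer choices are assumptions; the superpixel-based refinement is omitted.

```python
# Sketch only: two stacked saliency networks, coarse map guiding the fine-grained one.
import torch
import torch.nn as nn
import torch.nn.functional as F

def small_cnn(in_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(64, 1, 1))

class CoarseToFineSaliency(nn.Module):
    def __init__(self):
        super().__init__()
        self.coarse = small_cnn(3)        # stage 1: global context, low resolution
        self.fine = small_cnn(4)          # stage 2: image + coarse map, preserves details

    def forward(self, image):
        coarse = torch.sigmoid(self.coarse(image))
        coarse_up = F.interpolate(coarse, size=image.shape[-2:], mode='bilinear',
                                  align_corners=False)
        fine = torch.sigmoid(self.fine(torch.cat([image, coarse_up], dim=1)))
        return coarse_up, F.interpolate(fine, size=image.shape[-2:], mode='bilinear',
                                        align_corners=False)

model = CoarseToFineSaliency()
coarse_map, fine_map = model(torch.randn(2, 3, 128, 128))
print(coarse_map.shape, fine_map.shape)   # both torch.Size([2, 1, 128, 128])
```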