Tianshui Chen

Knowledge-Guided Multi-Label Few-Shot Learning for General Image Recognition

Sep 20, 2020

Tianshui Chen, Liang Lin, Riquan Chen, Xiaolu Hui, Hefeng Wu

Abstract: Recognizing multiple labels of an image is a practical yet challenging task, and remarkable progress has been achieved by searching for semantic regions and exploiting label dependencies. However, current works utilize RNN/LSTM to implicitly capture sequential region/label dependencies, which cannot fully explore mutual interactions among the semantic regions/labels and do not explicitly integrate label co-occurrences. In addition, these works require large amounts of training samples for each category, and they are unable to generalize to novel categories with limited samples. To address these issues, we propose a knowledge-guided graph routing (KGGR) framework, which unifies prior knowledge of statistical label correlations with deep neural networks. The framework exploits prior knowledge to guide adaptive information propagation among different categories to facilitate multi-label analysis and reduce the dependency on training samples. Specifically, it first builds a structured knowledge graph to correlate different labels based on statistical label co-occurrence. Then, it introduces the label semantics to guide the learning of semantic-specific features that initialize the graph, and it exploits a graph propagation network to explore graph node interactions, which enables learning contextualized image feature representations. Moreover, we initialize each graph node with the classifier weights for the corresponding label and apply another propagation network to transfer node messages through the graph. In this way, the framework can exploit the information of correlated labels to help train better classifiers. We conduct extensive experiments on the traditional multi-label image recognition (MLR) and multi-label few-shot learning (ML-FSL) tasks and show that our KGGR framework outperforms the current state-of-the-art methods by sizable margins on the public benchmarks.

* Accepted at TPAMI
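
The abstract above mentions a knowledge graph built from statistical label co-occurrence. As a rough illustration only, and not the authors' implementation, the sketch below estimates the conditional probability that label b is present given that label a is present, from a toy set of annotations. The function name `cooccurrence_graph`, the `threshold` parameter, and the toy labels are all assumptions made for this example.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_graph(annotations, threshold=0.0):
    """Estimate P(b present | a present) from multi-label annotations.

    annotations: iterable of label sets, one per training image.
    Returns {(a, b): probability} for ordered label pairs above threshold.
    """
    label_count = Counter()
    pair_count = Counter()
    for labels in annotations:
        labels = set(labels)
        label_count.update(labels)
        # Ordered pairs, because P(b | a) generally differs from P(a | b).
        pair_count.update(permutations(sorted(labels), 2))
    return {
        (a, b): n / label_count[a]
        for (a, b), n in pair_count.items()
        if n / label_count[a] > threshold
    }


# Hypothetical annotations for four images.
toy = [{"person", "dog"}, {"person", "bicycle"}, {"person"}, {"dog"}]
graph = cooccurrence_graph(toy)
print(graph[("dog", "person")])      # P(person | dog)
print(graph[("bicycle", "person")])  # P(person | bicycle)
```

The resulting edge weights are asymmetric, which is why the pairs are stored as ordered keys; a real pipeline would likely also prune or binarize rare edges, which is what the assumed `threshold` stands in for.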

Look into Facial Expression Domain Adaptation: Adversarial Graph Learning and A Fair Evaluation Benchmark

Aug 27, 2020

Tianshui Chen, Tao Pu, Yuan Xie, Hefeng Wu, Lingbo Liu, Liang Lin

Abstract: To address the problem of data inconsistencies among different facial expression recognition (FER) datasets, many cross-domain FER methods (CD-FERs) have been devised in recent years. Although each claims superior performance, fair comparisons are lacking due to the inconsistent choices of the source/target datasets and feature extractors. In this work, we first analyze the performance effects of these inconsistent choices, and then re-implement some well-performing CD-FER and recently published domain adaptation algorithms. We ensure that all these algorithms adopt the same source datasets and feature extractors for fair CD-FER evaluations. We find that most of the current leading algorithms use adversarial learning to learn holistic domain-invariant features to mitigate domain shifts. However, these algorithms ignore local features, which are more transferable across different datasets and carry more detailed content for fine-grained adaptation. To address these issues, we integrate graph representation propagation with adversarial learning for cross-domain holistic-local feature co-adaptation by developing a novel adversarial graph representation adaptation (AGRA) framework. Specifically, it first builds two graphs to correlate holistic and local regions within each domain and across different domains, respectively. Then, it extracts holistic-local features from the input image and uses learnable per-class statistical distributions to initialize the corresponding graph nodes. Finally, two stacked graph convolution networks (GCNs) are adopted to propagate holistic-local features within each domain to explore their interaction and across different domains for holistic-local feature co-adaptation. We conduct extensive and fair evaluations on several popular benchmarks and show that the proposed AGRA framework outperforms previous state-of-the-art methods.

* Extension of our ACM MM 2020 paper. arXiv admin note: substantial text overlap with arXiv:2008.00859

Adversarial Graph Representation Adaptation for Cross-Domain Facial Expression Recognition

Aug 04, 2020

Yuan Xie, Tianshui Chen, Tao Pu, Hefeng Wu, Liang Lin

Abstract: Data inconsistency and bias are inevitable among different facial expression recognition (FER) datasets due to subjective annotation processes and different collection conditions. Recent works resort to adversarial mechanisms that learn domain-invariant features to mitigate domain shift. However, most of these works focus on holistic feature adaptation, and they ignore local features that are more transferable across different datasets. Moreover, local features carry more detailed and discriminative content for expression recognition, and thus integrating local features may enable fine-grained adaptation. In this work, we propose a novel Adversarial Graph Representation Adaptation (AGRA) framework that unifies graph representation propagation with adversarial learning for cross-domain holistic-local feature co-adaptation. To achieve this, we first build a graph to correlate holistic and local regions within each domain and another graph to correlate these regions across different domains. Then, we learn the per-class statistical distribution of each domain and extract holistic-local features from the input image to initialize the corresponding graph nodes. Finally, we introduce two stacked graph convolution networks to propagate holistic-local features within each domain to explore their interaction and across different domains for holistic-local feature co-adaptation. In this way, the AGRA framework can adaptively learn fine-grained domain-invariant features and thus facilitate cross-domain expression recognition. We conduct extensive and fair experiments on several popular benchmarks and show that the proposed AGRA framework achieves superior performance over previous state-of-the-art methods.

* Accepted at ACM MM 2020

Fine-Grained Image Captioning with Global-Local Discriminative Objective

Jul 21, 2020

Jie Wu, Tianshui Chen, Hefeng Wu, Zhi Yang, Guangchun Luo, Liang Lin

Abstract: Significant progress has been made in recent years in image captioning, an active topic in the fields of vision and language. However, existing methods tend to yield overly general captions that consist of some of the most frequent words/phrases, resulting in inaccurate and indistinguishable descriptions (see Figure 1). This is primarily due to (i) the conservative characteristic of traditional training objectives, which drives the model to generate correct but hardly discriminative captions for similar images, and (ii) the uneven word distribution of the ground-truth captions, which encourages generating highly frequent words/phrases while suppressing the less frequent but more concrete ones. In this work, we propose a novel global-local discriminative objective that is formulated on top of a reference model to facilitate generating fine-grained descriptive captions. Specifically, from a global perspective, we design a novel global discriminative constraint that pulls the generated sentence to better discern the corresponding image from all others in the entire dataset. From the local perspective, a local discriminative constraint is proposed to increase attention such that it emphasizes the less frequent but more concrete words/phrases, thus facilitating the generation of captions that better describe the visual details of the given images. We evaluate the proposed method on the widely used MS-COCO dataset, where it outperforms the baseline methods by a sizable margin and achieves competitive performance against existing leading approaches. We also conduct self-retrieval experiments to demonstrate the discriminability of the proposed method.

* Accepted by TMM

Efficient Crowd Counting via Structured Knowledge Transfer

Apr 26, 2020

Lingbo Liu, Jiaqi Chen, Hefeng Wu, Tianshui Chen, Guanbin Li, Liang Lin

Abstract: Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications. However, most previous works relied on heavy backbone networks and required prohibitive run-time consumption, which would seriously restrict their deployment scopes and cause poor scalability. To liberate these crowd counting models, we propose a novel Structured Knowledge Transfer (SKT) framework, which fully exploits the structured knowledge of a well-trained teacher network to generate a lightweight but still highly effective student network. Specifically, it integrates two complementary transfer modules: an Intra-Layer Pattern Transfer, which sequentially distills the knowledge embedded in layer-wise features of the teacher network to guide feature learning of the student network, and an Inter-Layer Relation Transfer, which densely distills the cross-layer correlation knowledge of the teacher to regularize the student's feature evolution. In this way, our student network can derive the layer-wise and cross-layer knowledge from the teacher network to learn compact yet effective features. Extensive evaluations on three benchmarks demonstrate the effectiveness of our SKT for a wide range of crowd counting models. In particular, using only around 6% of the parameters and computation cost of the original models, our distilled VGG-based models obtain at least a 6.5× speed-up on an Nvidia 1080 GPU and even achieve state-of-the-art performance.

Knowledge Graph Transfer Network for Few-Shot Recognition

Nov 21, 2019

Riquan Chen, Tianshui Chen, Xiaolu Hui, Hefeng Wu, Guanbin Li, Liang Lin

Abstract: Few-shot learning aims to learn novel categories from very few samples given some base categories with sufficient training samples. The main challenge of this task is that the novel categories are prone to being dominated by the color, texture, or shape of the object or by the background context (namely specificity), which are distinct for the given few training samples but not common for the corresponding categories (see Figure 1). Fortunately, we find that transferring information from the correlated base categories can help learn the novel concepts and thus avoid the novel concept being dominated by the specificity. Besides, incorporating semantic correlations among different categories can effectively regularize this information transfer. In this work, we represent the semantic correlations in the form of a structured knowledge graph and integrate this graph into deep neural networks to promote few-shot learning by a novel Knowledge Graph Transfer Network (KGTN). Specifically, by initializing each node with the classifier weight of the corresponding category, a propagation mechanism is learned to adaptively propagate node messages through the graph to explore node interaction and transfer classifier information of the base categories to those of the novel ones. Extensive experiments on the ImageNet dataset show significant performance improvement compared with current leading competitors. Furthermore, we construct an ImageNet-6K dataset that covers a larger set of categories, i.e., 6,000 categories, and experiments on this dataset further demonstrate the effectiveness of our proposed model.

* Accepted by AAAI 2020 as an oral paper
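
The abstract above describes initializing each graph node with a category's classifier weight and propagating messages so that base categories inform novel ones. The sketch below is a deliberately simplified stand-in: the abstract says the propagation mechanism is learned, whereas this example uses a fixed weighted average of neighbours. The function `propagate`, the `mix` parameter, and the toy categories and edge weights are hypothetical.

```python
def propagate(weights, adjacency, steps=1, mix=0.5):
    """Blend each node's vector with the weighted mean of its neighbours.

    weights:   {node: list of floats}, e.g. per-category classifier weights.
    adjacency: {node: {neighbour: edge_weight}}, a hypothetical knowledge graph.
    mix:       fraction of the neighbour message blended in (an assumption).
    """
    current = {k: list(v) for k, v in weights.items()}
    for _ in range(steps):
        updated = {}
        for node, vec in current.items():
            neighbours = adjacency.get(node, {})
            total = sum(neighbours.values())
            if total == 0:
                updated[node] = list(vec)  # isolated node keeps its weights
                continue
            # Edge-weighted mean of the neighbours' vectors.
            message = [
                sum(w * current[n][i] for n, w in neighbours.items()) / total
                for i in range(len(vec))
            ]
            updated[node] = [(1 - mix) * v + mix * m for v, m in zip(vec, message)]
        current = updated
    return current


# Two "base" categories with known weights and one "novel" category at zero.
base_and_novel = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "lynx": [0.0, 0.0]}
graph = {"lynx": {"cat": 3.0, "dog": 1.0}}
result = propagate(base_and_novel, graph)
print(result["lynx"])
```

In this toy setup the novel node ends up closer to the base category it is more strongly linked to, which is the intuition behind using semantic correlations to regularize the transfer.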

Learning Semantic-Specific Graph Representation for Multi-Label Image Recognition

Aug 20, 2019

Tianshui Chen, Muxin Xu, Xiaolu Hui, Hefeng Wu, Liang Lin

Abstract: Recognizing multiple labels of images is a practical and challenging task, and significant progress has been made by searching for semantic-aware regions and modeling label dependency. However, current methods cannot locate the semantic regions accurately due to the lack of part-level supervision or semantic guidance. Moreover, they cannot fully explore the mutual interactions among the semantic regions and do not explicitly model the label co-occurrence. To address these issues, we propose a Semantic-Specific Graph Representation Learning (SSGRL) framework that consists of two crucial modules: 1) a semantic decoupling module that incorporates category semantics to guide learning semantic-specific representations and 2) a semantic interaction module that correlates these representations with a graph built on the statistical label co-occurrence and explores their interactions via a graph propagation mechanism. Extensive experiments on public benchmarks show that our SSGRL framework outperforms current state-of-the-art methods by a sizable margin, e.g., with an mAP improvement of 2.5%, 2.6%, 6.7%, and 3.1% on the PASCAL VOC 2007 & 2012, Microsoft-COCO and Visual Genome benchmarks, respectively. Our code and models are available at https://github.com/HCPLab-SYSU/SSGRL.

* Accepted by ICCV 2019
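
The abstract above reports its gains in mAP (mean average precision). For readers unfamiliar with the metric, here is a minimal sketch of the non-interpolated form: per label, images are ranked by score and precision is averaged at the rank of each positive; mAP is the mean over labels. Benchmark toolkits may use a different (for example, interpolated) variant, so this is not necessarily the exact protocol behind the reported numbers. The function names and toy scores are assumptions.

```python
def average_precision(scores, targets):
    """AP for one label: mean of precision at the rank of each positive."""
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_columns, target_columns):
    """mAP: average the per-label AP over all labels."""
    aps = [average_precision(s, t) for s, t in zip(score_columns, target_columns)]
    return sum(aps) / len(aps)


# Two labels scored on the same four images (hypothetical values).
scores = [[0.9, 0.8, 0.3, 0.1], [0.2, 0.7, 0.6, 0.4]]
targets = [[1, 0, 1, 0], [0, 1, 1, 0]]
ap_first = average_precision(scores[0], targets[0])
map_value = mean_average_precision(scores, targets)
print(ap_first, map_value)
```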

Semi-Supervised Video Salient Object Detection Using Pseudo-Labels

Aug 12, 2019

Pengxiang Yan, Guanbin Li, Yuan Xie, Zhen Li, Chuan Wang, Tianshui Chen, Liang Lin

Abstract: Deep learning-based video salient object detection has recently achieved great success, with its performance significantly exceeding that of any unsupervised method. However, existing data-driven approaches heavily rely on a large quantity of pixel-wise annotated video frames to deliver such promising results. In this paper, we address the semi-supervised video salient object detection task using pseudo-labels. Specifically, we present an effective video saliency detector that consists of a spatial refinement network and a spatiotemporal module. Based on the same refinement network and motion information in terms of optical flow, we further propose a novel method for generating pixel-level pseudo-labels from sparsely annotated frames. By utilizing the generated pseudo-labels together with a portion of the manual annotations, our video saliency detector learns spatial and temporal cues for both contrast inference and coherence enhancement, thus producing accurate saliency maps. Experimental results demonstrate that our proposed semi-supervised method even greatly outperforms all the state-of-the-art fully supervised methods across three public benchmarks: VOS, DAVIS, and FBMS.

* Accepted by ICCV 2019

Knowledge-Embedded Routing Network for Scene Graph Generation

Mar 08, 2019

Tianshui Chen, Weihao Yu, Riquan Chen, Liang Lin

Abstract: Understanding a scene in depth involves not only locating/recognizing individual objects but also inferring the relationships and interactions among them. However, since the distribution of real-world relationships is seriously unbalanced, existing methods perform quite poorly for the less frequent relationships. In this work, we find that the statistical correlations between object pairs and their relationships can effectively regularize the semantic space and make predictions less ambiguous, and thus address the unbalanced distribution issue well. To achieve this, we incorporate these statistical correlations into deep neural networks to facilitate scene graph generation by developing a Knowledge-Embedded Routing Network. More specifically, we show that the statistical correlations between objects appearing in images and their relationships can be explicitly represented by a structured knowledge graph, and a routing mechanism is learned to propagate messages through the graph to explore their interactions. Extensive experiments on the large-scale Visual Genome dataset demonstrate the superiority of the proposed method over current state-of-the-art competitors.

* Accepted by CVPR 2019
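
The abstract above relies on statistical correlations between object pairs and their relationships. One simple way to picture such statistics, offered here as an illustration rather than the authors' procedure, is to count annotated (subject, relation, object) triples and normalize per object pair. The function `relation_priors` and the toy triples are hypothetical.

```python
from collections import Counter, defaultdict


def relation_priors(triples):
    """Return {(subject, object): {relation: P(relation | subject, object)}}."""
    counts = defaultdict(Counter)
    for subject, relation, obj in triples:
        counts[(subject, obj)][relation] += 1
    return {
        pair: {rel: n / sum(rels.values()) for rel, n in rels.items()}
        for pair, rels in counts.items()
    }


# Hypothetical annotations; real scene-graph datasets are far larger.
toy_triples = [
    ("man", "riding", "horse"),
    ("man", "riding", "horse"),
    ("man", "riding", "horse"),
    ("man", "feeding", "horse"),
    ("cup", "on", "table"),
]
priors = relation_priors(toy_triples)
print(priors[("man", "horse")])
```

Conditioning on the object pair narrows the set of plausible relations considerably, which is the sense in which such statistics can make rare-relation prediction less ambiguous.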

Neural Task Planning with And-Or Graph Representations

Aug 25, 2018

Tianshui Chen, Riquan Chen, Lin Nie, Xiaonan Luo, Xiaobai Liu, Liang Lin

Abstract: This paper focuses on semantic task planning, i.e., predicting a sequence of actions toward accomplishing a specific task under a certain scene, which is a new problem in computer vision research. The primary challenges are how to model task-specific knowledge and how to integrate this knowledge into the learning procedure. In this work, we propose training a recurrent long short-term memory (LSTM) network to address this problem, i.e., taking a scene image (including pre-located objects) and the specified task as input and recurrently predicting action sequences. However, training such a network generally requires large numbers of annotated samples to cover the semantic space (e.g., diverse action decomposition and ordering). To overcome this issue, we introduce a knowledge and-or graph (AOG) for task description, which hierarchically represents a task as atomic actions. With this AOG representation, we can produce many valid samples (i.e., action sequences according to common sense) by training another auxiliary LSTM network with a small set of annotated samples. Furthermore, these generated samples (i.e., task-oriented action sequences) effectively facilitate training of the model for semantic task planning. In our experiments, we create a new dataset that contains diverse daily tasks and extensively evaluate the effectiveness of our approach.

* Submitted to TMM, under minor revision. arXiv admin note: text overlap with arXiv:1707.04677
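
The abstract above uses an and-or graph (AOG) to decompose a task hierarchically into atomic actions. To make the data structure concrete, the sketch below expands a toy AOG into every action sequence it permits: an and-node runs all of its children in order, and an or-node picks exactly one. Note the difference from the abstract, which generates samples with an auxiliary LSTM; exhaustive enumeration is used here only because the toy graph is tiny. The function `expand`, the dictionary encoding, and the tea-making task are all assumptions.

```python
from itertools import product


def expand(node, graph):
    """Return every action sequence that `node` can produce.

    graph maps a node name to ("and", children) or ("or", children);
    any name missing from graph is treated as an atomic action.
    """
    if node not in graph:
        return [[node]]
    kind, children = graph[node]
    options = [expand(child, graph) for child in children]
    if kind == "or":
        # An or-node picks exactly one alternative.
        return [seq for alternatives in options for seq in alternatives]
    # An and-node runs all children in order: combine one sequence per child.
    return [sum(combo, []) for combo in product(*options)]


# Hypothetical task decomposition.
toy_aog = {
    "make_tea": ("and", ["get_water", "heat_water", "pour_into_cup"]),
    "get_water": ("or", ["fill_kettle_from_tap", "fill_kettle_from_bottle"]),
    "heat_water": ("and", ["switch_on_kettle", "wait_for_boil"]),
}
sequences = expand("make_tea", toy_aog)
for seq in sequences:
    print(seq)
```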
