Jun Guo

Advanced Dropout: A Model-free Methodology for Bayesian Dropout Optimization

Oct 11, 2020
Jiyang Xie, Zhanyu Ma, Guoqiang Zhang, Jing-Hao Xue, Zheng-Hua Tan, Jun Guo

Due to a lack of data, overfitting is ubiquitous in real-world applications of deep neural networks (DNNs). In this paper, we propose advanced dropout, a model-free methodology, to mitigate overfitting and improve the performance of DNNs. The advanced dropout technique applies a model-free and easily implemented distribution with a parametric prior, and adaptively adjusts the dropout rate. Specifically, the distribution parameters are optimized by stochastic gradient variational Bayes (SGVB) inference so that DNNs can be trained end to end. We evaluate the effectiveness of advanced dropout against nine dropout techniques on five widely used computer vision datasets. Advanced dropout outperforms all the compared techniques by 0.83% on average across the datasets. An ablation study analyzes the effectiveness of each component. The convergence of the dropout rate and the ability to prevent overfitting are also discussed in terms of classification performance. Moreover, we extend advanced dropout to uncertainty inference and network pruning, and find that it is superior to the corresponding compared methods. Advanced dropout improves classification accuracies by 4% in uncertainty inference and by 0.2% and 0.5% when pruning more than 90% of nodes and 99.8% of parameters, respectively.
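
The abstract does not name the mask distribution, so the following is only a rough sketch of how a dropout mask with learnable distribution parameters can be sampled in a way that suits SGVB-style training. The choice of a sigmoid applied to a Gaussian sample, and the names `sample_mask`, `apply_dropout`, `mu`, and `sigma`, are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def sample_mask(mu, sigma, n, rng):
    """Draw n soft dropout masks in (0, 1) with the reparameterization trick.

    Assumption: mask = sigmoid(mu + sigma * eps) with eps ~ N(0, 1).
    This is an illustrative choice, not necessarily the paper's distribution.
    """
    masks = []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)   # noise that does not depend on mu or sigma
        z = mu + sigma * eps        # differentiable with respect to mu and sigma
        masks.append(1.0 / (1.0 + math.exp(-z)))
    return masks


def apply_dropout(activations, masks):
    """Scale each activation by its sampled mask value."""
    return [a * m for a, m in zip(activations, masks)]


rng = random.Random(0)
acts = [1.0, 2.0, 3.0, 4.0]
masks = sample_mask(mu=0.0, sigma=1.0, n=len(acts), rng=rng)
dropped = apply_dropout(acts, masks)
```

Because the randomness enters only through `eps`, gradients can flow to `mu` and `sigma` in an automatic-differentiation framework, which is what allows such parameters to be trained end to end alongside the network weights.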

Transformer-GCRF: Recovering Chinese Dropped Pronouns with General Conditional Random Fields

Oct 07, 2020
Jingxuan Yang, Kerui Xu, Jun Xu, Si Li, Sheng Gao, Jun Guo, Ji-Rong Wen, Nianwen Xue

Pronouns are often dropped in Chinese conversations, and recovering the dropped pronouns is important for NLP applications such as machine translation. Existing approaches usually formulate this as a sequence labeling task: predicting whether there is a dropped pronoun before each token and, if so, its type. Each utterance is considered to be a sequence and labeled independently. Although these approaches have shown promise, labeling each utterance independently ignores the dependencies between pronouns in neighboring utterances. Modeling these dependencies is critical to improving the performance of dropped pronoun recovery. In this paper, we present a novel framework that combines the strengths of the Transformer network with General Conditional Random Fields (GCRF) to model the dependencies between pronouns in neighboring utterances. Results on three Chinese conversation datasets show that the Transformer-GCRF model outperforms the state-of-the-art dropped pronoun recovery models. Exploratory analysis also demonstrates that the GCRF helps to capture the dependencies between pronouns in neighboring utterances and thus contributes to the performance improvements.

* Accepted to Findings of EMNLP 2020
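
To make the sequence labeling formulation concrete, here is a minimal sketch of what the per-token labels might look like and how they map back to a recovered utterance. The example utterance, the label convention (`None` for "no dropped pronoun"), and the helper `recover` are hypothetical and are not drawn from the paper or its datasets.

```python
def recover(tokens, labels):
    """Insert each predicted pronoun before the token it is attached to."""
    if len(tokens) != len(labels):
        raise ValueError("one label per token is required")
    out = []
    for token, label in zip(tokens, labels):
        if label is not None:   # None means no pronoun was dropped here
            out.append(label)
        out.append(token)
    return out


# Hypothetical utterance "吃了吗" ("eaten yet?") with the subject "你" dropped.
tokens = ["吃", "了", "吗"]
labels = ["你", None, None]
recovered = recover(tokens, labels)   # ["你", "吃", "了", "吗"]
```

A model that labels each utterance in isolation produces one such label sequence per utterance; the dependencies the abstract refers to are those between labels in neighboring utterances.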

SSKD: Self-Supervised Knowledge Distillation for Cross Domain Adaptive Person Re-Identification

Sep 13, 2020
Junhui Yin, Jiayan Qiu, Siqing Zhang, Zhanyu Ma, Jun Guo

Domain adaptive person re-identification (re-ID) is a challenging task due to the large discrepancy between the source domain and the target domain. To reduce the domain discrepancy, existing methods mainly attempt to generate pseudo labels for unlabeled target images with clustering algorithms. However, clustering methods tend to introduce noisy labels, and the rich fine-grained details in unlabeled images are not sufficiently exploited. In this paper, we seek to improve the quality of labels by capturing feature representations from multiple augmented views of unlabeled images. To this end, we propose a Self-Supervised Knowledge Distillation (SSKD) technique containing two modules: identity learning and soft label learning. Identity learning explores the relationship between unlabeled samples and predicts their one-hot labels by clustering, giving exact information for confidently distinguished images. Soft label learning regards labels as a distribution and associates an image with several related classes to train a peer network in a self-supervised manner, where the slowly evolving network is the core for obtaining soft labels as a gentle constraint for reliable images. Finally, the two modules can resist label noise for re-ID by enhancing each other and systematically integrating label information from unlabeled images. Extensive experiments on several adaptation tasks demonstrate that the proposed method outperforms the current state-of-the-art approaches by large margins.

ReMarNet: Conjoint Relation and Margin Learning for Small-Sample Image Classification

Jun 27, 2020
Xiaoxu Li, Liyun Yu, Xiaochen Yang, Zhanyu Ma, Jing-Hao Xue, Jie Cao, Jun Guo

Despite achieving state-of-the-art performance, deep learning methods generally require a large amount of labeled data during training and may suffer from overfitting when the sample size is small. To ensure good generalizability of deep networks under small sample sizes, learning discriminative features is crucial. To this end, several loss functions have been proposed to encourage large intra-class compactness and inter-class separability. In this paper, we propose to enhance the discriminative power of features from a new perspective by introducing a novel neural network termed Relation-and-Margin learning Network (ReMarNet). Our method assembles two networks with different backbones so as to learn features that perform well under two classification mechanisms. Specifically, a relation network is used to learn features that support classification based on the similarity between a sample and a class prototype; meanwhile, a fully connected network with the cross-entropy loss is used for classification via the decision boundary. Experiments on four image datasets demonstrate that our approach is effective in learning discriminative features from a small set of labeled samples and achieves competitive performance against state-of-the-art methods. Code is available at https://github.com/liyunyu08/ReMarNet.

* IEEE TCSVT 2020

Attention-guided Context Feature Pyramid Network for Object Detection

May 23, 2020
Junxu Cao, Qi Chen, Jun Guo, Ruichao Shi

For object detection, how to reconcile the conflicting requirements of feature map resolution and receptive field on high-resolution inputs remains an open question. In this paper, to tackle this issue, we build a novel architecture, called Attention-guided Context Feature Pyramid Network (AC-FPN), that exploits discriminative information from various large receptive fields by integrating attention-guided multi-path features. The model contains two modules. The first is the Context Extraction Module (CEM), which explores large contextual information from multiple receptive fields. As redundant contextual relations may mislead localization and recognition, we also design a second module, the Attention-guided Module (AM), which can adaptively capture the salient dependencies over objects by using the attention mechanism. AM consists of two sub-modules, i.e., the Context Attention Module (CxAM) and the Content Attention Module (CnAM), which focus on capturing discriminative semantics and locating precise positions, respectively. Most importantly, our AC-FPN can be readily plugged into existing FPN-based models. Extensive experiments on object detection and instance segmentation show that existing models with our proposed CEM and AM significantly surpass their counterparts without them, and our model successfully obtains state-of-the-art results. We have released the source code at https://github.com/Caojunxu/AC-FPN.

OSLNet: Deep Small-Sample Classification with an Orthogonal Softmax Layer

Apr 20, 2020
Xiaoxu Li, Dongliang Chang, Zhanyu Ma, Zheng-Hua Tan, Jing-Hao Xue, Jie Cao, Jingyi Yu, Jun Guo

A deep neural network of multiple nonlinear layers forms a large function space, which can easily lead to overfitting when it encounters small-sample data. To mitigate overfitting in small-sample classification, learning more discriminative features from small-sample data is becoming a new trend. To this end, this paper aims to find a subspace of neural networks that can facilitate a large decision margin. Specifically, we propose the Orthogonal Softmax Layer (OSL), which makes the weight vectors in the classification layer remain orthogonal during both the training and test processes. The Rademacher complexity of a network using the OSL is only $\frac{1}{K}$, where $K$ is the number of classes, of that of a network using the fully connected classification layer, leading to a tighter generalization error bound. Experimental results on four small-sample benchmark datasets demonstrate that the proposed OSL outperforms the methods used for comparison, and also show its applicability to large-sample datasets. Code is available at https://github.com/dongliangchang/OSLNet.

* TIP 2020. Code available at https://github.com/dongliangchang/OSLNet
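
One simple way to keep class weight vectors orthogonal throughout training is to let each class connect only to its own disjoint block of input features, so that any two weight vectors never share a nonzero coordinate. The sketch below illustrates that idea under this assumption; the helpers `block_mask`, `masked_weights`, and `dot` are hypothetical names, and the abstract does not confirm that this is how the OSL is constructed.

```python
import random


def block_mask(num_classes, features_per_class):
    """Binary mask in which class k only sees its own block of features."""
    dim = num_classes * features_per_class
    mask = [[0.0] * dim for _ in range(num_classes)]
    for k in range(num_classes):
        start = k * features_per_class
        for j in range(start, start + features_per_class):
            mask[k][j] = 1.0
    return mask


def masked_weights(weights, mask):
    """Zero out every connection that the mask does not allow."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(0)
K, per_class = 3, 2
mask = block_mask(K, per_class)
raw = [[rng.uniform(-1.0, 1.0) for _ in range(K * per_class)] for _ in range(K)]
W = masked_weights(raw, mask)

# Weight vectors of different classes have disjoint supports, so every
# pairwise dot product is exactly zero whatever values the weights take.
off_diagonal = [dot(W[i], W[j]) for i in range(K) for j in range(K) if i != j]
```

Because the mask is fixed, gradient updates can change the surviving weights freely without ever breaking orthogonality.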

Dual-attention Guided Dropblock Module for Weakly Supervised Object Localization

Mar 19, 2020
Junhui Yin, Siqing Zhang, Dongliang Chang, Zhanyu Ma, Jun Guo

In this paper, we present a dual-attention guided dropblock module that aims to learn informative and complementary visual features for weakly supervised object localization (WSOL). We extend the attention mechanism to the task of WSOL and design two types of attention modules to learn discriminative features for better feature representations. Based on these two types of attention mechanism, we propose a channel attention guided dropout (CAGD) and a spatial attention guided dropblock (SAGD). The CAGD ranks channel attention by a measure of importance and considers the top-k largest-magnitude attentions as important ones. The SAGD can not only completely remove information by erasing contiguous regions of feature maps rather than individual pixels, but also simply distinguish foreground objects from background regions to alleviate attention misdirection. Extensive experiments demonstrate that the proposed method achieves new state-of-the-art localization accuracy on a challenging dataset.

* Technical Reports
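
The ranking step described for the CAGD can be sketched as follows. This only shows how the top-k channels by attention magnitude might be marked; what is then done with the marked channels is not stated in the abstract, and the function name `topk_channel_mask` and the example values are hypothetical.

```python
def topk_channel_mask(attention, k):
    """Return a 0/1 list marking the k channels with the largest |attention|."""
    order = sorted(range(len(attention)),
                   key=lambda c: abs(attention[c]),
                   reverse=True)
    important = set(order[:k])
    return [1 if c in important else 0 for c in range(len(attention))]


# Hypothetical per-channel attention values for a 5-channel feature map.
attention = [0.1, 0.9, 0.4, 0.7, 0.2]
mask = topk_channel_mask(attention, k=2)   # [0, 1, 0, 1, 0]
```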

Object-Oriented Video Captioning with Temporal Graph and Prior Knowledge Building

Mar 12, 2020
Fangyi Zhu, Jenq-Neng Hwang, Zhanyu Ma, Jun Guo

Traditional video captioning requires a holistic description of the video, yet detailed descriptions of specific objects may not be available. Besides, most methods adopt frame-level inter-object features and ambiguous descriptions during training, which makes it difficult to learn the vision-language relationships. Without associating the transition trajectories, these image-based methods cannot understand the activities from visual features. We propose a novel task, named object-oriented video captioning, which focuses on understanding videos at the object level. We re-annotate the object-sentence pairs for more effective cross-modal learning. Thereafter, we design the video-based object-oriented video captioning network (OVC-Net) to reliably analyze activities over time using only visual features and to stably capture the vision-language connections on small datasets. To demonstrate its effectiveness, we evaluate the method on the new dataset and compare it with state-of-the-art video captioning methods. The experimental results show that OVC-Net is able to precisely describe concurrent objects and their activities in detail.

Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches

Mar 10, 2020
Ruoyi Du, Dongliang Chang, Ayan Kumar Bhunia, Jiyang Xie, Yi-Zhe Song, Zhanyu Ma, Jun Guo

Fine-grained visual classification (FGVC) is much more challenging than traditional classification tasks due to the inherently subtle intra-class object variations. Recent works mainly tackle this problem by focusing on how to locate the most discriminative parts, more complementary parts, and parts of various granularities. However, less effort has been devoted to determining which granularities are the most discriminative and how to fuse information across granularities. In this work, we propose a novel framework for fine-grained visual classification to tackle these problems. In particular, we propose: (i) a novel progressive training strategy that adds new layers in each training step to exploit information based on the smaller-granularity information found at the last step and the previous stage; and (ii) a simple jigsaw puzzle generator to form images that contain information of different granularity levels. We obtain state-of-the-art performance on several standard FGVC benchmark datasets, where the proposed method consistently outperforms existing methods or delivers competitive results. The code will be available at https://github.com/RuoyiDu/PMG-Progressive-Multi-Granularity-Training.
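
A jigsaw puzzle generator of the kind mentioned in (ii) can be sketched as splitting an image into an n x n grid of patches and shuffling them, so that structure larger than a patch is destroyed while the content inside each patch is kept. The version below is a minimal illustration on a nested-list "image"; the function name `jigsaw` and its parameters are assumptions and do not come from the released code.

```python
import random


def jigsaw(image, n, rng):
    """Split a square image (list of rows) into n x n patches and shuffle them."""
    size = len(image)
    if size % n != 0 or any(len(row) != size for row in image):
        raise ValueError("image must be square with a side divisible by n")
    p = size // n  # side length of one patch

    # Destination slots in row-major order, and a random source for each slot.
    slots = [(r, c) for r in range(n) for c in range(n)]
    sources = slots[:]
    rng.shuffle(sources)

    out = [[None] * size for _ in range(size)]
    for (dr, dc), (sr, sc) in zip(slots, sources):
        for i in range(p):
            for j in range(p):
                out[dr * p + i][dc * p + j] = image[sr * p + i][sc * p + j]
    return out


# A 4 x 4 "image" whose pixel values are their own row-major indices.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
puzzle = jigsaw(image, n=2, rng=random.Random(0))
```

Larger values of `n` give smaller patches and therefore finer granularity, which is how a generator like this could supply inputs for different stages of a progressive training schedule.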

Mind the Gap: Enlarging the Domain Gap in Open Set Domain Adaptation

Mar 10, 2020
Dongliang Chang, Aneeshan Sain, Zhanyu Ma, Yi-Zhe Song, Jun Guo

Unsupervised domain adaptation aims to leverage labeled data from a source domain to learn a classifier for an unlabeled target domain. Among its many variants, open set domain adaptation (OSDA) is perhaps the most challenging, as it further assumes the presence of unknown classes in the target domain. In this paper, we study OSDA with a particular focus on enriching its ability to traverse larger domain gaps. Firstly, we show that existing state-of-the-art methods suffer a considerable performance drop in the presence of larger domain gaps, especially on a new dataset (PACS) that we re-purposed for OSDA. We then propose a novel framework to specifically address the larger domain gaps. The key insight lies in how we exploit the mutually beneficial information between two networks: (a) to separate samples of known and unknown classes, and (b) to maximize the domain confusion between the source and target domains without the influence of unknown samples. It follows that (a) and (b) mutually supervise each other and alternate until convergence. Extensive experiments are conducted on the Office-31, Office-Home, and PACS datasets, demonstrating the superiority of our method over other state-of-the-art methods. Code is available at https://github.com/dongliangchang/Mutual-to-Separate/
