Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiangyang Xue

Multi-Scale Self-Attention for Text Classification

Dec 02, 2019
Qipeng Guo, Xipeng Qiu, Pengfei Liu, Xiangyang Xue, Zheng Zhang

Figure 1 for Multi-Scale Self-Attention for Text Classification

Figure 2 for Multi-Scale Self-Attention for Text Classification

Figure 3 for Multi-Scale Self-Attention for Text Classification

Figure 4 for Multi-Scale Self-Attention for Text Classification

In this paper, we introduce the prior knowledge, multi-scale structure, into self-attention modules. We propose a Multi-Scale Transformer which uses multi-scale multi-head self-attention to capture features from different scales. Based on the linguistic perspective and the analysis of pre-trained Transformer (BERT) on a huge corpus, we further design a strategy to control the scale distribution for each layer. Results of three different kinds of tasks (21 datasets) show our Multi-Scale Transformer outperforms the standard Transformer consistently and significantly on small and moderate size datasets.

* Accepted in AAAI2020

Via

Access Paper or Ask Questions

Joint Parsing and Generation for Abstractive Summarization

Nov 23, 2019
Kaiqiang Song, Logan Lebanoff, Qipeng Guo, Xipeng Qiu, Xiangyang Xue, Chen Li, Dong Yu, Fei Liu

Figure 1 for Joint Parsing and Generation for Abstractive Summarization

Figure 2 for Joint Parsing and Generation for Abstractive Summarization

Figure 3 for Joint Parsing and Generation for Abstractive Summarization

Figure 4 for Joint Parsing and Generation for Abstractive Summarization

Sentences produced by abstractive summarization systems can be ungrammatical and fail to preserve the original meanings, despite being locally fluent. In this paper we propose to remedy this problem by jointly generating a sentence and its syntactic dependency parse while performing abstraction. If generating a word can introduce an erroneous relation to the summary, the behavior must be discouraged. The proposed method thus holds promise for producing grammatical sentences and encouraging the summary to stay true-to-original. Our contributions of this work are twofold. First, we present a novel neural architecture for abstractive summarization that combines a sequential decoder with a tree-based decoder in a synchronized manner to generate a summary sentence and its syntactic parse. Secondly, we describe a novel human evaluation protocol to assess if, and to what extent, a summary remains true to its original meanings. We evaluate our method on a number of summarization datasets and demonstrate competitive results against strong baselines.

* AAAI 2020 (Main Technical Track)

Via

Access Paper or Ask Questions

Fast Color Constancy with Patch-wise Bright Pixels

Nov 17, 2019
Yiyao Shi, Jian Wang, Xiangyang Xue

Figure 1 for Fast Color Constancy with Patch-wise Bright Pixels

Figure 2 for Fast Color Constancy with Patch-wise Bright Pixels

Figure 3 for Fast Color Constancy with Patch-wise Bright Pixels

Figure 4 for Fast Color Constancy with Patch-wise Bright Pixels

In this paper, a learning-free color constancy algorithm called the Patch-wise Bright Pixels (PBP) is proposed. In this algorithm, an input image is first downsampled and then cut equally into a few patches. After that, according to the modified brightness of each patch, a proper fraction of brightest pixels in the patch is selected. Finally, Gray World (GW)-based methods are applied to the selected bright pixels to estimate the illuminant of the scene. Experiments on NUS $8$-Camera Dataset show that the PBP algorithm outperforms the state-of-the-art learning-free methods as well as a broad range of learning-based ones. In particular, PBP processes a $1080$p image within two milliseconds, which is hundreds of times faster than the existing learning-free ones. Our algorithm offers a potential solution to the full-screen smart phones whose screen-to-body ratio is $100$\%.

* 7 figures and 4 tables

Via

Access Paper or Ask Questions

Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models

Nov 08, 2019
Xisen Jin, Junyi Du, Zhongyu Wei, Xiangyang Xue, Xiang Ren

Figure 1 for Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models

Figure 2 for Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models

Figure 3 for Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models

Figure 4 for Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models

The impressive performance of neural networks on natural language processing tasks attributes to their ability to model complicated word and phrase interactions. Existing flat, word level explanations of predictions hardly unveil how neural networks handle compositional semantics to reach predictions. To tackle the challenge, we study hierarchical explanation of neural network predictions. We identify non-additivity and independent importance attributions within hierarchies as two desirable properties for highlighting word and phrase interactions. We show prior efforts on hierarchical explanations, e.g. contextual decomposition, however, do not satisfy the desired properties mathematically. In this paper, we propose a formal way to quantify the importance of each word or phrase for hierarchical explanations. Following the formulation, we propose Sampling and Contextual Decomposition (SCD) algorithm and Sampling and Occlusion (SOC) algorithm. Human and metrics evaluation on both LSTM models and BERT Transformer models on multiple datasets show that our algorithms outperform prior hierarchical explanation algorithms. Our algorithms apply to hierarchical visualization of compositional semantics, extraction of classification rules and improving human trust of models.

* 12 pages, 9 figures

Via

Access Paper or Ask Questions

Towards Instance-level Image-to-Image Translation

May 05, 2019
Zhiqiang Shen, Mingyang Huang, Jianping Shi, Xiangyang Xue, Thomas Huang

Figure 1 for Towards Instance-level Image-to-Image Translation

Figure 2 for Towards Instance-level Image-to-Image Translation

Figure 3 for Towards Instance-level Image-to-Image Translation

Figure 4 for Towards Instance-level Image-to-Image Translation

Unpaired Image-to-image Translation is a new rising and challenging vision problem that aims to learn a mapping between unaligned image pairs in diverse domains. Recent advances in this field like MUNIT and DRIT mainly focus on disentangling content and style/attribute from a given image first, then directly adopting the global style to guide the model to synthesize new domain images. However, this kind of approaches severely incurs contradiction if the target domain images are content-rich with multiple discrepant objects. In this paper, we present a simple yet effective instance-aware image-to-image translation approach (INIT), which employs the fine-grained local (instance) and global styles to the target image spatially. The proposed INIT exhibits three import advantages: (1) the instance-level objective loss can help learn a more accurate reconstruction and incorporate diverse attributes of objects; (2) the styles used for target domain of local/global areas are from corresponding spatial regions in source domain, which intuitively is a more reasonable mapping; (3) the joint training process can benefit both fine and coarse granularity and incorporates instance information to improve the quality of global translation. We also collect a large-scale benchmark for the new instance-level translation task. We observe that our synthetic images can even benefit real-world vision tasks like generic object detection.

* Accepted to CVPR 2019. Project page: http://zhiqiangshen.com/projects/INIT/index.html

Via

Access Paper or Ask Questions

Question Guided Modular Routing Networks for Visual Question Answering

Apr 17, 2019
Yanze Wu, Qiang Sun, Jianqi Ma, Bin Li, Yanwei Fu, Yao Peng, Xiangyang Xue

Figure 1 for Question Guided Modular Routing Networks for Visual Question Answering

Figure 2 for Question Guided Modular Routing Networks for Visual Question Answering

Figure 3 for Question Guided Modular Routing Networks for Visual Question Answering

Figure 4 for Question Guided Modular Routing Networks for Visual Question Answering

Visual Question Answering (VQA) faces two major challenges: how to better fuse the visual and textual modalities and how to make the VQA model have the reasoning ability to answer more complex questions. In this paper, we address both challenges by proposing the novel Question Guided Modular Routing Networks (QGMRN). QGMRN can fuse the visual and textual modalities in multiple semantic levels which makes the fusion occur in a fine-grained way, it also can learn to reason by routing between the generic modules without additional supervision information or prior knowledge. The proposed QGMRN consists of three sub-networks: visual network, textual network and routing network. The routing network selectively executes each module in the visual network according to the pathway activated by the question features generated by the textual network. Experiments on the CLEVR dataset show that our model can outperform the state-of-the-art. Models and Codes will be released.

Via

Access Paper or Ask Questions

CODA: Counting Objects via Scale-aware Adversarial Density Adaption

Mar 25, 2019
Li Wang, Yongbo Li, Xiangyang Xue

Figure 1 for CODA: Counting Objects via Scale-aware Adversarial Density Adaption

Figure 2 for CODA: Counting Objects via Scale-aware Adversarial Density Adaption

Figure 3 for CODA: Counting Objects via Scale-aware Adversarial Density Adaption

Figure 4 for CODA: Counting Objects via Scale-aware Adversarial Density Adaption

Recent advances in crowd counting have achieved promising results with increasingly complex convolutional neural network designs. However, due to the unpredictable domain shift, generalizing trained model to unseen scenarios is often suboptimal. Inspired by the observation that density maps of different scenarios share similar local structures, we propose a novel adversarial learning approach in this paper, i.e., CODA (\emph{Counting Objects via scale-aware adversarial Density Adaption}). To deal with different object scales and density distributions, we perform adversarial training with pyramid patches of multi-scales from both source- and target-domain. Along with a ranking constraint across levels of the pyramid input, consistent object counts can be produced for different scales. Extensive experiments demonstrate that our network produces much better results on unseen datasets compared with existing counting adaption models. Notably, the performance of our CODA is comparable with the state-of-the-art fully-supervised models that are trained on the target dataset. Further analysis indicates that our density adaption framework can effortlessly extend to scenarios with different objects. \emph{The code is available at https://github.com/Willy0919/CODA.}

* Accepted to ICME2019

Via

Access Paper or Ask Questions

Star-Transformer

Feb 28, 2019
Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, Zheng Zhang

Although Transformer has achieved great successes on many NLP tasks, its heavy structure with fully-connected attention connections leads to dependencies on large training data. In this paper, we present Star-Transformer, a lightweight alternative by careful sparsification. To reduce model complexity, we replace the fully-connected structure with a star-shaped topology, in which every two non-adjacent nodes are connected through a shared relay node. Thus, complexity is reduced from quadratic to linear, while preserving capacity to capture both local composition and long-range dependency. The experiments on four tasks (22 datasets) show that Star-Transformer achieved significant improvements against the standard Transformer for the modestly sized datasets.

* Accepted by NAACL 2019

Via

Access Paper or Ask Questions

Spatial Mixture Models with Learnable Deep Priors for Perceptual Grouping

Feb 07, 2019
Jinyang Yuan, Bin Li, Xiangyang Xue

Figure 1 for Spatial Mixture Models with Learnable Deep Priors for Perceptual Grouping

Figure 2 for Spatial Mixture Models with Learnable Deep Priors for Perceptual Grouping

Figure 3 for Spatial Mixture Models with Learnable Deep Priors for Perceptual Grouping

Figure 4 for Spatial Mixture Models with Learnable Deep Priors for Perceptual Grouping

Humans perceive the seemingly chaotic world in a structured and compositional way with the prerequisite of being able to segregate conceptual entities from the complex visual scenes. The mechanism of grouping basic visual elements of scenes into conceptual entities is termed as perceptual grouping. In this work, we propose a new type of spatial mixture models with learnable priors for perceptual grouping. Different from existing methods, the proposed method disentangles the representation of an object into `shape' and `appearance' which are modeled separately by the mixture weights and the conditional probability distributions. More specifically, each object in the visual scene is modeled by one mixture component, whose mixture weights and the parameter of the conditional probability distribution are generated by two neural networks, respectively. The mixture weights focus on modeling spatial dependencies (i.e., shape) and the conditional probability distributions deal with intra-object variations (i.e., appearance). In addition, the background is separately modeled as a special component complementary to the foreground objects. Our extensive empirical tests on two perceptual grouping datasets demonstrate that the proposed method outperforms the state-of-the-art methods under most experimental configurations. The learned conceptual entities are generalizable to novel visual scenes and insensitive to the diversity of objects.

* AAAI 2019

Via

Access Paper or Ask Questions

A Multi-task Neural Approach for Emotion Attribution, Classification and Summarization

Dec 21, 2018
Guoyun Tu, Yanwei Fu, Boyang Li, Jiarui Gao, Yu-Gang Jiang, Xiangyang Xue

Figure 1 for A Multi-task Neural Approach for Emotion Attribution, Classification and Summarization

Figure 2 for A Multi-task Neural Approach for Emotion Attribution, Classification and Summarization

Figure 3 for A Multi-task Neural Approach for Emotion Attribution, Classification and Summarization

Figure 4 for A Multi-task Neural Approach for Emotion Attribution, Classification and Summarization

Emotional content is a crucial ingredient in user-generated videos. However, the sparsely expressed emotions in the user-generated video cause difficulties to emotions analysis in videos. In this paper, we propose a new neural approach---Bi-stream Emotion Attribution-Classification Network (BEAC-Net) to solve three related emotion analysis tasks: emotion recognition, emotion attribution and emotion-oriented summarization, in an integrated framework. BEAC-Net has two major constituents, an attribution network and a classification network. The attribution network extracts the main emotional segment that classification should focus on in order to mitigate the sparsity problem. The classification network utilizes both the extracted segment and the original video in a bi-stream architecture. We contribute a new dataset for the emotion attribution task with human-annotated ground-truth labels for emotion segments. Experiments on two video datasets demonstrate superior performance of the proposed framework and the complementary nature of the dual classification streams.

Via

Access Paper or Ask Questions