Xu Sun

ASAT: Adaptively Scaled Adversarial Training in Time Series

Aug 20, 2021

Zhiyuan Zhang, Wei Li, Ruihan Bao, Keiko Harimoto, Yunfang Wu, Xu Sun

Abstract: Adversarial training is a method for enhancing neural networks to improve their robustness against adversarial examples. Besides addressing the security concerns of potential adversarial examples, adversarial training can also improve the performance of neural networks, train robust neural networks, and provide interpretability for neural networks. In this work, we take the first step to introduce adversarial training in time series analysis by taking the finance field as an example. Rethinking existing research on adversarial training, we propose adaptively scaled adversarial training (ASAT) in time series analysis, which treats data at different time slots with time-dependent importance weights. Experimental results show that the proposed ASAT can improve both the accuracy and the adversarial robustness of neural networks. Besides enhancing neural networks, we also propose the dimension-wise adversarial sensitivity indicator to probe the sensitivities and importance of input dimensions. With the proposed indicator, we can explain the decision bases of black box neural networks.

* Accepted to appear in the Workshop on Machine Learning in Finance (KDD-MLF) 2021
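
The idea of scaling an adversarial perturbation differently at each time slot can be illustrated with a minimal sketch. This is not the authors' implementation: the linear forecaster, the exponential-decay scale in `time_scales`, the sign-gradient perturbation, and all function names are assumptions made for illustration, since the abstract does not specify how the time-dependent weights are computed.

```python
import math


def predict(weights, window):
    """Linear forecaster: dot product of weights and the input window."""
    return sum(w * x for w, x in zip(weights, window))


def time_scales(length, decay=0.5):
    """Hypothetical per-slot scales; the most recent slot gets scale 1.0."""
    return [math.exp(-decay * (length - 1 - t)) for t in range(length)]


def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def scaled_adversarial_window(weights, window, target, epsilon, scales):
    """Sign-gradient perturbation with a per-slot budget epsilon * scales[t]."""
    error = predict(weights, window) - target
    # Gradient of 0.5 * error**2 with respect to window[t] is error * weights[t].
    return [x + epsilon * s * sign(error * w)
            for x, s, w in zip(window, scales, weights)]


def adversarial_training_step(weights, window, target, epsilon, scales, lr=0.01):
    """One SGD step on the squared loss evaluated at the perturbed window."""
    adv = scaled_adversarial_window(weights, window, target, epsilon, scales)
    error = predict(weights, adv) - target
    return [w - lr * error * x for w, x in zip(weights, adv)]


weights = [0.1, 0.2, 0.3, 0.4]
window = [1.0, 1.0, 1.0, 1.0]
scales = time_scales(len(window))
adv = scaled_adversarial_window(weights, window, target=0.5, epsilon=0.1, scales=scales)
new_weights = adversarial_training_step(weights, window, 0.5, 0.1, scales)
```

In this sketch, older time slots receive a smaller perturbation budget than recent ones; whether importance should grow or shrink with recency is a modeling choice that the abstract leaves open.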

O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning

Aug 05, 2021

Fenglin Liu, Xuancheng Ren, Xian Wu, Bang Yang, Shen Ge, Xu Sun

Abstract: Video captioning combines video understanding and language generation. Unlike image captioning, which describes a static image with details of almost every object, video captioning usually considers a sequence of frames and is biased towards focused objects, e.g., the objects that stay in focus regardless of the changing background. Therefore, detecting and properly accommodating focused objects is critical in video captioning. To enforce the description of focused objects and achieve controllable video captioning, we propose an Object-Oriented Non-Autoregressive approach (O2NA), which performs caption generation in three steps: 1) identify the focused objects and predict their locations in the target caption; 2) generate the related attribute words and relation words of these focused objects to form a draft caption; and 3) combine video information to refine the draft caption into a fluent final caption. Since the focused objects are generated and located ahead of other words, it is difficult to apply the word-by-word autoregressive generation process; instead, we adopt a non-autoregressive approach. Experiments on two benchmark datasets, MSR-VTT and MSVD, demonstrate the effectiveness of O2NA, which achieves results competitive with the state of the art while offering both higher diversity and higher inference speed.

* Accepted by Findings of ACL 2021

Contrastive Attention for Automatic Chest X-ray Report Generation

Jun 13, 2021

Fenglin Liu, Changchang Yin, Xian Wu, Shen Ge, Ping Zhang, Xu Sun

Abstract: Recently, chest X-ray report generation, which aims to automatically generate descriptions of given chest X-ray images, has received growing research interest. The key challenge of chest X-ray report generation is to accurately capture and describe the abnormal regions. In most cases, the normal regions dominate the entire chest X-ray image, and the corresponding descriptions of these normal regions dominate the final report. Due to such data bias, learning-based models may fail to attend to abnormal regions. In this work, to effectively capture and describe abnormal regions, we propose the Contrastive Attention (CA) model. Instead of solely focusing on the current input image, the CA model compares the current input image with normal images to distill the contrastive information. The acquired contrastive information can better represent the visual features of abnormal regions. According to the experiments on the public IU-X-ray and MIMIC-CXR datasets, incorporating our CA into several existing models can boost their performance across most metrics. In addition, according to the analysis, the CA model can help existing models better attend to the abnormal regions and provide more accurate descriptions, which are crucial for an interpretable diagnosis. Specifically, we achieve state-of-the-art results on the two public datasets.

* Appears in Findings of ACL 2021 (The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021))

Learning Relation Alignment for Calibrated Cross-modal Retrieval

Jun 01, 2021

Shuhuai Ren, Junyang Lin, Guangxiang Zhao, Rui Men, An Yang, Jingren Zhou, Xu Sun, Hongxia Yang

Abstract: Despite the achievements of large-scale multimodal pre-training approaches, cross-modal retrieval, e.g., image-text retrieval, remains a challenging task. To bridge the semantic gap between the two modalities, previous studies mainly focus on word-region alignment at the object level and lack matching between the linguistic relations among the words and the visual relations among the regions. The neglect of such relation consistency impairs the contextualized representation of image-text pairs and hinders model performance and interpretability. In this paper, we first propose a novel metric, Intra-modal Self-attention Distance (ISD), to quantify relation consistency by measuring the semantic distance between linguistic and visual relations. In response, we present Inter-modal Alignment on Intra-modal Self-attentions (IAIS), a regularized training method that optimizes the ISD and mutually calibrates the intra-modal self-attentions of the two modalities via inter-modal alignment. The IAIS regularizer boosts the performance of prevailing models on the Flickr30k and MS COCO datasets by a considerable margin, which demonstrates the superiority of our approach.

* Accepted by ACL-IJCNLP 2021 main conference (Long Paper)

Alleviating the Knowledge-Language Inconsistency: A Study for Deep Commonsense Knowledge

May 31, 2021

Yi Zhang, Lei Li, Yunfang Wu, Qi Su, Xu Sun

Abstract: Knowledge facts are typically represented by relational triples, but we observe that some commonsense facts are represented by triples whose forms are inconsistent with the expression of language. This inconsistency poses a challenge for pre-trained language models in dealing with these commonsense knowledge facts. In this paper, we term such knowledge deep commonsense knowledge and conduct extensive exploratory experiments on it. We show that deep commonsense knowledge occupies a significant part of commonsense knowledge, while conventional methods fail to capture it effectively. We further propose a novel method to mine the deep commonsense knowledge distributed in sentences, alleviating the reliance of conventional methods on the triple representation form of knowledge. Experiments demonstrate that the proposal significantly improves the performance in mining deep commonsense knowledge.

Rethinking Skip Connection with Layer Normalization in Transformers and ResNets

May 15, 2021

Fenglin Liu, Xuancheng Ren, Zhiyuan Zhang, Xu Sun, Yuexian Zou

Abstract: Skip connection is a widely used technique to improve the performance and the convergence of deep neural networks. It is believed to relieve the difficulty in optimization due to non-linearity by propagating a linear component through the neural network layers. However, from another point of view, it can also be seen as a modulating mechanism between the input and the output, with the input scaled by a pre-defined value of one. In this work, we investigate how the scale factors into the effectiveness of the skip connection and reveal that a trivial adjustment of the scale will lead to spurious gradient exploding or vanishing in line with the depth of the models. This could be addressed by normalization, in particular layer normalization, which induces consistent improvements over the plain skip connection. Inspired by the findings, we further propose to adaptively adjust the scale of the input by recursively applying skip connection with layer normalization, which promotes the performance substantially and generalizes well across diverse tasks including both machine translation and image classification datasets.

* Accepted by COLING2020 (The 28th International Conference on Computational Linguistics (COLING 2020))
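
The three variants the abstract contrasts (plain skip connection, skip connection with layer normalization, and the recursive form) can be sketched as follows. This is one possible reading of "recursively applying skip connection with layer normalization", namely y_k = LN(x + y_{k-1}) with y_0 = F(x); the function names, the `depth` parameter, and the toy `sublayer` are assumptions for illustration, and the layer normalization here omits the learnable gain and bias.

```python
import math


def layer_norm(vec, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no gain or bias)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def plain_skip(x, sublayer):
    """y = x + F(x): the input is implicitly scaled by one."""
    return add(x, sublayer(x))


def skip_with_layer_norm(x, sublayer):
    """y = LN(x + F(x))."""
    return layer_norm(add(x, sublayer(x)))


def recursive_skip(x, sublayer, depth=2):
    """Assumed recursion: y_0 = F(x), y_k = LN(x + y_{k-1}) for k = 1..depth."""
    y = sublayer(x)
    for _ in range(depth):
        y = layer_norm(add(x, y))
    return y


def sublayer(vec):
    """Toy stand-in for a network sublayer F."""
    return [0.5 * v for v in vec]


x = [1.0, 2.0, 3.0, 4.0]
once = skip_with_layer_norm(x, sublayer)
twice = recursive_skip(x, sublayer, depth=2)
```

With `depth=1` the recursion reduces to the single skip connection with layer normalization; each further step adds the input again to an already normalized output, so the effective weight of the input relative to F(x) changes with depth.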

Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models

Mar 29, 2021

Wenkai Yang, Lei Li, Zhiyuan Zhang, Xuancheng Ren, Xu Sun, Bin He

Abstract: Recent studies have revealed a security threat to natural language processing (NLP) models, called the Backdoor Attack. Victim models can maintain competitive performance on clean samples while behaving abnormally on samples with a specific trigger word inserted. Previous backdoor attacking methods usually assume that attackers have a certain degree of data knowledge, either the dataset which users would use or proxy datasets for a similar task, for implementing the data poisoning procedure. However, in this paper, we find that it is possible to hack the model in a data-free way by modifying one single word embedding vector, with almost no accuracy sacrificed on clean samples. Experimental results on sentiment analysis and sentence-pair classification tasks show that our method is more efficient and stealthier. We hope this work can raise the awareness of such a critical security risk hidden in the embedding layers of NLP models. Our code is available at https://github.com/lancopku/Embedding-Poisoning.

* NAACL-HLT 2021, Long Paper

Multi-View Feature Representation for Dialogue Generation with Bidirectional Distillation

Feb 22, 2021

Shaoxiong Feng, Xuancheng Ren, Kan Li, Xu Sun

Abstract: Neural dialogue models suffer from low-quality responses when interacted with in practice, demonstrating difficulty in generalization beyond training data. Recently, knowledge distillation has been used to successfully regularize the student by transferring knowledge from the teacher. However, the teacher and the student are trained on the same dataset and tend to learn similar feature representations, whereas the most general knowledge should be found through differences. The finding of general knowledge is further hindered by unidirectional distillation, as the student should obey the teacher and may discard some knowledge that is truly general but refuted by the teacher. To this end, we propose a novel training framework, where the learning of general knowledge is more in line with the idea of reaching consensus, i.e., finding common knowledge that is beneficial across all of the different datasets through diversified learning partners. Concretely, the training task is divided into a group of subtasks with the same number of students. Each student assigned to one subtask is not only optimized on the allocated subtask but also imitates the multi-view feature representation aggregated from other students (i.e., student peers), which induces students to capture common knowledge among different subtasks and alleviates the over-fitting of students on the allocated subtasks. To further enhance generalization, we extend unidirectional distillation to bidirectional distillation, which encourages the student and its student peers to co-evolve by exchanging complementary knowledge with each other. Empirical results and analysis demonstrate that our training framework effectively improves model generalization without sacrificing training efficiency.

* Accepted by AAAI 2021; Camera ready version

Accelerating Pre-trained Language Models via Calibrated Cascade

Dec 29, 2020

Lei Li, Yankai Lin, Shuhuai Ren, Deli Chen, Xuancheng Ren, Peng Li, Jie Zhou, Xu Sun

Abstract: Dynamic early exiting aims to accelerate the inference of pre-trained language models (PLMs) by exiting at a shallow layer without passing through the entire model. In this paper, we analyze the working mechanism of dynamic early exiting and find it cannot achieve a satisfying trade-off between inference speed and performance. On the one hand, the PLMs' representations in shallow layers are not sufficient for accurate prediction. On the other hand, the internal off-ramps cannot provide reliable exiting decisions. To remedy this, we instead propose CascadeBERT, which dynamically selects a proper-sized, complete model in a cascading manner. To obtain more reliable model selection, we further devise a difficulty-aware objective, encouraging the model's output class probability to reflect the real difficulty of each instance. Extensive experimental results demonstrate the superiority of our proposal over strong baseline models for PLM acceleration, including both dynamic early exiting and knowledge distillation methods.

* Preprint
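
Selecting a complete model in a cascading manner can be sketched as a confidence-thresholded loop over models ordered from smallest to largest. This is only an illustration of the general idea, not CascadeBERT itself: `cascade_predict`, the `threshold` value, and the two stub models are hypothetical, and the difficulty-aware training objective described in the abstract is not modeled here.

```python
def cascade_predict(models, example, threshold=0.9):
    """Try models from smallest to largest; stop at the first confident one.

    Each model maps an example to a list of class probabilities.
    Returns (predicted_label, index_of_model_used).
    """
    if not models:
        raise ValueError("at least one model is required")
    for index, model in enumerate(models):
        probs = model(example)
        confidence = max(probs)
        is_last = index == len(models) - 1
        # Accept the prediction if it is confident enough, or if no larger
        # model is left to fall back on.
        if confidence >= threshold or is_last:
            return probs.index(confidence), index


def small_model(example):
    """Hypothetical cheap model: confident only on easy inputs."""
    return [0.95, 0.05] if example == "easy" else [0.55, 0.45]


def large_model(example):
    """Hypothetical expensive model: used only when the small one is unsure."""
    return [0.97, 0.03] if example == "easy" else [0.2, 0.8]


easy_result = cascade_predict([small_model, large_model], "easy")
hard_result = cascade_predict([small_model, large_model], "hard")
```

The usefulness of such a loop depends on the model's confidence actually tracking instance difficulty, which is the role the abstract assigns to the difficulty-aware objective.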

Learning Robust Representation for Clustering through Locality Preserving Variational Discriminative Network

Dec 25, 2020

Ruixuan Luo, Wei Li, Zhiyuan Zhang, Ruihan Bao, Keiko Harimoto, Xu Sun

Abstract: Clustering is one of the fundamental problems in unsupervised learning. Recent deep learning based methods focus on learning clustering-oriented representations. Among those methods, Variational Deep Embedding (VaDE) achieves great success in various clustering tasks by specifying a Gaussian Mixture prior for the latent space. However, VaDE suffers from two problems: 1) it is fragile to input noise; 2) it ignores the locality information between neighboring data points. In this paper, we propose a joint learning framework that improves VaDE with a robust embedding discriminator and a local structure constraint, both of which help improve the robustness of our model. Experimental results on various vision and textual datasets demonstrate that our method outperforms the state-of-the-art baseline models in all metrics. Further detailed analysis shows that our proposed model is very robust to adversarial inputs, which is a desirable property for practical applications.

* Accepted by AAAI RSEML 2021 workshop
