Hai Zhao

Department of Computer Science and Engineering, Shanghai Jiao Tong University; Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University

Sentence Representation Learning with Generative Objective rather than Contrastive Objective

Oct 16, 2022

Bohong Wu, Hai Zhao

Abstract: Though offering amazing contextualized token-level representations, current pre-trained language models pay less attention to accurately acquiring sentence-level representations during their self-supervised pre-training. However, the contrastive objectives that dominate current sentence representation learning bring little linguistic interpretability and no performance guarantee on downstream semantic tasks. We instead propose a novel generative self-supervised learning objective based on phrase reconstruction. To overcome the drawbacks of previous generative methods, we carefully model intra-sentence structure by breaking down one sentence into pieces of important phrases. Empirical studies show that our generative learning achieves a sufficiently strong performance improvement and outperforms the current state-of-the-art contrastive methods not only on the STS benchmarks, but also on downstream semantic retrieval and reranking tasks. Our code is available at https://github.com/chengzhipanpan/PaSeR.

* Accepted by the Main Conference of EMNLP 2022, long paper. arXiv admin note: substantial text overlap with arXiv:2204.09358

Towards End-to-End Open Conversational Machine Reading

Oct 13, 2022

Sizhe Zhou, Siru Ouyang, Zhuosheng Zhang, Hai Zhao

Abstract: In the open-retrieval conversational machine reading (OR-CMR) task, machines are required to do multi-turn question answering given a dialogue history and a textual knowledge base. Existing works generally use two independent modules to approach this problem's two successive sub-tasks: first hard-label decision making, and then question generation aided by various entailment reasoning methods. Such cascaded modeling is vulnerable to error propagation and prevents the two sub-tasks from being consistently optimized. In this work, we instead model OR-CMR as a unified text-to-text task in a fully end-to-end style. Experiments on the OR-ShARC dataset show the effectiveness of our proposed end-to-end framework on both sub-tasks by a large margin, achieving new state-of-the-art results. Further ablation studies support that our framework can generalize to different backbone models.

* 10 pages, 2 figures, 10 tables

Task Compass: Scaling Multi-task Pre-training with Task Prefix

Oct 12, 2022

Zhuosheng Zhang, Shuohang Wang, Yichong Xu, Yuwei Fang, Wenhao Yu, Yang Liu, Hai Zhao, Chenguang Zhu, Michael Zeng

Abstract: Leveraging task-aware annotated data as supervised signals to assist with self-supervised learning on large-scale unlabeled data has become a new trend in pre-training language models. Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks. To tackle the challenge, we propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks. We conduct extensive experiments on 40 datasets, which show that our model can not only serve as a strong foundation backbone for a wide range of tasks but also be feasible as a probing tool for analyzing task relationships. The task relationships reflected by the prefixes align with transfer learning performance between tasks. They also suggest directions for data augmentation with complementary tasks, which help our model achieve human-parity results on commonsense reasoning leaderboards. Code is available at https://github.com/cooelf/CompassMTL

* Findings of EMNLP 2022

Instance Regularization for Discriminative Language Model Pre-training

Oct 11, 2022

Zhuosheng Zhang, Hai Zhao, Ming Zhou

Abstract: Discriminative pre-trained language models (PrLMs) can be generalized as denoising auto-encoders that work with two procedures, ennoising and denoising. First, an ennoising process corrupts texts with arbitrary noising functions to construct training instances. Then, a denoising language model is trained to restore the corrupted tokens. Existing studies have made progress by optimizing independent strategies of either ennoising or denoising. They treat training instances equally throughout the training process, with little attention to the individual contribution of those instances. To model explicit signals of instance contribution, this work proposes to estimate the complexity of restoring the original sentences from corrupted ones in language model pre-training. The estimations involve the corruption degree in the ennoising data construction process and the prediction confidence in the denoising counterpart. Experimental results on natural language understanding and reading comprehension benchmarks show that our approach improves pre-training efficiency, effectiveness, and robustness. Code is publicly available at https://github.com/cooelf/InstanceReg

* Accepted to EMNLP 2022

Semantic-Preserving Adversarial Code Comprehension

Sep 12, 2022

Yiyang Li, Hongqiu Wu, Hai Zhao

Abstract: Building on the tremendous success of pre-trained language models (PrLMs) for source code comprehension tasks, the current literature studies either ways to further improve the performance (generalization) of PrLMs, or their robustness against adversarial attacks. However, these studies have to compromise on the trade-off between the two aspects, and none of them considers improving both sides in an effective and practical way. To fill this gap, we propose Semantic-Preserving Adversarial Code Embeddings (SPACE) to find the worst-case semantic-preserving attacks while forcing the model to predict the correct labels under these worst cases. Experiments and analysis demonstrate that SPACE can stay robust against state-of-the-art attacks while boosting the performance of PrLMs for code.

* Accepted by COLING 2022

Evaluate Confidence Instead of Perplexity for Zero-shot Commonsense Reasoning

Aug 23, 2022

Letian Peng, Zuchao Li, Hai Zhao

Abstract: Commonsense reasoning is an appealing topic in natural language processing (NLP) as it plays a fundamental role in supporting the human-like actions of NLP systems. With large-scale language models as the backbone, unsupervised pre-training on numerous corpora shows the potential to capture commonsense knowledge. Current pre-trained language model (PLM)-based reasoning follows the traditional practice of using the perplexity metric. However, commonsense reasoning is more than existing probability evaluation, which is biased by word frequency. This paper reconsiders the nature of commonsense reasoning and proposes a novel commonsense reasoning metric, Non-Replacement Confidence (NRC). In detail, it works on PLMs according to the Replaced Token Detection (RTD) pre-training objective in ELECTRA, in which the corruption detection objective reflects the confidence in contextual integrity that is more relevant to commonsense reasoning than existing probability. Our proposed novel method boosts zero-shot performance on two commonsense reasoning benchmark datasets and seven further commonsense question-answering datasets. Our analysis shows that pre-endowed commonsense knowledge, especially for RTD-based PLMs, is essential in downstream reasoning.
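
For readers unfamiliar with the perplexity baseline that this abstract contrasts against, the following is a minimal sketch of perplexity-based answer selection. It is not the paper's NRC metric. It assumes that per-token log-probabilities for each candidate answer have already been obtained from some language model; the function names `perplexity` and `pick_by_perplexity` are hypothetical and introduced only for illustration.

```python
import math


def perplexity(token_log_probs):
    """Perplexity is exp of the negative mean token log-probability."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def pick_by_perplexity(candidates):
    """Return the index of the candidate with the lowest perplexity.

    `candidates` is a list of lists; each inner list holds the natural-log
    probabilities a language model assigned to the tokens of one answer.
    """
    scores = [perplexity(log_probs) for log_probs in candidates]
    return min(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy numbers, not model outputs: the second candidate is more probable.
    toy = [[math.log(0.1), math.log(0.1)], [math.log(0.5), math.log(0.5)]]
    print(pick_by_perplexity(toy))
```

Because the score depends only on token probabilities, frequent words lower perplexity regardless of plausibility, which is the frequency bias the abstract refers to.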

Learning Better Masking for Better Language Model Pre-training

Aug 23, 2022

Dongjie Yang, Zhuosheng Zhang, Hai Zhao

Abstract: Masked Language Modeling (MLM) has been widely used as the denoising objective in pre-training language models (PrLMs). Existing PrLMs commonly adopt a random-token masking strategy where a fixed masking ratio is applied and different contents are masked with equal probability throughout the entire training. However, the model may be affected in complicated ways by its pre-training status, which changes as training proceeds. In this paper, we show that such time-invariant MLM settings on masking ratio and masked content are unlikely to deliver an optimal outcome, which motivates us to explore the influence of time-variant MLM settings. We propose two scheduled masking approaches that adaptively tune the masking ratio and contents in different training stages, which improve pre-training efficiency and effectiveness as verified on downstream tasks. Our work is a pioneering study on time-variant masking strategies for ratio and contents and gives a better understanding of how masking ratio and masked content influence MLM pre-training.
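
To make the idea of a time-variant masking ratio concrete, here is a minimal sketch of one possible schedule. The abstract does not state the shape of the schedule or its endpoints, so the linear decay, the values 0.30 and 0.15, and the function names `masking_ratio` and `choose_mask_positions` are all assumptions made for illustration, not the authors' method.

```python
import random


def masking_ratio(step, total_steps, start=0.30, end=0.15):
    """Linearly interpolate the masking ratio from `start` to `end`.

    The linear shape and the default endpoints are illustrative assumptions.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress


def choose_mask_positions(num_tokens, ratio, rng):
    """Pick token positions to mask uniformly at random for a given ratio."""
    count = min(num_tokens, max(1, round(ratio * num_tokens)))
    return sorted(rng.sample(range(num_tokens), count))


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 500, 1000):
        ratio = masking_ratio(step, total_steps=1000)
        positions = choose_mask_positions(20, ratio, rng)
        print(step, round(ratio, 3), len(positions))
```

A time-invariant baseline corresponds to setting `start` equal to `end`; scheduling the masked content, the second approach mentioned in the abstract, is not covered by this sketch.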

Adversarial Self-Attention for Language Understanding

Jun 25, 2022

Hongqiu Wu, Hai Zhao

Abstract: An ultimate language system aims at high generalization and robustness when adapting to diverse scenarios. Unfortunately, the recent white hope, pre-trained language models (PrLMs), barely escape from stacking excessive parameters onto the over-parameterized Transformer architecture to achieve higher performance. This paper thus proposes the Adversarial Self-Attention (ASA) mechanism, which adversarially reconstructs the Transformer attentions and facilitates model training from contaminated model structures, coupled with a fast and simple implementation for better PrLM building. We conduct comprehensive evaluation across a wide range of tasks on both pre-training and fine-tuning stages. For pre-training, ASA yields a remarkable performance gain compared to regular training for longer periods. For fine-tuning, ASA-empowered models consistently outperform naive models by a large margin considering both generalization and robustness.

Generative or Contrastive? Phrase Reconstruction for Better Sentence Representation Learning

Apr 20, 2022

Bohong Wu, Hai Zhao

Abstract: Though offering amazing contextualized token-level representations, current pre-trained language models actually pay less attention to acquiring sentence-level representations during their self-supervised pre-training. If self-supervised learning can be divided into two subcategories, generative and contrastive, then most existing studies show that sentence representation learning may benefit more from contrastive methods than from generative methods. However, contrastive learning is not well compatible with the common token-level generative self-supervised learning, and does not guarantee good performance on downstream semantic retrieval tasks. Thus, to alleviate these inconveniences, we instead propose a novel generative self-supervised learning objective based on phrase reconstruction. Empirical studies show that our generative learning may yield sufficiently powerful sentence representations and achieve performance on Sentence Textual Similarity (STS) tasks on par with contrastive learning. Further, in the unsupervised setting, our generative method outperforms the previous state-of-the-art SimCSE on the benchmark of downstream semantic retrieval tasks.

* Preprint

Back to the Future: Bidirectional Information Decoupling Network for Multi-turn Dialogue Modeling

Apr 18, 2022

Yiyang Li, Hai Zhao, Zhuosheng Zhang

Abstract: Multi-turn dialogue modeling, a challenging branch of natural language understanding (NLU), aims to build representations for machines to understand human dialogues, which provides a solid foundation for multiple downstream tasks. Recent studies of dialogue modeling commonly employ pre-trained language models (PrLMs) to encode the dialogue history as successive tokens, which is insufficient for capturing the temporal characteristics of dialogues. Therefore, we propose the Bidirectional Information Decoupling Network (BiDeN) as a universal dialogue encoder, which explicitly incorporates both past and future contexts and can be generalized to a wide range of dialogue-related tasks. Experimental results on datasets of different downstream tasks demonstrate the universality and effectiveness of BiDeN.
