Canwen Xu

Automatic Multi-Label Prompting: Simple and Interpretable Few-Shot Classification

Apr 14, 2022

Han Wang, Canwen Xu, Julian McAuley

Abstract: Prompt-based learning (i.e., prompting) is an emerging paradigm for exploiting knowledge learned by a pretrained language model. In this paper, we propose Automatic Multi-Label Prompting (AMuLaP), a simple yet effective method to automatically select label mappings for few-shot text classification with prompting. Our method exploits one-to-many label mappings and a statistics-based algorithm to select label mappings given a prompt template. Our experiments demonstrate that AMuLaP achieves competitive performance on the GLUE benchmark without human effort or external resources.

* NAACL 2022 (main conference)
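
To make "one-to-many label mappings" selected by a "statistics-based algorithm" concrete, here is a minimal sketch of one plausible reading. It assumes that, for each few-shot training example, the language model's probability distribution over tokens at the mask position is already available. The function name `select_label_words`, the parameter `k`, and the exact selection rule (sum per class, assign each token to its best class, keep the top k) are illustrative assumptions, not the paper's stated procedure.

```python
from collections import defaultdict


def select_label_words(examples, k=2):
    """Pick up to k label words per class from few-shot examples.

    examples: list of (label, {token: probability}) pairs, where the dict is
    the (assumed) model distribution at the mask position for that example.
    """
    # Sum each token's probability over the examples of each class.
    totals = defaultdict(lambda: defaultdict(float))
    for label, dist in examples:
        for token, prob in dist.items():
            totals[label][token] += prob

    # Give each token to the class where its summed probability is highest,
    # so no token serves as a label word for two classes.
    owner = {}
    for label, scores in totals.items():
        for token, score in scores.items():
            if token not in owner or score > totals[owner[token]][token]:
                owner[token] = label

    # Keep the k highest-scoring tokens owned by each class.
    mapping = {}
    for label, scores in totals.items():
        owned = [t for t in scores if owner[t] == label]
        owned.sort(key=lambda t: scores[t], reverse=True)
        mapping[label] = owned[:k]
    return mapping
```

At prediction time, a class score could then be the sum of the model's probabilities for that class's label words. The sketch also assumes a balanced number of examples per class, since it compares raw sums.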

LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval

Mar 11, 2022

Canwen Xu, Daya Guo, Nan Duan, Julian McAuley

Abstract: In this paper, we propose LaPraDoR, a pretrained dual-tower dense retriever that does not require any supervised data for training. Specifically, we first present Iterative Contrastive Learning (ICoL) that iteratively trains the query and document encoders with a cache mechanism. ICoL not only enlarges the number of negative instances but also keeps representations of cached examples in the same hidden space. We then propose Lexicon-Enhanced Dense Retrieval (LEDR) as a simple yet effective way to enhance dense retrieval with lexical matching. We evaluate LaPraDoR on the recently proposed BEIR benchmark, including 18 datasets of 9 zero-shot text retrieval tasks. Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models, and further analysis reveals the effectiveness of our training strategy and objectives. Compared to re-ranking, our lexicon-enhanced approach can be run in milliseconds (22.5x faster) while achieving superior performance.

* ACL 2022 (Findings)
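
The idea of enhancing dense retrieval with lexical matching can be sketched as a single scoring pass that combines two signals per document. The abstract does not say how the signals are combined, so the multiplication below is an assumption. The term-overlap count is a simple stand-in for a real lexical scorer such as BM25, and all function names are hypothetical.

```python
import math


def dense_score(q_vec, d_vec):
    """Cosine similarity between query and document embeddings."""
    dot = sum(a * b for a, b in zip(q_vec, d_vec))
    norm = math.sqrt(sum(a * a for a in q_vec)) * math.sqrt(sum(b * b for b in d_vec))
    return dot / norm if norm else 0.0


def lexical_score(query, doc):
    """Stand-in lexical matcher: number of query terms that occur in the document."""
    doc_terms = set(doc.lower().split())
    return sum(1 for term in query.lower().split() if term in doc_terms)


def lexicon_enhanced_rank(query, q_vec, docs):
    """Rank (text, embedding) pairs by lexical score times dense score (assumed rule)."""
    scored = [
        (lexical_score(query, text) * dense_score(q_vec, vec), text)
        for text, vec in docs
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Because both scores come from one pass over the candidates, no second-stage re-ranking model is needed. Note that under this assumed rule a document with no lexical overlap scores zero regardless of its dense similarity.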

Leashing the Inner Demons: Self-Detoxification for Language Models

Mar 06, 2022

Canwen Xu, Zexue He, Zhankui He, Julian McAuley

Abstract: Language models (LMs) can reproduce (or amplify) toxic language seen during training, which poses a risk to their practical application. In this paper, we conduct extensive experiments to study this phenomenon. We analyze the impact of prompts, decoding strategies and training corpora on the output toxicity. Based on our findings, we propose a simple yet effective method for language models to "detoxify" themselves without an additional large corpus or external discriminator. Compared to a supervised baseline, our proposed method shows better toxicity reduction with good generation quality in the generated content under multiple settings. Warning: some examples shown in the paper may contain uncensored offensive content.

* AAAI 2022

A Survey on Model Compression for Natural Language Processing

Feb 15, 2022

Canwen Xu, Julian McAuley

Abstract: With recent developments in new architectures like Transformer and pretraining techniques, significant progress has been made in applications of natural language processing (NLP). However, the high energy cost and long inference delay of Transformer are preventing NLP from entering broader scenarios including edge and mobile computing. Efficient NLP research aims to comprehensively consider computation, time and carbon emission for the entire life-cycle of NLP, including data preparation, model training and inference. In this survey, we focus on the inference stage and review the current state of model compression for NLP, including the benchmarks, metrics and methodology. We outline the current obstacles and future research directions.

A Survey on Dynamic Neural Networks for Natural Language Processing

Feb 15, 2022

Canwen Xu, Julian McAuley

Abstract: Effectively scaling large Transformer models is a main driver of recent advances in natural language processing. Dynamic neural networks, as an emerging research direction, are capable of scaling up neural networks with sub-linear increases in computation and time by dynamically adjusting their computational path based on the input. Dynamic neural networks could be a promising solution to the growing parameter numbers of pretrained language models, allowing both model pretraining with trillions of parameters and faster inference on mobile devices. In this survey, we summarize progress of three types of dynamic neural networks in NLP: skimming, mixture of experts, and early exit. We also highlight current challenges in dynamic neural networks and directions for future research.
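
Of the three types named above, early exit is the easiest to illustrate in a few lines. The sketch below assumes a stack of layers with one small classifier attached after each layer, and stops as soon as a classifier's top probability reaches a threshold. The confidence-threshold rule, the function name, and the `threshold` parameter are assumptions chosen for illustration. They are one common exit criterion, not a summary of the methods the survey covers.

```python
def early_exit_predict(layers, classifiers, x, threshold=0.9):
    """Run layers in order and stop once an intermediate classifier is confident.

    layers: non-empty list of callables, each mapping a hidden state to the next.
    classifiers: list of callables, each mapping a hidden state to class probabilities.
    Returns (predicted_label, number_of_layers_executed).
    """
    hidden = x
    probs, depth = None, 0
    for depth, (layer, classifier) in enumerate(zip(layers, classifiers), start=1):
        hidden = layer(hidden)
        probs = classifier(hidden)
        # Confident enough: skip the remaining layers for this input.
        if max(probs) >= threshold:
            break
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, depth
```

Inputs that the shallow classifiers already handle confidently use only a fraction of the network, which is how the computational path adapts to the input.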

PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts

Feb 02, 2022

Stephen H. Bach, Victor Sanh, Zheng-Xin Yong, Albert Webson, Colin Raffel, Nihal V. Nayak, Abheesht Sharma, Taewoon Kim, M Saiful Bari, Thibault Fevry (+16 more)

Abstract: PromptSource is a system for creating, sharing, and using natural language prompts. Prompts are functions that map an example from a dataset to a natural language input and target output. Using prompts to train and query language models is an emerging area in NLP that requires new tools that let users develop and refine these prompts collaboratively. PromptSource addresses the emergent challenges in this new setting with (1) a templating language for defining data-linked prompts, (2) an interface that lets users quickly iterate on prompt development by observing outputs of their prompts on many examples, and (3) a community-driven set of guidelines for contributing new prompts to a common pool. Over 2,000 prompts for roughly 170 datasets are already available in PromptSource. PromptSource is available at https://github.com/bigscience-workshop/promptsource.
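
The statement that a prompt is a function from a dataset example to an input and a target can be shown directly. The sketch below is plain Python and does not use PromptSource's templating language or API. The function name, the field names (`premise`, `hypothesis`, `label`), and the answer choices are hypothetical and only mimic a typical entailment dataset.

```python
def entailment_prompt(example):
    """Hypothetical prompt: map one dataset example to (input text, target text)."""
    answer_choices = ["Yes", "Maybe", "No"]  # indexed by the example's integer label
    premise = example["premise"]
    hypothesis = example["hypothesis"]
    source = (
        f"{premise}\n"
        f'Question: Does this imply that "{hypothesis}"? Yes, no, or maybe?'
    )
    target = answer_choices[example["label"]]
    return source, target
```

Writing several such functions for the same dataset, each with different wording, yields multiple prompted views of the same examples.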

Multitask Prompted Training Enables Zero-Shot Task Generalization

Oct 15, 2021

Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja (+31 more)

Abstract: Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks. It has been hypothesized that this is a consequence of implicit multitask learning in language model training. Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping general natural language tasks into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts using varying natural language. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models 16x its size. Further, our approach attains strong performance on a subset of tasks from the BIG-Bench benchmark, outperforming models 6x its size. All prompts and trained models are available at github.com/bigscience-workshop/promptsource/.

* https://github.com/bigscience-workshop/promptsource/

Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression

Sep 07, 2021

Canwen Xu, Wangchunshu Zhou, Tao Ge, Ke Xu, Julian McAuley, Furu Wei

Abstract: Recent studies on compression of pretrained language models (e.g., BERT) usually use preserved accuracy as the metric for evaluation. In this paper, we propose two new metrics, label loyalty and probability loyalty, which measure how closely a compressed model (i.e., student) mimics the original model (i.e., teacher). We also explore the effect of compression with regard to robustness under adversarial attacks. We benchmark quantization, pruning, knowledge distillation and progressive module replacing with loyalty and robustness. By combining multiple compression techniques, we provide a practical strategy to achieve better accuracy, loyalty and robustness.

* Accepted to EMNLP 2021 (main conference)
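
The two loyalty metrics compare the student against the teacher rather than against gold labels. The abstract does not give formulas, so the definitions below are assumptions made for illustration: label loyalty as the fraction of examples where both models predict the same label, and probability loyalty as one minus the square root of the Jensen-Shannon divergence (base-2 logarithm) between the two output distributions, averaged over examples.

```python
import math


def label_loyalty(teacher_labels, student_labels):
    """Fraction of examples where the student predicts the teacher's label."""
    matches = sum(t == s for t, s in zip(teacher_labels, student_labels))
    return matches / len(teacher_labels)


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the value lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def probability_loyalty(teacher_probs, student_probs):
    """Average of 1 - sqrt(JS divergence) over examples (assumed definition)."""
    scores = [
        1.0 - math.sqrt(max(js_divergence(p, q), 0.0))
        for p, q in zip(teacher_probs, student_probs)
    ]
    return sum(scores) / len(scores)
```

Under these definitions a student can match the teacher's accuracy while scoring low on loyalty, for example when the two models make their errors on different examples.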

Datasets: A Community Library for Natural Language Processing

Sep 07, 2021

Quentin Lhoest, Albert Villanova del Moral, Yacine Jernite, Abhishek Thakur, Patrick von Platen, Suraj Patil, Julien Chaumond, Mariama Drame, Julien Plu, Lewis Tunstall (+22 more)

Abstract: The scale, variety, and quantity of publicly-available NLP datasets have grown rapidly as researchers propose new tasks, larger models, and novel benchmarks. Datasets is a community library for contemporary NLP designed to support this ecosystem. Datasets aims to standardize end-user interfaces, versioning, and documentation, while providing a lightweight front-end that behaves similarly for small datasets as for internet-scale corpora. The design of the library incorporates a distributed, community-driven approach to adding datasets and documenting usage. After a year of development, the library now includes more than 650 unique datasets, has more than 250 contributors, and has helped support a variety of novel cross-dataset research projects and shared tasks. The library is available at https://github.com/huggingface/datasets.

* EMNLP Demo 2021

Meta Learning for Knowledge Distillation

Jun 08, 2021

Wangchunshu Zhou, Canwen Xu, Julian McAuley

Abstract: We present Meta Learning for Knowledge Distillation (MetaDistil), a simple yet effective alternative to traditional knowledge distillation (KD) methods where the teacher model is fixed during training. We show the teacher network can learn to better transfer knowledge to the student network (i.e., learning to teach) with the feedback from the performance of the distilled student network in a meta learning framework. Moreover, we introduce a pilot update mechanism to improve the alignment between the inner-learner and meta-learner in meta learning algorithms that focus on an improved inner-learner. Experiments on various benchmarks show that MetaDistil can yield significant improvements compared with traditional KD algorithms and is less sensitive to the choice of different student capacity and hyperparameters, facilitating the use of KD on different tasks and models. The code is available at https://github.com/JetRunner/MetaDistil
