Bryan McCann

Joint Energy-based Model Training for Better Calibrated Natural Language Understanding Models

Jan 18, 2021
Tianxing He, Bryan McCann, Caiming Xiong, Ehsan Hosseini-Asl

In this work, we explore joint energy-based model (EBM) training during the finetuning of pretrained text encoders (e.g., RoBERTa) for natural language understanding (NLU) tasks. Our experiments show that EBM training helps the model achieve calibration competitive with strong baselines, with little or no loss in accuracy. We discuss three variants of energy functions (namely scalar, hidden, and sharp-hidden) that can be defined on top of a text encoder, and compare them in experiments. Due to the discreteness of text data, we adopt noise contrastive estimation (NCE) to train the energy-based model. To make NCE training more effective, we train an auto-regressive noise model with the masked language model (MLM) objective.
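
To make the NCE setup concrete, here is a minimal PyTorch sketch (not the authors' code) of a binary NCE loss for the "scalar" energy variant, i.e. a linear head on the encoder's [CLS] state; the helper names and the handling of the noise ratio are assumptions.

```python
import torch
import torch.nn.functional as F

def scalar_energy(encoder_hidden, energy_head):
    # "Scalar" variant: a linear head maps the encoder's [CLS] hidden state
    # to one unnormalized energy value per sequence.
    return energy_head(encoder_hidden[:, 0]).squeeze(-1)

def nce_loss(energy_data, logq_data, energy_noise, logq_noise, nu=1.0):
    # Binary NCE: classify real sequences vs. samples from the noise model.
    # The model's unnormalized log-density is taken as -E(x); log q(x) comes
    # from the auto-regressive noise model; nu is the noise-to-data ratio.
    log_nu = torch.log(torch.tensor(nu))
    logit_data = -energy_data - logq_data - log_nu
    logit_noise = -energy_noise - logq_noise - log_nu
    loss_data = F.binary_cross_entropy_with_logits(
        logit_data, torch.ones_like(logit_data))
    loss_noise = F.binary_cross_entropy_with_logits(
        logit_noise, torch.zeros_like(logit_noise))
    return loss_data + nu * loss_noise
```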

* EACL 2021  

CTRLsum: Towards Generic Controllable Text Summarization

Dec 08, 2020
Junxian He, Wojciech Kryściński, Bryan McCann, Nazneen Rajani, Caiming Xiong

Current summarization systems yield generic summaries that are disconnected from users' preferences and expectations. To address this limitation, we present CTRLsum, a novel framework for controllable summarization. Our approach enables users to control multiple aspects of generated summaries by interacting with the summarization system through textual input in the form of a set of keywords or descriptive prompts. Using a single unified model, CTRLsum is able to achieve a broad scope of summary manipulation at inference time without requiring additional human annotations or pre-defining a set of control aspects during training. We quantitatively demonstrate the effectiveness of our approach on three domains of summarization datasets and five control aspects: 1) entity-centric and 2) length-controllable summarization, 3) contribution summarization on scientific papers, 4) invention purpose summarization on patent filings, and 5) question-guided summarization on news articles in a reading comprehension setting. Moreover, when used in a standard, uncontrolled summarization setting, CTRLsum achieves state-of-the-art results on the CNN/DailyMail dataset. Code and model checkpoints are available at https://github.com/salesforce/ctrl-sum.
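
As a rough usage sketch of keyword-guided summarization in the spirit of CTRLsum (the checkpoint path and the keyword separator below are placeholders, not necessarily what the released models use):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder checkpoint path; see the linked repository for released models.
MODEL_NAME = "path/to/ctrlsum-checkpoint"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def keyword_guided_summary(document, keywords, separator=" => "):
    # Control signal: keywords are prepended to the source document with a
    # separator; the exact separator used by the released models may differ.
    controlled_input = " ".join(keywords) + separator + document
    inputs = tokenizer(controlled_input, return_tensors="pt", truncation=True)
    summary_ids = model.generate(**inputs, num_beams=4, max_length=128)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

article = "..."  # the source document text
print(keyword_guided_summary(article, ["climate", "policy"]))
```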

* Preprint 

What's New? Summarizing Contributions in Scientific Literature

Nov 09, 2020
Hiroaki Hayashi, Wojciech Kryściński, Bryan McCann, Nazneen Rajani, Caiming Xiong

With thousands of academic articles shared on a daily basis, it has become increasingly difficult to keep up with the latest scientific findings. To overcome this problem, we introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work, making it easier to identify the key findings shared in articles. For this purpose, we extend the S2ORC corpus of academic articles, which spans a diverse set of domains ranging from economics to psychology, by adding disentangled "contribution" and "context" reference labels. Together with the dataset, we introduce and analyze three baseline approaches: 1) a unified model controlled by input code prefixes, 2) a model with separate generation heads specialized in generating the disentangled outputs, and 3) a training strategy that guides the model using additional supervision coming from inbound and outbound citations. We also propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs. Through a human study involving expert annotators, we show that in 79% of cases our new task is considered more helpful than traditional scientific paper summarization.
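
To illustrate the first baseline (the unified, prefix-controlled model), here is a small sketch of how training pairs could be built; the control-code strings are illustrative, not the paper's actual tokens.

```python
# Illustrative control codes; the paper's actual prefix tokens may differ.
CONTRIBUTION = "<|contribution|>"
CONTEXT = "<|context|>"

def make_training_pairs(paper_text, contribution_summary, context_summary):
    # The same source text is paired with different targets depending on the
    # control code prepended to it, so one model serves both summary types.
    return [
        (f"{CONTRIBUTION} {paper_text}", contribution_summary),
        (f"{CONTEXT} {paper_text}", context_summary),
    ]
```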

* 9 pages, 5 tables, 2 figures 

Char2Subword: Extending the Subword Embedding Space from Pre-trained Models Using Robust Character Compositionality

Oct 24, 2020
Gustavo Aguilar, Bryan McCann, Tong Niu, Nazneen Rajani, Nitish Keskar, Thamar Solorio

Byte-pair encoding (BPE) is a ubiquitous algorithm in the subword tokenization process of language models. BPE provides multiple benefits, such as handling the out-of-vocabulary problem and reducing vocabulary sparsity. However, this process is defined from the pre-training data statistics, making the tokenization on different domains susceptible to infrequent spelling sequences (e.g., misspellings as in social media or character-level adversarial attacks). On the other hand, pure character-level models, though robust to misspellings, often lead to unreasonably large sequence lengths and make it harder for the model to learn meaningful contiguous characters. To alleviate these challenges, we propose a character-based subword transformer module (char2subword) that learns the subword embedding table in pre-trained models like BERT. Our char2subword module builds representations from characters out of the subword vocabulary, and it can be used as a drop-in replacement of the subword embedding table. The module is robust to character-level alterations such as misspellings, word inflection, casing, and punctuation. We integrate it further with BERT through pre-training while keeping BERT transformer parameters fixed. We show our method's effectiveness by outperforming a vanilla multilingual BERT on the linguistic code-switching evaluation (LinCE) benchmark.
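
As a sketch of what a character-to-subword module might look like (class and parameter names are assumptions, and the published module's architecture and training objective may differ), the snippet below encodes the characters of a subword and projects the pooled representation to the dimensionality of the pre-trained subword embedding table, so it could serve as a drop-in replacement for the embedding lookup.

```python
import torch
import torch.nn as nn

class Char2Subword(nn.Module):
    """Illustrative character-to-subword embedding module."""

    def __init__(self, n_chars, char_dim=64, subword_dim=768, n_layers=2, n_heads=4):
        super().__init__()
        self.char_embed = nn.Embedding(n_chars, char_dim, padding_idx=0)
        layer = nn.TransformerEncoderLayer(
            d_model=char_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.proj = nn.Linear(char_dim, subword_dim)

    def forward(self, char_ids):           # (batch, max_chars), 0 = padding
        mask = char_ids.eq(0)
        h = self.encoder(self.char_embed(char_ids), src_key_padding_mask=mask)
        h = h.masked_fill(mask.unsqueeze(-1), 0.0)
        # Mean-pool over non-padding characters, then map to the subword space.
        pooled = h.sum(1) / (~mask).sum(1, keepdim=True).clamp(min=1)
        return self.proj(pooled)           # (batch, subword_dim)
```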

GeDi: Generative Discriminator Guided Sequence Generation

Sep 14, 2020
Ben Krause, Akhilesh Deepak Gotmare, Bryan McCann, Nitish Shirish Keskar, Shafiq Joty, Richard Socher, Nazneen Fatema Rajani

Class-conditional language models (CC-LMs) can be used to generate natural language with specific attributes, such as style or sentiment, by conditioning on an attribute label, or control code. However, we find that these models struggle to control generation when applied to out-of-domain prompts or unseen control codes. To overcome these limitations, we propose generative discriminator (GeDi) guided contrastive generation, which uses CC-LMs as generative discriminators (GeDis) to efficiently guide generation from a (potentially much larger) LM towards a desired attribute. In our human evaluation experiments, we show that GeDis trained for sentiment control on movie reviews are able to control the tone of book text. We also demonstrate that GeDis are able to detoxify generation and control topic while maintaining the same level of linguistic acceptability as direct generation from GPT-2 (1.5B parameters). Lastly, we show that a GeDi trained on only 4 topics can generalize to new control codes from word embeddings, allowing it to guide generation towards a wide array of topics.
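
A simplified sketch of the core reweighting step is below; the published method adds further normalization and filtering heuristics that are omitted here, and the function is illustrative rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def gedi_reweight(base_logits, cc_logprob_pos, cc_logprob_neg, omega=30.0):
    """One decoding step of discriminator-weighted sampling (illustrative).

    base_logits:    (vocab,) next-token logits from the large base LM.
    cc_logprob_pos: (vocab,) log-prob of the prefix extended by each candidate
                    token under the CC-LM with the desired control code.
    cc_logprob_neg: (vocab,) same quantity under the undesired control code.
    omega:          strength of the attribute weighting.
    """
    # Bayes rule over the two control codes (uniform prior assumed) gives
    # log P(desired attribute | prefix + candidate token) for every candidate.
    log_p_desired = cc_logprob_pos - torch.logsumexp(
        torch.stack([cc_logprob_pos, cc_logprob_neg]), dim=0)
    # Bias the base LM's distribution towards tokens the discriminator favors.
    weighted = base_logits + omega * log_p_desired
    return F.softmax(weighted, dim=-1)
```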

SummEval: Re-evaluating Summarization Evaluation

Jul 31, 2020
Alexander R. Fabbri, Wojciech Kryściński, Bryan McCann, Caiming Xiong, Richard Socher, Dragomir Radev

The scarcity of comprehensive, up-to-date studies on evaluation metrics for text summarization and the lack of consensus regarding evaluation protocols continue to inhibit progress. We address the existing shortcomings of summarization evaluation methods along five dimensions: 1) we re-evaluate 12 automatic evaluation metrics in a comprehensive and consistent fashion using neural summarization model outputs along with expert and crowd-sourced human annotations, 2) we consistently benchmark 23 recent summarization models using the aforementioned automatic evaluation metrics, 3) we assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset and share it in a unified format, 4) we implement and share a toolkit that provides an extensible and unified API for evaluating summarization models across a broad range of automatic metrics, 5) we assemble and share the largest and, in terms of model types, most diverse collection of human judgments of model-generated summaries on the CNN/DailyMail dataset, annotated by both expert judges and crowd-sourced workers. We hope that this work will help promote a more complete evaluation protocol for text summarization as well as advance research in developing evaluation metrics that better correlate with human judgments.
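
Purely to illustrate what a unified, extensible evaluation API can look like, here is a hypothetical sketch; it is not the interface of the released toolkit, and the metric shown is a toy placeholder.

```python
from dataclasses import dataclass

@dataclass
class MetricResult:
    name: str
    score: float

def evaluate_summary(summary, reference, metrics):
    # Run every metric callable with a unified (summary, reference) signature
    # and collect named scores.
    return [MetricResult(name, fn(summary, reference)) for name, fn in metrics]

# Toy metric following the same signature; real metrics (ROUGE, BERTScore, ...)
# would be wrapped the same way.
metrics = [("length_ratio",
            lambda s, r: len(s.split()) / max(len(r.split()), 1))]
print(evaluate_summary("a short summary", "a much longer reference text", metrics))
```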

* 10 pages, 4 tables, 1 figure 

A Simple Language Model for Task-Oriented Dialogue

May 25, 2020
Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, Richard Socher

Task-oriented dialogue is often decomposed into three tasks: understanding user input, deciding actions, and generating a response. While such decomposition might suggest a dedicated model for each sub-task, we find a simple, unified approach leads to state-of-the-art performance on the MultiWOZ dataset. SimpleTOD is a simple approach to task-oriented dialogue that uses a single causal language model trained on all sub-tasks recast as a single sequence prediction problem. This allows SimpleTOD to fully leverage transfer learning from pre-trained, open domain, causal language models such as GPT-2. SimpleTOD improves over the prior state-of-the-art by 0.49 points in joint goal accuracy for dialogue state tracking. More impressively, SimpleTOD also improves the main metrics used to evaluate action decisions and response generation in an end-to-end setting for task-oriented dialog systems: inform rate by 8.1 points, success rate by 9.7 points, and combined score by 7.2 points.
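
To make the single-sequence recasting concrete, here is an illustrative serialization of one dialogue turn; the delimiter tokens and the belief-state rendering are placeholders and need not match the paper's exact format.

```python
def build_simpletod_sequence(context_turns, belief_state, actions, response):
    # Serialize one turn as: dialogue context -> belief state -> system
    # actions -> response, all in a single token sequence that a causal LM is
    # trained to continue left to right.
    context = " ".join(context_turns)
    belief = ", ".join(f"{d} {s} {v}" for d, s, v in belief_state)
    acts = ", ".join(actions)
    return (
        f"<context> {context} <endofcontext> "
        f"<belief> {belief} <endofbelief> "
        f"<action> {acts} <endofaction> "
        f"<response> {response} <endofresponse>"
    )

example = build_simpletod_sequence(
    context_turns=["user: i need a cheap hotel in the north"],
    belief_state=[("hotel", "pricerange", "cheap"), ("hotel", "area", "north")],
    actions=["hotel inform name"],
    response="the ashley hotel is a cheap option in the north .",
)
```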

* Version 2: Adding error analysis; 20 Pages, 1 figure, 18 tables 

Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation

May 03, 2020
Tianlu Wang, Xi Victoria Lin, Nazneen Fatema Rajani, Bryan McCann, Vicente Ordonez, Caiming Xiong

Word embeddings derived from human-generated corpora inherit strong gender bias, which can be further amplified by downstream models. Some commonly adopted debiasing approaches, including the seminal Hard Debias algorithm, apply post-processing procedures that project pre-trained word embeddings into a subspace orthogonal to an inferred gender subspace. We discover that semantics-agnostic corpus regularities captured by the word embeddings, such as word frequency, negatively impact the performance of these algorithms. We propose a simple but effective technique, Double-Hard Debias, which purifies the word embeddings against such corpus regularities prior to inferring and removing the gender subspace. Experiments on three bias mitigation benchmarks show that our approach preserves the distributional semantics of the pre-trained word embeddings while reducing gender bias to a significantly larger degree than prior approaches.
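
A minimal numpy sketch of the two-step idea, assuming the frequency-related and gender directions have already been estimated (e.g., from principal components of the embedding matrix and of definitional gender-pair differences, respectively); the helper names are illustrative.

```python
import numpy as np

def project_out(vectors, direction):
    # Remove the component of each row vector along `direction`.
    d = direction / np.linalg.norm(direction)
    return vectors - np.outer(vectors @ d, d)

def double_hard_debias(embeddings, gender_direction, frequency_direction):
    # Step 1: purify the embeddings of the frequency-related direction.
    purified = project_out(embeddings, frequency_direction)
    # Step 2: apply the standard hard-debias projection onto the complement
    # of the gender direction.
    return project_out(purified, gender_direction)

# Example with random data, for shape-checking only.
E = np.random.randn(1000, 300)
freq_dir = np.random.randn(300)
gender_dir = np.random.randn(300)
debiased = double_hard_debias(E, gender_dir, freq_dir)
```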

* Accepted to ACL 2020 