Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Graham Neubig

Carnegie Mellon University

OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering

Jul 08, 2022

Zhengbao Jiang, Yi Mao, Pengcheng He, Graham Neubig, Weizhu Chen

Figure 1 for OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering

Figure 2 for OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering

Figure 3 for OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering

Figure 4 for OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering

Abstract:The information in tables can be an important complement to text, making table-based question answering (QA) systems of great value. The intrinsic complexity of handling tables often adds an extra burden to both model design and data annotation. In this paper, we aim to develop a simple table-based QA model with minimal annotation effort. Motivated by the fact that table-based QA requires both alignment between questions and tables and the ability to perform complicated reasoning over multiple table elements, we propose an omnivorous pretraining approach that consumes both natural and synthetic data to endow models with these respective abilities. Specifically, given freely available tables, we leverage retrieval to pair them with relevant natural sentences for mask-based pretraining, and synthesize NL questions by converting SQL sampled from tables for pretraining with a QA loss. We perform extensive experiments in both few-shot and full settings, and the results clearly demonstrate the superiority of our model OmniTab, with the best multitasking approach achieving an absolute gain of 16.2% and 2.7% in 128-shot and full settings respectively, also establishing a new state-of-the-art on WikiTableQuestions. Detailed ablations and analyses reveal different characteristics of natural and synthetic data, shedding light on future directions in omnivorous pretraining. Code, pretraining data, and pretrained models are available at https://github.com/jzbjyb/OmniTab.

* NAACL 2022

Via

Access Paper or Ask Questions

Building African Voices

Jul 01, 2022

Perez Ogayo, Graham Neubig, Alan W Black

Abstract:Modern speech synthesis techniques can produce natural-sounding speech given sufficient high-quality data and compute resources. However, such data is not readily available for many languages. This paper focuses on speech synthesis for low-resourced African languages, from corpus creation to sharing and deploying the Text-to-Speech (TTS) systems. We first create a set of general-purpose instructions on building speech synthesis systems with minimum technological resources and subject-matter expertise. Next, we create new datasets and curate datasets from "found" data (existing recordings) through a participatory approach while considering accessibility, quality, and breadth. We demonstrate that we can develop synthesizers that generate intelligible speech with 25 minutes of created speech, even when recorded in suboptimal environments. Finally, we release the speech data, code, and trained voices for 12 African languages to support researchers and developers.

Via

Access Paper or Ask Questions

AANG: Automating Auxiliary Learning

May 27, 2022

Lucio M. Dery, Paul Michel, Mikhail Khodak, Graham Neubig, Ameet Talwalkar

Figure 1 for AANG: Automating Auxiliary Learning

Figure 2 for AANG: Automating Auxiliary Learning

Figure 3 for AANG: Automating Auxiliary Learning

Figure 4 for AANG: Automating Auxiliary Learning

Abstract:When faced with data-starved or highly complex end-tasks, it is commonplace for machine learning practitioners to introduce auxiliary objectives as supplementary learning signals. Whilst much work has been done to formulate useful auxiliary objectives, their construction is still an art which proceeds by slow and tedious hand-design. Intuitions about how and when these objectives improve end-task performance have also had limited theoretical backing. In this work, we present an approach for automatically generating a suite of auxiliary objectives. We achieve this by deconstructing existing objectives within a novel unified taxonomy, identifying connections between them, and generating new ones based on the uncovered structure. Next, we theoretically formalize widely-held intuitions about how auxiliary learning improves generalization of the end-task. This leads us to a principled and efficient algorithm for searching the space of generated objectives to find those most useful to a specified end-task. With natural language processing (NLP) as our domain of study, we empirically verify that our automated auxiliary learning pipeline leads to strong improvements over competitive baselines across continued training experiments on a pre-trained model on 5 NLP end-tasks.

* 18 pages, 9 tables and 4 figures

Via

Access Paper or Ask Questions

Learning to Model Editing Processes

May 24, 2022

Machel Reid, Graham Neubig

Figure 1 for Learning to Model Editing Processes

Figure 2 for Learning to Model Editing Processes

Figure 3 for Learning to Model Editing Processes

Figure 4 for Learning to Model Editing Processes

Abstract:Most existing sequence generation models produce outputs in one pass, usually left-to-right. However, this is in contrast with a more natural approach that humans use in generating content; iterative refinement and editing. Recent work has introduced edit-based models for various tasks (such as neural machine translation and text style transfer), but these generally model a single edit step. In this work, we propose modeling editing processes, modeling the whole process of iteratively generating sequences. We form a conceptual framework to describe the likelihood of multi-step edits, and describe neural models that can learn a generative model of sequences based on these multistep edits. We introduce baseline results and metrics on this task, finding that modeling editing processes improves performance on a variety of axes on both our proposed task and related downstream tasks compared to previous single-step models of edits.

Via

Access Paper or Ask Questions

Table Retrieval May Not Necessitate Table-specific Model Design

May 19, 2022

Zhiruo Wang, Zhengbao Jiang, Eric Nyberg, Graham Neubig

Figure 1 for Table Retrieval May Not Necessitate Table-specific Model Design

Figure 2 for Table Retrieval May Not Necessitate Table-specific Model Design

Figure 3 for Table Retrieval May Not Necessitate Table-specific Model Design

Figure 4 for Table Retrieval May Not Necessitate Table-specific Model Design

Abstract:Tables are an important form of structured data for both human and machine readers alike, providing answers to questions that cannot, or cannot easily, be found in texts. Recent work has designed special models and training paradigms for table-related tasks such as table-based question answering and table retrieval. Though effective, they add complexity in both modeling and data acquisition compared to generic text solutions and obscure which elements are truly beneficial. In this work, we focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval, or can a simpler text-based model be effectively used to achieve a similar result?" First, we perform an analysis on a table-based portion of the Natural Questions dataset (NQ-table), and find that structure plays a negligible role in more than 70% of the cases. Based on this, we experiment with a general Dense Passage Retriever (DPR) based on text and a specialized Dense Table Retriever (DTR) that uses table-specific model designs. We find that DPR performs well without any table-specific design and training, and even achieves superior results compared to DTR when fine-tuned on properly linearized tables. We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases. However, none of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.

* 11 pages total, 4 figures

Via

Access Paper or Ask Questions

Quality-Aware Decoding for Neural Machine Translation

May 02, 2022

Patrick Fernandes, António Farinhas, Ricardo Rei, José G. C. de Souza, Perez Ogayo, Graham Neubig, André F. T. Martins

Figure 1 for Quality-Aware Decoding for Neural Machine Translation

Figure 2 for Quality-Aware Decoding for Neural Machine Translation

Figure 3 for Quality-Aware Decoding for Neural Machine Translation

Figure 4 for Quality-Aware Decoding for Neural Machine Translation

Abstract:Despite the progress in machine translation quality estimation and evaluation in the last years, decoding in neural machine translation (NMT) is mostly oblivious to this and centers around finding the most probable translation according to the model (MAP decoding), approximated with beam search. In this paper, we bring together these two lines of research and propose quality-aware decoding for NMT, by leveraging recent breakthroughs in reference-free and reference-based MT evaluation through various inference methods like $N$-best reranking and minimum Bayes risk decoding. We perform an extensive comparison of various possible candidate generation and ranking methods across four datasets and two model classes and find that quality-aware decoding consistently outperforms MAP-based decoding according both to state-of-the-art automatic metrics (COMET and BLEURT) and to human assessments. Our code is available at https://github.com/deep-spin/qaware-decode.

* NAACL2022

Via

Access Paper or Ask Questions

Prompt Consistency for Zero-Shot Task Generalization

Apr 29, 2022

Chunting Zhou, Junxian He, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig

Figure 1 for Prompt Consistency for Zero-Shot Task Generalization

Figure 2 for Prompt Consistency for Zero-Shot Task Generalization

Figure 3 for Prompt Consistency for Zero-Shot Task Generalization

Figure 4 for Prompt Consistency for Zero-Shot Task Generalization

Abstract:One of the most impressive results of recent NLP history is the ability of pre-trained language models to solve new tasks in a zero-shot setting. To achieve this, NLP tasks are framed as natural language prompts, generating a response indicating the predicted output. Nonetheless, the performance in such settings often lags far behind its supervised counterpart, suggesting a large space for potential improvement. In this paper, we explore methods to utilize unlabeled data to improve zero-shot performance. Specifically, we take advantage of the fact that multiple prompts can be used to specify a single task, and propose to regularize prompt consistency, encouraging consistent predictions over this diverse set of prompts. Our method makes it possible to fine-tune the model either with extra unlabeled training data, or directly on test input at inference time in an unsupervised manner. In experiments, our approach outperforms the state-of-the-art zero-shot learner, T0 (Sanh et al., 2022), on 9 out of 11 datasets across 4 NLP tasks by up to 10.6 absolute points in terms of accuracy. The gains are often attained with a small number of unlabeled examples.

* Preprint. Code is available at https://github.com/violet-zct/swarm-distillation-zero-shot

Via

Access Paper or Ask Questions

Testing the Ability of Language Models to Interpret Figurative Language

Apr 26, 2022

Emmy Liu, Chen Cui, Kenneth Zheng, Graham Neubig

Figure 1 for Testing the Ability of Language Models to Interpret Figurative Language

Figure 2 for Testing the Ability of Language Models to Interpret Figurative Language

Figure 3 for Testing the Ability of Language Models to Interpret Figurative Language

Figure 4 for Testing the Ability of Language Models to Interpret Figurative Language

Abstract:Figurative and metaphorical language are commonplace in discourse, and figurative expressions play an important role in communication and cognition. However, figurative language has been a relatively under-studied area in NLP, and it remains an open question to what extent modern language models can interpret nonliteral phrases. To address this question, we introduce Fig-QA, a Winograd-style nonliteral language understanding task consisting of correctly interpreting paired figurative phrases with divergent meanings. We evaluate the performance of several state-of-the-art language models on this task, and find that although language models achieve performance significantly over chance, they still fall short of human performance, particularly in zero- or few-shot settings. This suggests that further work is needed to improve the nonliteral reasoning capabilities of language models.

* NAACL 2022

Via

Access Paper or Ask Questions

Learning to Scaffold: Optimizing Model Explanations for Teaching

Apr 22, 2022

Patrick Fernandes, Marcos Treviso, Danish Pruthi, André F. T. Martins, Graham Neubig

Figure 1 for Learning to Scaffold: Optimizing Model Explanations for Teaching

Figure 2 for Learning to Scaffold: Optimizing Model Explanations for Teaching

Figure 3 for Learning to Scaffold: Optimizing Model Explanations for Teaching

Figure 4 for Learning to Scaffold: Optimizing Model Explanations for Teaching

Abstract:Modern machine learning models are opaque, and as a result there is a burgeoning academic subfield on methods that explain these models' behavior. However, what is the precise goal of providing such explanations, and how can we demonstrate that explanations achieve this goal? Some research argues that explanations should help teach a student (either human or machine) to simulate the model being explained, and that the quality of explanations can be measured by the simulation accuracy of students on unexplained examples. In this work, leveraging meta-learning techniques, we extend this idea to improve the quality of the explanations themselves, specifically by optimizing explanations such that student models more effectively learn to simulate the original model. We train models on three natural language processing and computer vision tasks, and find that students trained with explanations extracted with our framework are able to simulate the teacher significantly more effectively than ones produced with previous methods. Through human annotations and a user study, we further find that these learned explanations more closely align with how humans would explain the required decisions in these tasks. Our code is available at https://github.com/coderpat/learning-scaffold

* 9 pages

Via

Access Paper or Ask Questions

Distributionally Robust Models with Parametric Likelihood Ratios

Apr 13, 2022

Paul Michel, Tatsunori Hashimoto, Graham Neubig

Figure 1 for Distributionally Robust Models with Parametric Likelihood Ratios

Figure 2 for Distributionally Robust Models with Parametric Likelihood Ratios

Figure 3 for Distributionally Robust Models with Parametric Likelihood Ratios

Figure 4 for Distributionally Robust Models with Parametric Likelihood Ratios

Abstract:As machine learning models are deployed ever more broadly, it becomes increasingly important that they are not only able to perform well on their training distribution, but also yield accurate predictions when confronted with distribution shift. The Distributionally Robust Optimization (DRO) framework proposes to address this issue by training models to minimize their expected risk under a collection of distributions, to imitate test-time shifts. This is most commonly achieved by instance-level re-weighting of the training objective to emulate the likelihood ratio with possible test distributions, which allows for estimating their empirical risk via importance sampling (assuming that they are subpopulations of the training distribution). However, re-weighting schemes in the literature are usually limited due to the difficulty of keeping the optimization problem tractable and the complexity of enforcing normalization constraints. In this paper, we show that three simple ideas -- mini-batch level normalization, a KL penalty and simultaneous gradient updates -- allow us to train models with DRO using a broader class of parametric likelihood ratios. In a series of experiments on both image and text classification benchmarks, we find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches, and that the method performs reliably well with little hyper-parameter tuning. Code to reproduce our experiments can be found at https://github.com/pmichel31415/P-DRO.

* ICLR 2022

Via

Access Paper or Ask Questions