Yi Mao

A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation

Apr 18, 2021

Tianyu Liu, Yizhe Zhang, Chris Brockett, Yi Mao, Zhifang Sui, Weizhu Chen, Bill Dolan

Abstract: Large pretrained generative models like GPT-3 often tend to hallucinate non-existent or incorrect content, which undermines their potential merits in real applications. Existing work usually attempts to detect these hallucinations based on a corresponding oracle reference at a sentence or document level. However, ground-truth references may not be readily available for many free-form text generation applications, and sentence- or document-level detection may fail to provide the fine-grained signals that would prevent fallacious content in real time. As a first step toward addressing these issues, we propose a novel token-level, reference-free hallucination detection task and an associated annotated dataset named HaDes (HAllucination DEtection dataSet). To create this dataset, we first perturb a large number of text segments extracted from English-language Wikipedia, and then verify these with crowd-sourced annotations. To mitigate label imbalance during annotation, we utilize an iterative model-in-the-loop strategy. We conduct comprehensive data analyses and create multiple baseline models.
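
The abstract does not spell out the model-in-the-loop strategy. The sketch below shows one generic way such a loop could pick the next batch to send to annotators so that predicted-hallucinated and predicted-faithful candidates stay roughly balanced. The detector probabilities, the 0.5 threshold, and the name `select_balanced_batch` are hypothetical illustrations, not the HaDes procedure.

```python
from typing import List


def select_balanced_batch(probs: List[float], batch_size: int, threshold: float = 0.5) -> List[int]:
    """Pick pool indices so the annotation batch is roughly balanced between
    items a (hypothetical) detector predicts as hallucinated and as faithful.

    probs[i] is the detector's probability that candidate i is hallucinated.
    """
    pos = [i for i, p in enumerate(probs) if p >= threshold]  # predicted hallucinated
    neg = [i for i, p in enumerate(probs) if p < threshold]   # predicted faithful
    # Aim for half of each side; if one side runs short, fill up from the other.
    n_pos = min(len(pos), batch_size // 2)
    n_neg = min(len(neg), batch_size - n_pos)
    n_pos = min(len(pos), batch_size - n_neg)
    return sorted(pos[:n_pos] + neg[:n_neg])


if __name__ == "__main__":
    pool_probs = [0.9, 0.1, 0.2, 0.3, 0.8, 0.05]
    print(select_balanced_batch(pool_probs, batch_size=4))  # [0, 1, 2, 4]
```

In an iterative setting, the annotated batch would be added to the training data, the detector retrained, and the pool rescored before the next round.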

Finetuning Pretrained Transformers into RNNs

Mar 24, 2021

Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith

Abstract: Transformers have outperformed recurrent neural networks (RNNs) in natural language generation. But this comes with a significant computational cost, as the attention mechanism's complexity scales quadratically with sequence length. Efficient transformer variants have received increasing interest in recent work. Among them, a linear-complexity recurrent variant has proven well suited for autoregressive generation. It approximates the softmax attention with randomized or heuristic feature maps, but can be difficult to train and may yield suboptimal accuracy. This work aims to convert a pretrained transformer into its efficient recurrent counterpart, improving efficiency while maintaining accuracy. Specifically, we propose a swap-then-finetune procedure: in an off-the-shelf pretrained transformer, we replace the softmax attention with its linear-complexity recurrent alternative and then finetune. With a learned feature map, our approach provides an improved tradeoff between efficiency and accuracy over the standard transformer and other recurrent variants. We also show that the finetuning process has lower training cost than training these recurrent variants from scratch. As many recent models for natural language tasks are increasingly dependent on large-scale pretrained transformers, this work presents a viable approach to improving inference efficiency without repeating the expensive pretraining process.
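
To make the "linear-complexity recurrent alternative" concrete, here is a minimal NumPy sketch of causal linear attention with a learned feature map, assumed here to be `phi(x) = relu(W x + b)` shared between queries and keys. It computes the same outputs two ways: in parallel over the whole sequence (convenient for finetuning) and as an RNN with a constant-size state (cheap for generation). Function names, shapes, and the `eps` stabilizer are illustrative assumptions, not the authors' code.

```python
import numpy as np


def feature_map(x, W, b):
    """Assumed learned feature map phi(x) = relu(W x + b), applied to the last axis."""
    return np.maximum(x @ W.T + b, 0.0)


def linear_attention_parallel(Q, K, V, W, b, eps=1e-6):
    """Causal linear attention over the whole sequence at once (O(n^2) here, for clarity)."""
    fq, fk = feature_map(Q, W, b), feature_map(K, W, b)
    scores = np.tril(fq @ fk.T)                    # phi(q_i) . phi(k_j), only for j <= i
    denom = scores.sum(axis=1, keepdims=True) + eps
    return (scores @ V) / denom


def linear_attention_recurrent(Q, K, V, W, b, eps=1e-6):
    """Same computation as an RNN: the state (S, z) has a fixed size per step."""
    r = W.shape[0]
    S = np.zeros((r, V.shape[1]))                  # running sum of phi(k_j) v_j^T
    z = np.zeros(r)                                # running sum of phi(k_j)
    outputs = []
    for q, k, v in zip(Q, K, V):
        fk, fq = feature_map(k, W, b), feature_map(q, W, b)
        S += np.outer(fk, v)
        z += fk
        outputs.append((fq @ S) / (fq @ z + eps))
    return np.stack(outputs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, r, dv = 5, 4, 8, 3
    Q, K, V = rng.normal(size=(n, d)), rng.normal(size=(n, d)), rng.normal(size=(n, dv))
    W, b = rng.normal(size=(r, d)), rng.normal(size=r)
    print(np.allclose(linear_attention_parallel(Q, K, V, W, b),
                      linear_attention_recurrent(Q, K, V, W, b)))  # True
```

The recurrent form stores only an `r x dv` matrix and an `r`-vector per layer, independent of how many tokens have already been generated.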

Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning

Apr 29, 2020

Tao Shen, Yi Mao, Pengcheng He, Guodong Long, Adam Trischler, Weizhu Chen

Abstract: In this work, we aim to equip pre-trained language models with structured knowledge. We present two self-supervised tasks that learn over raw text with guidance from knowledge graphs. Building upon entity-level masked language models, our first contribution is an entity masking scheme that exploits the relational knowledge underlying the text. This is achieved by using a linked knowledge graph to select informative entities and then masking their mentions. In addition, we use knowledge graphs to obtain distractors for the masked entities, and propose a novel distractor-suppressed ranking objective that is optimized jointly with the masked language modeling objective. In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training, to inject structured knowledge into language models by learning from raw text. It is more efficient than retrieval-based methods that perform entity linking and integration during finetuning and inference, and it generalizes more effectively than methods that learn directly from concatenated graph triples. Experiments show that our proposed model achieves improved performance on five benchmark datasets, including question answering and knowledge base completion tasks.
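
The abstract does not give the exact form of the distractor-suppressed ranking objective. As a rough illustration only, the sketch below uses a standard hinge (margin) ranking loss: the representation at a masked entity position should score higher against the gold entity than against each knowledge-graph distractor by a margin. The cosine scoring, the margin of 0.5, and the function names are assumptions for this sketch.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def distractor_ranking_loss(context_vec, gold_entity_vec, distractor_vecs, margin=0.5):
    """Hypothetical hinge ranking loss: push the masked position's representation
    closer to the gold entity than to every distractor by at least `margin`."""
    pos = cosine(context_vec, gold_entity_vec)
    losses = [max(0.0, margin - pos + cosine(context_vec, d)) for d in distractor_vecs]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    ctx = np.array([1.0, 0.0])
    # Gold entity aligned with the context, distractor orthogonal: no loss.
    print(distractor_ranking_loss(ctx, np.array([1.0, 0.0]), [np.array([0.0, 1.0])]))  # 0.0
    # Distractor aligned with the context instead: positive loss.
    print(distractor_ranking_loss(ctx, np.array([0.0, 1.0]), [np.array([1.0, 0.0])]))  # 1.5
```

In training, a term like this would be added to the masked language modeling loss so that both are optimized jointly.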

Conditional Self-Attention for Query-based Summarization

Feb 18, 2020

Yujia Xie, Tianyi Zhou, Yi Mao, Weizhu Chen

Abstract: Self-attention mechanisms have achieved great success on a variety of NLP tasks due to their flexibility in capturing dependencies between arbitrary positions in a sequence. For problems such as query-based summarization (Qsumm) and knowledge graph reasoning, where each input sequence is associated with an extra query, explicitly modeling such conditional contextual dependencies can lead to more accurate solutions; however, these dependencies cannot be captured by existing self-attention mechanisms. In this paper, we propose conditional self-attention (CSA), a neural network module designed for conditional dependency modeling. CSA works by adjusting the pairwise attention between input tokens in a self-attention module with the matching score of the inputs to the given query. As a result, the contextual dependencies modeled by CSA are highly relevant to the query. We further study variants of CSA defined by different types of attention. Experiments on the Debatepedia and HotpotQA benchmark datasets show that CSA consistently outperforms the vanilla Transformer and previous models on the Qsumm problem.
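
A minimal NumPy sketch of the idea of adjusting pairwise attention with query-matching scores, under the assumption that each token's match score is `sigmoid(x_i . q)` and that tokens are scaled by these scores before a scaled dot-product attention. With this choice the logit for pair (i, j) is multiplied by `s_i * s_j`, so tokens unrelated to the query contribute weaker interactions. The matching function and weight shapes are illustrative, not the paper's exact formulation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def conditional_self_attention(X, q, Wq, Wk, Wv):
    """X: (n, d) token vectors; q: (d,) query vector.

    Returns the attended outputs (n, dv) and the attention matrix (n, n).
    """
    s = 1.0 / (1.0 + np.exp(-(X @ q)))   # assumed match score of each token to the query
    Xs = X * s[:, None]                  # scale tokens; logit(i, j) gains a factor s_i * s_j
    logits = (Xs @ Wq) @ (Xs @ Wk).T / np.sqrt(Wk.shape[1])
    attn = softmax(logits, axis=1)
    return attn @ (X @ Wv), attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, q = rng.normal(size=(4, 3)), rng.normal(size=3)
    Wq, Wk, Wv = rng.normal(size=(3, 2)), rng.normal(size=(3, 2)), rng.normal(size=(3, 5))
    out, attn = conditional_self_attention(X, q, Wq, Wk, Wv)
    print(out.shape, attn.sum(axis=1))
```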

X-SQL: reinforce schema representation with context

Aug 21, 2019

Pengcheng He, Yi Mao, Kaushik Chakrabarti, Weizhu Chen

Abstract: In this work, we present X-SQL, a new network architecture for the problem of parsing natural language into SQL queries. X-SQL enhances the structural schema representation with the contextual output of a BERT-style pretrained model and, together with type information, learns a new schema representation for downstream tasks. We evaluate X-SQL on the WikiSQL dataset and show that it achieves new state-of-the-art performance.

IncSQL: Training Incremental Text-to-SQL Parsers with Non-Deterministic Oracles

Oct 01, 2018

Tianze Shi, Kedar Tatwawadi, Kaushik Chakrabarti, Yi Mao, Oleksandr Polozov, Weizhu Chen

Abstract: We present a sequence-to-action parsing approach for the natural-language-to-SQL task that incrementally fills the slots of a SQL query with feasible actions from a pre-defined inventory. To account for the fact that there are typically multiple correct SQL queries with the same or very similar semantics, we draw inspiration from syntactic parsing techniques and propose to train our sequence-to-action models with non-deterministic oracles. We evaluate our models on the WikiSQL dataset and achieve an execution accuracy of 83.7% on the test set, a 2.1% absolute improvement over models trained with traditional static oracles that assume a single correct target SQL query. When further combined with the execution-guided decoding strategy, our model achieves a new state-of-the-art execution accuracy of 87.1%.
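
One way to see why a non-deterministic oracle helps: when WHERE conditions can be emitted in any order, every gold condition not yet emitted is a correct next action, and a static oracle would wrongly penalize all but one of them. The sketch below assumes a loss of `-log sum_{a in valid} p(a)` over the valid actions; this is one common formulation, and the paper's exact objective may differ (for example, training toward the highest-scoring valid action instead). Action indices and helper names are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array of action scores."""
    m = logits.max()
    return logits - m - np.log(np.exp(logits - m).sum())


def nondeterministic_oracle_loss(logits, valid_actions):
    """-log of the total probability mass on all currently valid actions,
    so the model is not penalized for preferring any one correct action."""
    lp = log_softmax(np.asarray(logits, dtype=float))
    valid = lp[list(valid_actions)]
    m = valid.max()
    return float(-(m + np.log(np.exp(valid - m).sum())))


def valid_where_actions(gold_conditions, emitted):
    """WHERE conditions are order-free: any gold condition not yet emitted is correct."""
    return set(gold_conditions) - set(emitted)


if __name__ == "__main__":
    logits = [0.0, 0.0, 0.0, 0.0]                # uniform scores over 4 candidate actions
    gold = [1, 2]                                 # hypothetical action ids of gold conditions
    print(nondeterministic_oracle_loss(logits, valid_where_actions(gold, [])))   # log 2
    print(nondeterministic_oracle_loss(logits, valid_where_actions(gold, [1])))  # log 4
```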

Robust Text-to-SQL Generation with Execution-Guided Decoding

Sep 13, 2018

Chenglong Wang, Kedar Tatwawadi, Marc Brockschmidt, Po-Sen Huang, Yi Mao, Oleksandr Polozov, Rishabh Singh

Abstract: We consider the problem of neural semantic parsing, which translates natural language questions into executable SQL queries. We introduce a new mechanism, execution guidance, to leverage the semantics of SQL. It detects and excludes faulty programs during the decoding procedure by conditioning on the execution of partially generated programs. The mechanism can be used with any autoregressive generative model, which we demonstrate on four state-of-the-art recurrent or template-based semantic parsing models. We demonstrate that execution guidance universally improves model performance on various text-to-SQL datasets with different scales and query complexity: WikiSQL, ATIS, and GeoQuery. As a result, we achieve a new state-of-the-art execution accuracy of 83.8% on WikiSQL.
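
A simplified sketch of execution guidance using Python's built-in `sqlite3`: candidates from a beam are tried in score order, and any that raise an execution error or return an empty result are excluded. Unlike the method described above, this sketch only filters completed queries rather than partially generated programs during decoding, and treating an empty result as faulty is an assumption. The table and queries are made up for illustration.

```python
import sqlite3
from typing import Iterable, Optional


def execution_guided_pick(conn: sqlite3.Connection, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate (in beam order) that executes without error
    and yields a non-empty result; None if every candidate is excluded."""
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue            # faulty program, e.g. unknown column: exclude it
        if rows:                # empty result treated as a likely-wrong query
            return sql
    return None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE city (name TEXT, pop INTEGER)")
    conn.executemany("INSERT INTO city VALUES (?, ?)", [("Oslo", 700), ("Bergen", 290)])
    beam = [
        "SELECT name FROM city WHERE population > 500",  # unknown column: error
        "SELECT name FROM city WHERE pop > 5000",        # runs, but empty result
        "SELECT name FROM city WHERE pop > 500",         # runs and returns a row
    ]
    print(execution_guided_pick(conn, beam))
```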

Action-dependent Control Variates for Policy Optimization via Stein's Identity

Feb 23, 2018

Hao Liu, Yihao Feng, Yi Mao, Dengyong Zhou, Jian Peng, Qiang Liu

Abstract: Policy gradient methods have achieved remarkable successes in solving challenging reinforcement learning problems. However, they still often suffer from high variance in policy gradient estimates, which leads to poor sample efficiency during training. In this work, we propose a control variate method to effectively reduce variance for policy gradient methods. Motivated by Stein's identity, our method extends the previous control variate methods used in REINFORCE and advantage actor-critic by introducing more general action-dependent baseline functions. Empirical studies show that our method significantly improves the sample efficiency of state-of-the-art policy gradient approaches.

* The first two authors contributed equally. Author ordering was determined by a coin flip over a Google Hangout. Accepted at ICLR 2018.
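
A one-step sketch of how Stein's identity yields an unbiased action-dependent control variate. For a Gaussian "policy" a ~ N(mu, sigma^2), the score is (a - mu) / sigma^2 and Stein's identity gives E[score * phi(a)] = E[phi'(a)] for smooth phi, so subtracting phi(a) inside the score-function estimator and adding phi'(a) back leaves the expectation unchanged. Here the reward is a toy f(a) = a^2 (true gradient 2 * mu), and phi is a hand-picked first-order Taylor expansion of f; in the paper phi is a learned function, so this is an illustration of the identity rather than the method itself.

```python
import numpy as np


def gradient_estimates(mu, sigma, n, seed=0):
    """Per-sample estimates of d/dmu E_{a~N(mu, sigma^2)}[a**2] (true value: 2 * mu)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(mu, sigma, size=n)
    f = a ** 2                                   # toy reward
    score = (a - mu) / sigma ** 2                # d/dmu log N(a; mu, sigma^2)

    # Plain score-function (REINFORCE) estimate with no baseline.
    vanilla = f * score

    # Action-dependent control variate phi(a): first-order Taylor expansion of f at mu.
    phi = mu ** 2 + 2 * mu * (a - mu)
    dphi_da = np.full_like(a, 2 * mu)
    # Stein's identity: E[score * phi] = E[phi'], so this estimator stays unbiased.
    stein = (f - phi) * score + dphi_da
    return vanilla, stein


if __name__ == "__main__":
    vanilla, stein = gradient_estimates(mu=1.0, sigma=1.0, n=200_000)
    print("means:", vanilla.mean(), stein.mean())      # both close to 2.0
    print("variances:", vanilla.var(), stein.var())    # roughly 30 vs 15
```

For mu = sigma = 1 the per-sample variance drops from about 30 to about 15 while both estimators stay centered on the true gradient.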

Statistical Translation, Heat Kernels and Expected Distances

Jun 20, 2012

Joshua Dillon, Yi Mao, Guy Lebanon, Jian Zhang

Abstract: High-dimensional structured data, such as text and images, are often poorly understood and misrepresented in statistical modeling. The standard histogram representation suffers from high variance and performs poorly in general. We explore novel connections between statistical translation, heat kernels on manifolds and graphs, and expected distances. These connections provide a new framework for unsupervised metric learning for text documents. Experiments indicate that the resulting distances are generally superior to their more standard counterparts.

* Appears in Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007)
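
A small illustration of how a graph heat kernel can act as a translation model that smooths term histograms before comparing documents. For a word-similarity graph with Laplacian L = D - W, each row of exp(-tL) is a probability distribution p(w' | w), so documents using related words move closer after smoothing. The toy three-word graph, the diffusion time t, and the L1 distance are assumptions for this sketch; the paper's expected-distance construction is more general.

```python
import numpy as np


def heat_kernel(W, t):
    """exp(-t L) for the Laplacian L = D - W of a symmetric word-similarity graph.

    Since L @ 1 = 0 and -tL has non-negative off-diagonal entries, every row of the
    result is a probability distribution over words: a 'translation' p(w' | w).
    """
    L = np.diag(W.sum(axis=1)) - W
    evals, evecs = np.linalg.eigh(L)
    return evecs @ np.diag(np.exp(-t * evals)) @ evecs.T


def smoothed_distance(h1, h2, K):
    """L1 distance between normalized term histograms after heat-kernel smoothing."""
    return float(np.abs(h1 @ K - h2 @ K).sum())


if __name__ == "__main__":
    # Toy vocabulary: 0 = "car", 1 = "automobile", 2 = "banana"; only 0 and 1 are linked.
    W = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
    K = heat_kernel(W, t=1.0)
    doc_car, doc_auto, doc_banana = np.eye(3)
    print(smoothed_distance(doc_car, doc_auto, K))    # 2 * exp(-2), about 0.27
    print(smoothed_distance(doc_car, doc_banana, K))  # 2.0
```

Without smoothing both pairs are at L1 distance 2; the heat kernel pulls the synonymous pair together while leaving the unrelated pair unchanged.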

Domain Knowledge Uncertainty and Probabilistic Parameter Constraints

May 09, 2012

Yi Mao, Guy Lebanon

Abstract: Incorporating domain knowledge into the modeling process is an effective way to improve learning accuracy. However, since it is provided by humans, domain knowledge can only be specified with some degree of uncertainty. We propose to explicitly model such uncertainty through probabilistic constraints over the parameter space. In contrast to hard parameter constraints, our approach remains effective even when the domain knowledge is inaccurate and generally results in superior modeling accuracy. We focus on generative and conditional modeling where the parameters are assigned a Dirichlet or Gaussian prior and demonstrate the framework with experiments on both synthetic and real-world data.

* Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)
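
A minimal sketch of a probabilistic (soft) parameter constraint for a single multinomial: the expert's belief is encoded as the mean of a Dirichlet prior, and the expert's certainty as its concentration. The MAP estimate then interpolates between the data (weak prior) and the expert's belief (strong prior), whereas a hard constraint would fix the parameters regardless of the data. The function name and the requirement that every alpha_k be at least 1 are assumptions of this sketch; the Gaussian-prior conditional case is not shown.

```python
import numpy as np


def map_multinomial(counts, prior_mean, confidence):
    """MAP estimate of multinomial parameters under a Dirichlet prior.

    prior_mean: expert's belief about the parameters (sums to 1).
    confidence: Dirichlet concentration, i.e. how certain that belief is.
    """
    counts = np.asarray(counts, dtype=float)
    alpha = confidence * np.asarray(prior_mean, dtype=float)
    if np.any(alpha < 1):
        raise ValueError("each alpha_k must be >= 1 for this closed-form mode")
    return (counts + alpha - 1) / (counts.sum() + alpha.sum() - len(counts))


if __name__ == "__main__":
    counts, belief = [8, 2], [0.5, 0.5]
    print(map_multinomial(counts, belief, confidence=2))     # weak prior: [0.8 0.2], the MLE
    print(map_multinomial(counts, belief, confidence=1000))  # strong prior: close to [0.5 0.5]
```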
