Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jianfeng Gao

Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback

Mar 08, 2023

Baolin Peng, Michel Galley, Pengcheng He, Hao Cheng, Yujia Xie, Yu Hu, Qiuyuan Huang, Lars Liden, Zhou Yu, Weizhu Chen(+1 more)

Figure 1 for Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback

Figure 2 for Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback

Figure 3 for Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback

Figure 4 for Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback

Abstract:Large language models (LLMs), such as ChatGPT, are able to generate human-like, fluent responses for many downstream tasks, e.g., task-oriented dialog and question answering. However, applying LLMs to real-world, mission-critical applications remains challenging mainly due to their tendency to generate hallucinations and their inability to use external knowledge. This paper proposes a LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules. Our system makes the LLM generate responses grounded in external knowledge, e.g., stored in task-specific databases. It also iteratively revises LLM prompts to improve model responses using feedback generated by utility functions, e.g., the factuality score of a LLM-generated response. The effectiveness of LLM-Augmenter is empirically validated on two types of scenarios, task-oriented dialog and open-domain question answering. LLM-Augmenter significantly reduces ChatGPT's hallucinations without sacrificing the fluency and informativeness of its responses. We make the source code and models publicly available.

* 15 pages

Via

Access Paper or Ask Questions

Guiding Large Language Models via Directional Stimulus Prompting

Feb 22, 2023

Zekun Li, Baolin Peng, Pengcheng He, Michel Galley, Jianfeng Gao, Xifeng Yan

Abstract:We introduce a new framework, Directional Stimulus Prompting, that uses a tuneable language model (LM) to provide guidance for the black-box frozen large language model (LLM) on downstream tasks. Unlike prior work that manually or automatically finds the optimal prompt for each task, we train a policy LM to generate discrete tokens as ``directional stimulus'' of each input, which is a hint/cue such as keywords of an article for summarization. The directional stimulus is then combined with the original input and fed into the LLM to guide its generation toward the desired target. The policy LM can be trained through 1) supervised learning from annotated data and 2) reinforcement learning from offline and online rewards to explore directional stimulus that better aligns LLMs with human preferences. This framework is flexibly applicable to various LMs and tasks. To verify its effectiveness, we apply our framework to summarization and dialogue response generation tasks. Experimental results demonstrate that it can significantly improve LLMs' performance with a small collection of training data: a T5 (780M) trained with 2,000 samples from the CNN/Daily Mail dataset improves Codex (175B)'s performance by 7.2% in ROUGE-Avg scores; 500 dialogues boost the combined score by 52.5%, achieving comparable or even better performance than fully trained models on the MultiWOZ dataset.

Via

Access Paper or Ask Questions

Learning Customized Visual Models with Retrieval-Augmented Knowledge

Jan 17, 2023

Haotian Liu, Kilho Son, Jianwei Yang, Ce Liu, Jianfeng Gao, Yong Jae Lee, Chunyuan Li

Figure 1 for Learning Customized Visual Models with Retrieval-Augmented Knowledge

Figure 2 for Learning Customized Visual Models with Retrieval-Augmented Knowledge

Figure 3 for Learning Customized Visual Models with Retrieval-Augmented Knowledge

Figure 4 for Learning Customized Visual Models with Retrieval-Augmented Knowledge

Abstract:Image-text contrastive learning models such as CLIP have demonstrated strong task transfer ability. The high generality and usability of these visual models is achieved via a web-scale data collection process to ensure broad concept coverage, followed by expensive pre-training to feed all the knowledge into model weights. Alternatively, we propose REACT, REtrieval-Augmented CusTomization, a framework to acquire the relevant web knowledge to build customized visual models for target domains. We retrieve the most relevant image-text pairs (~3% of CLIP pre-training data) from the web-scale database as external knowledge, and propose to customize the model by only training new modualized blocks while freezing all the original weights. The effectiveness of REACT is demonstrated via extensive experiments on classification, retrieval, detection and segmentation tasks, including zero, few, and full-shot settings. Particularly, on the zero-shot classification task, compared with CLIP, it achieves up to 5.4% improvement on ImageNet and 3.7% on the ELEVATER benchmark (20 datasets).

Via

Access Paper or Ask Questions

GLIGEN: Open-Set Grounded Text-to-Image Generation

Jan 17, 2023

Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, Yong Jae Lee

Figure 1 for GLIGEN: Open-Set Grounded Text-to-Image Generation

Figure 2 for GLIGEN: Open-Set Grounded Text-to-Image Generation

Figure 3 for GLIGEN: Open-Set Grounded Text-to-Image Generation

Figure 4 for GLIGEN: Open-Set Grounded Text-to-Image Generation

Abstract:Large-scale text-to-image diffusion models have made amazing advances. However, the status quo is to use text input alone, which can impede controllability. In this work, we propose GLIGEN, Grounded-Language-to-Image Generation, a novel approach that builds upon and extends the functionality of existing pre-trained text-to-image diffusion models by enabling them to also be conditioned on grounding inputs. To preserve the vast concept knowledge of the pre-trained model, we freeze all of its weights and inject the grounding information into new trainable layers via a gated mechanism. Our model achieves open-world grounded text2img generation with caption and bounding box condition inputs, and the grounding ability generalizes well to novel spatial configuration and concepts. GLIGEN's zero-shot performance on COCO and LVIS outperforms that of existing supervised layout-to-image baselines by a large margin.

Via

Access Paper or Ask Questions

Generalized Decoding for Pixel, Image, and Language

Dec 21, 2022

Xueyan Zou, Zi-Yi Dou, Jianwei Yang, Zhe Gan, Linjie Li, Chunyuan Li, Xiyang Dai, Harkirat Behl, Jianfeng Wang, Lu Yuan(+4 more)

Figure 1 for Generalized Decoding for Pixel, Image, and Language

Figure 2 for Generalized Decoding for Pixel, Image, and Language

Figure 3 for Generalized Decoding for Pixel, Image, and Language

Figure 4 for Generalized Decoding for Pixel, Image, and Language

Abstract:We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly. X-Decodert takes as input two types of queries: (i) generic non-semantic queries and (ii) semantic queries induced from text inputs, to decode different pixel-level and token-level outputs in the same semantic space. With such a novel design, X-Decoder is the first work that provides a unified way to support all types of image segmentation and a variety of vision-language (VL) tasks. Further, our design enables seamless interactions across tasks at different granularities and brings mutual benefits by learning a common and rich pixel-level visual-semantic understanding space, without any pseudo-labeling. After pretraining on a mixed set of a limited amount of segmentation data and millions of image-text pairs, X-Decoder exhibits strong transferability to a wide range of downstream tasks in both zero-shot and finetuning settings. Notably, it achieves (1) state-of-the-art results on open-vocabulary segmentation and referring segmentation on eight datasets; (2) better or competitive finetuned performance to other generalist and specialist models on segmentation and VL tasks; and (3) flexibility for efficient finetuning and novel task composition (e.g., referring captioning and image editing). Code, demo, video, and visualization are available at https://x-decoder-vl.github.io.

* https://x-decoder-vl.github.io

Via

Access Paper or Ask Questions

Language Models as Inductive Reasoners

Dec 21, 2022

Zonglin Yang, Li Dong, Xinya Du, Hao Cheng, Erik Cambria, Xiaodong Liu, Jianfeng Gao, Furu Wei

Abstract:Inductive reasoning is a core component of human intelligence. In the past research of inductive reasoning within computer science, logic language is used as representations of knowledge (facts and rules, more specifically). However, logic language can cause systematic problems for inductive reasoning such as disability of handling raw input such as natural language, sensitiveness to mislabeled data, and incapacity to handle ambiguous input. To this end, we propose a new task, which is to induce natural language rules from natural language facts, and create a dataset termed DEER containing 1.2k rule-fact pairs for the task, where rules and facts are written in natural language. New automatic metrics are also proposed and analysed for the evaluation of this task. With DEER, we investigate a modern approach for inductive reasoning where we use natural language as representation for knowledge instead of logic language and use pretrained language models as ''reasoners''. Moreover, we provide the first and comprehensive analysis of how well pretrained language models can induce natural language rules from natural language facts. We also propose a new framework drawing insights from philosophy literature for this task, which we show in the experiment section that surpasses baselines in both automatic and human evaluations.

Via

Access Paper or Ask Questions

DIONYSUS: A Pre-trained Model for Low-Resource Dialogue Summarization

Dec 20, 2022

Yu Li, Baolin Peng, Pengcheng He, Michel Galley, Zhou Yu, Jianfeng Gao

Figure 1 for DIONYSUS: A Pre-trained Model for Low-Resource Dialogue Summarization

Figure 2 for DIONYSUS: A Pre-trained Model for Low-Resource Dialogue Summarization

Figure 3 for DIONYSUS: A Pre-trained Model for Low-Resource Dialogue Summarization

Figure 4 for DIONYSUS: A Pre-trained Model for Low-Resource Dialogue Summarization

Abstract:Dialogue summarization has recently garnered significant attention due to its wide range of applications. However, existing methods for summarizing dialogues are suboptimal because they do not take into account the inherent structure of dialogue and rely heavily on labeled data, which can lead to poor performance in new domains. In this work, we propose DIONYSUS (dynamic input optimization in pre-training for dialogue summarization), a pre-trained encoder-decoder model for summarizing dialogues in any new domain. To pre-train DIONYSUS, we create two pseudo summaries for each dialogue example: one is produced by a fine-tuned summarization model, and the other is a collection of dialogue turns that convey important information. We then choose one of these pseudo summaries based on the difference in information distribution across different types of dialogues. This selected pseudo summary serves as the objective for pre-training DIONYSUS using a self-supervised approach on a large dialogue corpus. Our experiments show that DIONYSUS outperforms existing methods on six datasets, as demonstrated by its ROUGE scores in zero-shot and few-shot settings.

Via

Access Paper or Ask Questions

Enhancing Task Bot Engagement with Synthesized Open-Domain Dialog

Dec 20, 2022

Miaoran Li, Baolin Peng, Michel Galley, Jianfeng Gao, Zhu Zhang

Figure 1 for Enhancing Task Bot Engagement with Synthesized Open-Domain Dialog

Figure 2 for Enhancing Task Bot Engagement with Synthesized Open-Domain Dialog

Figure 3 for Enhancing Task Bot Engagement with Synthesized Open-Domain Dialog

Figure 4 for Enhancing Task Bot Engagement with Synthesized Open-Domain Dialog

Abstract:Many efforts have been made to construct dialog systems for different types of conversations, such as task-oriented dialog (TOD) and open-domain dialog (ODD). To better mimic human-level conversations that usually fuse various dialog modes, it is essential to build a system that can effectively handle both TOD and ODD and access different knowledge sources. To address the lack of available data for the fused task, we propose a framework for automatically generating dialogues that combine knowledge-grounded ODDs and TODs in various settings. Additionally, we introduce a unified model PivotBot that is capable of appropriately adopting TOD and ODD modes and accessing different knowledge sources in order to effectively tackle the fused task. Evaluation results demonstrate the superior ability of the proposed model to switch seamlessly between TOD and ODD tasks.

Via

Access Paper or Ask Questions

Efficient Long Sequence Modeling via State Space Augmented Transformer

Dec 15, 2022

Simiao Zuo, Xiaodong Liu, Jian Jiao, Denis Charles, Eren Manavoglu, Tuo Zhao, Jianfeng Gao

Figure 1 for Efficient Long Sequence Modeling via State Space Augmented Transformer

Figure 2 for Efficient Long Sequence Modeling via State Space Augmented Transformer

Figure 3 for Efficient Long Sequence Modeling via State Space Augmented Transformer

Figure 4 for Efficient Long Sequence Modeling via State Space Augmented Transformer

Abstract:Transformer models have achieved superior performance in various natural language processing tasks. However, the quadratic computational cost of the attention mechanism limits its practicality for long sequences. There are existing attention variants that improve the computational efficiency, but they have limited ability to effectively compute global information. In parallel to Transformer models, state space models (SSMs) are tailored for long sequences, but they are not flexible enough to capture complicated local information. We propose SPADE, short for $\underline{\textbf{S}}$tate s$\underline{\textbf{P}}$ace $\underline{\textbf{A}}$ugmente$\underline{\textbf{D}}$ Transform$\underline{\textbf{E}}$r. Specifically, we augment a SSM into the bottom layer of SPADE, and we employ efficient local attention methods for the other layers. The SSM augments global information, which complements the lack of long-range dependency issue in local attention methods. Experimental results on the Long Range Arena benchmark and language modeling tasks demonstrate the effectiveness of the proposed method. To further demonstrate the scalability of SPADE, we pre-train large encoder-decoder models and present fine-tuning results on natural language understanding and natural language generation tasks.

Via

Access Paper or Ask Questions

Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation

Dec 04, 2022

Faeze Brahman, Baolin Peng, Michel Galley, Sudha Rao, Bill Dolan, Snigdha Chaturvedi, Jianfeng Gao

Figure 1 for Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation

Figure 2 for Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation

Figure 3 for Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation

Figure 4 for Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation

Abstract:Large pre-trained language models have recently enabled open-ended generation frameworks (e.g., prompt-to-text NLG) to tackle a variety of tasks going beyond the traditional data-to-text generation. While this framework is more general, it is under-specified and often leads to a lack of controllability restricting their real-world usage. We propose a new grounded keys-to-text generation task: the task is to generate a factual description about an entity given a set of guiding keys, and grounding passages. To address this task, we introduce a new dataset, called EntDeGen. Inspired by recent QA-based evaluation measures, we propose an automatic metric, MAFE, for factual correctness of generated descriptions. Our EntDescriptor model is equipped with strong rankers to fetch helpful passages and generate entity descriptions. Experimental result shows a good correlation (60.14) between our proposed metric and human judgments of factuality. Our rankers significantly improved the factual correctness of generated descriptions (15.95% and 34.51% relative gains in recall and precision). Finally, our ablation study highlights the benefit of combining keys and groundings.

* EMNLP 2022 Findings camera-ready

Via

Access Paper or Ask Questions