Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shafiq Joty

Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles

Sep 17, 2023

Kung-Hsiang Huang, Philippe Laban, Alexander R. Fabbri, Prafulla Kumar Choubey, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu

Figure 1 for Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles

Figure 2 for Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles

Figure 3 for Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles

Figure 4 for Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles

Abstract:Previous research in multi-document news summarization has typically concentrated on collating information that all sources agree upon. However, to our knowledge, the summarization of diverse information dispersed across multiple articles about an event has not been previously investigated. The latter imposes a different set of challenges for a summarization model. In this paper, we propose a new task of summarizing diverse information encountered in multiple news articles encompassing the same event. To facilitate this task, we outlined a data collection schema for identifying diverse information and curated a dataset named DiverseSumm. The dataset includes 245 news stories, with each story comprising 10 news articles and paired with a human-validated reference. Moreover, we conducted a comprehensive analysis to pinpoint the position and verbosity biases when utilizing Large Language Model (LLM)-based metrics for evaluating the coverage and faithfulness of the summaries, as well as their correlation with human assessments. We applied our findings to study how LLMs summarize multiple news articles by analyzing which type of diverse information LLMs are capable of identifying. Our analyses suggest that despite the extraordinary capabilities of LLMs in single-document summarization, the proposed task remains a complex challenge for them mainly due to their limited coverage, with GPT-4 only able to cover less than 40% of the diverse information on average.

Via

Access Paper or Ask Questions

RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair

Sep 12, 2023

Weishi Wang, Yue Wang, Shafiq Joty, Steven C. H. Hoi

Figure 1 for RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair

Figure 2 for RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair

Figure 3 for RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair

Figure 4 for RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair

Abstract:Automatic program repair (APR) is crucial to reduce manual debugging efforts for developers and improve software reliability. While conventional search-based techniques typically rely on heuristic rules or a redundancy assumption to mine fix patterns, recent years have witnessed the surge of deep learning (DL) based approaches to automate the program repair process in a data-driven manner. However, their performance is often limited by a fixed set of parameters to model the highly complex search space of APR. To ease such burden on the parametric models, in this work, we propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen) by explicitly leveraging relevant fix patterns retrieved from a codebase of previous bug-fix pairs. Specifically, we build a hybrid patch retriever to account for both lexical and semantic matching based on the raw source code in a language-agnostic manner, which does not rely on any code-specific features. In addition, we adapt a code-aware language model CodeT5 as our foundation model to facilitate both patch retrieval and generation tasks in a unified manner. We adopt a stage-wise approach where the patch retriever first retrieves a relevant external bug-fix pair to augment the buggy input for the CodeT5 patch generator, which synthesizes a ranked list of repair patch candidates. Notably, RAP-Gen is a generic APR framework that can flexibly integrate different patch retrievers and generators to repair various types of bugs. We thoroughly evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java, where the bug localization information may or may not be provided. Experimental results show that RAP-Gen significantly outperforms previous state-of-the-art approaches on all benchmarks, e.g., repairing 15 more bugs on 818 Defects4J bugs.

* FSE 2023, Long paper

Via

Access Paper or Ask Questions

XGen-7B Technical Report

Sep 07, 2023

Erik Nijkamp, Tian Xie, Hiroaki Hayashi, Bo Pang, Congying Xia, Chen Xing, Jesse Vig, Semih Yavuz, Philippe Laban, Ben Krause(+15 more)

Abstract:Large Language Models (LLMs) have become ubiquitous across various domains, transforming the way we interact with information and conduct research. However, most high-performing LLMs remain confined behind proprietary walls, hindering scientific progress. Most open-source LLMs, on the other hand, are limited in their ability to support longer sequence lengths, which is a key requirement for many tasks that require inference over an input context. To address this, we have trained XGen, a series of 7B parameter models on up to 8K sequence length for up to 1.5T tokens. We have also finetuned the XGen models on public-domain instructional data, creating their instruction-tuned counterparts (XGen-Inst). We open-source our models for both research advancements and commercial applications. Our evaluation on standard benchmarks shows that XGen models achieve comparable or better results when compared with state-of-the-art open-source LLMs. Our targeted evaluation on long sequence modeling tasks shows the benefits of our 8K-sequence models over 2K-sequence open-source LLMs.

Via

Access Paper or Ask Questions

Exploring the Integration Strategies of Retriever and Large Language Models

Aug 24, 2023

Ye Liu, Semih Yavuz, Rui Meng, Meghana Moorthy, Shafiq Joty, Caiming Xiong, Yingbo Zhou

Figure 1 for Exploring the Integration Strategies of Retriever and Large Language Models

Figure 2 for Exploring the Integration Strategies of Retriever and Large Language Models

Figure 3 for Exploring the Integration Strategies of Retriever and Large Language Models

Figure 4 for Exploring the Integration Strategies of Retriever and Large Language Models

Abstract:The integration of retrieved passages and large language models (LLMs), such as ChatGPTs, has significantly contributed to improving open-domain question answering. However, there is still a lack of exploration regarding the optimal approach for incorporating retrieved passages into the answer generation process. This paper aims to fill this gap by investigating different methods of combining retrieved passages with LLMs to enhance answer generation. We begin by examining the limitations of a commonly-used concatenation approach. Surprisingly, this approach often results in generating "unknown" outputs, even when the correct document is among the top-k retrieved passages. To address this issue, we explore four alternative strategies for integrating the retrieved passages with the LLMs. These strategies include two single-round methods that utilize chain-of-thought reasoning and two multi-round strategies that incorporate feedback loops. Through comprehensive analyses and experiments, we provide insightful observations on how to effectively leverage retrieved passages to enhance the answer generation capability of LLMs.

Via

Access Paper or Ask Questions

PromptSum: Parameter-Efficient Controllable Abstractive Summarization

Aug 06, 2023

Mathieu Ravaut, Hailin Chen, Ruochen Zhao, Chengwei Qin, Shafiq Joty, Nancy Chen

Figure 1 for PromptSum: Parameter-Efficient Controllable Abstractive Summarization

Figure 2 for PromptSum: Parameter-Efficient Controllable Abstractive Summarization

Figure 3 for PromptSum: Parameter-Efficient Controllable Abstractive Summarization

Figure 4 for PromptSum: Parameter-Efficient Controllable Abstractive Summarization

Abstract:Prompt tuning (PT), a parameter-efficient technique that only tunes the additional prompt embeddings while keeping the backbone pre-trained language model (PLM) frozen, has shown promising results in language understanding tasks, especially in low-resource scenarios. However, effective prompt design methods suitable for generation tasks such as summarization are still lacking. At the same time, summarization guided through instructions (discrete prompts) can achieve a desirable double objective of high quality and controllability in summary generation. Towards a goal of strong summarization performance under the triple conditions of parameter-efficiency, data-efficiency, and controllability, we introduce PromptSum, a method combining PT with a multi-task objective and discrete entity prompts for abstractive summarization. Our model achieves competitive ROUGE results on popular abstractive summarization benchmarks coupled with a strong level of controllability through entities, all while only tuning several orders of magnitude less parameters.

Via

Access Paper or Ask Questions

Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts

Jun 20, 2023

Xuan-Phi Nguyen, Sharifah Mahani Aljunied, Shafiq Joty, Lidong Bing

Figure 1 for Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts

Figure 2 for Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts

Figure 3 for Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts

Figure 4 for Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts

Abstract:Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars. However, in low-resource languages, obtaining such hand-picked exemplars can still be challenging, where unsupervised techniques may be necessary. Moreover, competent generative capabilities of LLMs are observed only in high-resource languages, while their performances among under-represented languages fall behind due to pre-training data imbalance. To elicit LLMs' ability onto low-resource languages without any supervised data, we propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English. These prompts are then used to create intra-lingual exemplars to perform tasks in the target languages. Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages. We also show that fine-tuning a 7B model on data generated from our method helps it perform competitively with a 175B model. In non-English translation tasks, our method even outperforms supervised prompting by up to 3 chrF++ in many low-resource languages. When evaluated on zero-shot multilingual summarization, our method surpasses other English-pivoting baselines by up to 4 ROUGE-L and is also favored by GPT-4.

* Pre-print

Via

Access Paper or Ask Questions

A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets

Jun 08, 2023

Md Tahmid Rahman Laskar, M Saiful Bari, Mizanur Rahman, Md Amran Hossen Bhuiyan, Shafiq Joty, Jimmy Xiangji Huang

Figure 1 for A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets

Figure 2 for A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets

Figure 3 for A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets

Figure 4 for A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets

Abstract:The development of large language models (LLMs) such as ChatGPT has brought a lot of attention recently. However, their evaluation in the benchmark academic datasets remains under-explored due to the difficulty of evaluating the generative outputs produced by this model against the ground truth. In this paper, we aim to present a thorough evaluation of ChatGPT's performance on diverse academic datasets, covering tasks like question-answering, text summarization, code generation, commonsense reasoning, mathematical problem-solving, machine translation, bias detection, and ethical considerations. Specifically, we evaluate ChatGPT across 140 tasks and analyze 255K responses it generates in these datasets. This makes our work the largest evaluation of ChatGPT in NLP benchmarks. In short, our study aims to validate the strengths and weaknesses of ChatGPT in various tasks and provide insights for future research using LLMs. We also report a new emergent ability to follow multi-query instructions that we mostly found in ChatGPT and other instruction-tuned models. Our extensive evaluation shows that even though ChatGPT is capable of performing a wide variety of tasks, and may obtain impressive performance in several benchmark datasets, it is still far from achieving the ability to reliably solve many challenging tasks. By providing a thorough assessment of ChatGPT's performance across diverse NLP tasks, this paper sets the stage for a targeted deployment of ChatGPT-like LLMs in real-world applications.

* Accepted by ACL 2023 Findings. The first three authors contributed equally

Via

Access Paper or Ask Questions

Did You Read the Instructions? Rethinking the Effectiveness of Task Definitions in Instruction Learning

Jun 01, 2023

Fan Yin, Jesse Vig, Philippe Laban, Shafiq Joty, Caiming Xiong, Chien-Sheng Jason Wu

Abstract:Large language models (LLMs) have shown impressive performance in following natural language instructions to solve unseen tasks. However, it remains unclear whether models truly understand task definitions and whether the human-written definitions are optimal. In this paper, we systematically study the role of task definitions in instruction learning. We first conduct an ablation analysis informed by human annotations to understand which parts of a task definition are most important, and find that model performance only drops substantially when removing contents describing the task output, in particular label information. Next, we propose an automatic algorithm to compress task definitions to a minimal supporting set of tokens, and find that 60\% of tokens can be removed while maintaining or even improving model performance. Based on these results, we propose two strategies to help models better leverage task instructions: (1) providing only key information for tasks in a common structured format, and (2) adding a meta-tuning stage to help the model better understand the definitions. With these two strategies, we achieve a 4.2 Rouge-L improvement over 119 unseen test tasks.

* ACL 2023, camera-ready; 10 pages

Via

Access Paper or Ask Questions

Building Extractive Question Answering System to Support Human-AI Health Coaching Model for Sleep Domain

May 31, 2023

Iva Bojic, Qi Chwen Ong, Shafiq Joty, Josip Car

Abstract:Non-communicable diseases (NCDs) are a leading cause of global deaths, necessitating a focus on primary prevention and lifestyle behavior change. Health coaching, coupled with Question Answering (QA) systems, has the potential to transform preventive healthcare. This paper presents a human-Artificial Intelligence (AI) health coaching model incorporating a domain-specific extractive QA system. A sleep-focused dataset, SleepQA, was manually assembled and used to fine-tune domain-specific BERT models. The QA system was evaluated using automatic and human methods. A data-centric framework enhanced the system's performance by improving passage retrieval and question reformulation. Although the system did not outperform the baseline in automatic evaluation, it excelled in the human evaluation of real-world questions. Integration into a Human-AI health coaching model was tested in a pilot Randomized Controlled Trial (RCT).

* 2 pages, 1 figure

Via

Access Paper or Ask Questions

SWiPE: A Dataset for Document-Level Simplification of Wikipedia Pages

May 30, 2023

Philippe Laban, Jesse Vig, Wojciech Kryscinski, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu

Abstract:Text simplification research has mostly focused on sentence-level simplification, even though many desirable edits - such as adding relevant background information or reordering content - may require document-level context. Prior work has also predominantly framed simplification as a single-step, input-to-output task, only implicitly modeling the fine-grained, span-level edits that elucidate the simplification process. To address both gaps, we introduce the SWiPE dataset, which reconstructs the document-level editing process from English Wikipedia (EW) articles to paired Simple Wikipedia (SEW) articles. In contrast to prior work, SWiPE leverages the entire revision history when pairing pages in order to better identify simplification edits. We work with Wikipedia editors to annotate 5,000 EW-SEW document pairs, labeling more than 40,000 edits with proposed 19 categories. To scale our efforts, we propose several models to automatically label edits, achieving an F-1 score of up to 70.6, indicating that this is a tractable but challenging NLU task. Finally, we categorize the edits produced by several simplification models and find that SWiPE-trained models generate more complex edits while reducing unwanted edits.

* ACL 2023, Long Paper

Via

Access Paper or Ask Questions