Xiaodan Zhu

ChatGPT as Data Augmentation for Compositional Generalization: A Case Study in Open Intent Detection

Aug 25, 2023

Yihao Fang, Xianzhi Li, Stephen W. Thomas, Xiaodan Zhu

Abstract: Open intent detection, a crucial aspect of natural language understanding, involves the identification of previously unseen intents in user-generated text. Despite the progress made in this field, challenges persist in handling new combinations of language components, which is essential for compositional generalization. In this paper, we present a case study exploring the use of ChatGPT as a data augmentation technique to enhance compositional generalization in open intent detection tasks. We begin by discussing the limitations of existing benchmarks in evaluating this problem, highlighting the need for constructing datasets for addressing compositional generalization in open intent detection tasks. By incorporating synthetic data generated by ChatGPT into the training process, we demonstrate that our approach can effectively improve model performance. Rigorous evaluation of multiple benchmarks reveals that our method outperforms existing techniques and significantly enhances open intent detection capabilities. Our findings underscore the potential of large language models like ChatGPT for data augmentation in natural language understanding tasks.

* Proceedings of the Joint Workshop of the 5th Financial Technology and Natural Language Processing (FinNLP) and 2nd Multimodal AI For Financial Forecasting (Muffin), Macao, August 20, 2023

NatLogAttack: A Framework for Attacking Natural Language Inference Models with Natural Logic

Jul 06, 2023

Zi'ou Zheng, Xiaodan Zhu

Abstract: Reasoning has been a central topic in artificial intelligence from the beginning. The recent progress made on distributed representation and neural networks continues to improve the state-of-the-art performance of natural language inference. However, it remains an open question whether the models perform real reasoning to reach their conclusions or rely on spurious correlations. Adversarial attacks have proven to be an important tool to help evaluate the Achilles' heel of the victim models. In this study, we explore the fundamental problem of developing attack models based on logic formalism. We propose NatLogAttack to perform systematic attacks centring around natural logic, a classical logic formalism that is traceable back to Aristotle's syllogism and has been closely developed for natural language inference. The proposed framework renders both label-preserving and label-flipping attacks. We show that compared to the existing attack models, NatLogAttack generates better adversarial examples with fewer visits to the victim models. The victim models are found to be more vulnerable under the label-flipping setting. NatLogAttack provides a tool to probe the existing and future NLI models' capacity from a key viewpoint and we hope more logic-based attacks will be further explored for understanding the desired property of reasoning.

* Published as a conference paper at ACL 2023

A Simple and Effective Framework for Strict Zero-Shot Hierarchical Classification

May 26, 2023

Rohan Bhambhoria, Lei Chen, Xiaodan Zhu

Abstract: In recent years, large language models (LLMs) have achieved strong performance on benchmark tasks, especially in zero- or few-shot settings. However, these benchmarks often do not adequately address the challenges posed in the real world, such as that of hierarchical classification. In order to address this challenge, we propose refactoring conventional tasks on hierarchical datasets into a more indicative long-tail prediction task. We observe LLMs are more prone to failure in these cases. To address these limitations, we propose the use of entailment-contradiction prediction in conjunction with LLMs, which allows for strong performance in a strict zero-shot setting. Importantly, our method does not require any parameter updates, a resource-intensive process, and achieves strong performance across multiple datasets.

* Accepted at ACL 2023

Prototype-Based Interpretability for Legal Citation Prediction

May 25, 2023

Chu Fei Luo, Rohan Bhambhoria, Samuel Dahan, Xiaodan Zhu

Abstract: Deep learning has made significant progress in the past decade, and demonstrates potential to solve problems with extensive social impact. In high-stakes decision making areas such as law, experts often require interpretability for automatic systems to be utilized in practical settings. In this work, we attempt to address these requirements applied to the important problem of legal citation prediction (LCP). We design the task with parallels to the thought process of lawyers, i.e., with reference to both precedents and legislative provisions. After initial experimental results, we refine the target citation predictions with the feedback of legal experts. Additionally, we introduce a prototype architecture to add interpretability, achieving strong performance while adhering to decision parameters used by lawyers. Our study builds on and leverages the state-of-the-art language processing models for law, while addressing vital considerations for high-stakes tasks with practical societal impact.

* 8.5 pages, 4 figures. To be published in Findings of ACL 2023

Prefix Propagation: Parameter-Efficient Tuning for Long Sequences

May 24, 2023

Jonathan Li, Will Aitken, Rohan Bhambhoria, Xiaodan Zhu

Abstract: Parameter-efficient tuning aims to mitigate the large memory requirements of adapting pretrained language models for downstream tasks. For example, one popular method, prefix-tuning, prepends trainable tokens to sequences while freezing the rest of the model's parameters. Although such models attain comparable performance with fine-tuning when applied to sequences with short to moderate lengths, we show their inferior performance when modelling long sequences. To bridge this gap, we propose prefix-propagation, a simple but effective approach that conditions prefixes on previous hidden states. We empirically demonstrate that prefix-propagation outperforms prefix-tuning across long-document tasks, while using 50% fewer parameters. To further investigate the proposed architecture, we also show its advantage in calibration, and perform additional study on its relationship with kernel attention. To the best of our knowledge, this work is the first to focus on parameter-efficient learning for long-sequence language tasks.

* ACL 2023 Main Conference

Towards Legally Enforceable Hate Speech Detection for Public Forums

May 23, 2023

Chu Fei Luo, Rohan Bhambhoria, Xiaodan Zhu, Samuel Dahan

Abstract: Hate speech is a serious issue on public forums, and proper enforcement of hate speech laws is key for protecting groups of people against harmful and discriminatory language. However, determining what constitutes hate speech is a complex task that is highly open to subjective interpretations. Existing works do not align their systems with enforceable definitions of hate speech, which can make their outputs inconsistent with the goals of regulators. Our work introduces a new task for enforceable hate speech detection centred around legal definitions, and a dataset annotated on violations of eleven possible definitions by legal experts. Given the challenge of identifying clear, legally enforceable instances of hate speech, we augment the dataset with expert-generated samples and an automatically mined challenge set. We experiment with grounding the model decision in these definitions using zero-shot and few-shot prompting. We then report results on several large language models (LLMs). With this task definition, automatic hate speech detection can be more closely aligned to enforceable laws, and hence assist in more rigorous enforcement of legal protections against harmful speech in public forums.

* 4 pages

Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks

May 10, 2023

Xianzhi Li, Xiaodan Zhu, Zhiqiang Ma, Xiaomo Liu, Sameena Shah

Abstract: The most recent large language models such as ChatGPT and GPT-4 have garnered significant attention, as they are capable of generating high-quality responses to human input. Despite the extensive testing of ChatGPT and GPT-4 on generic text corpora, showcasing their impressive capabilities, a study focusing on financial corpora has not been conducted. In this study, we aim to bridge this gap by examining the potential of ChatGPT and GPT-4 as a solver for typical financial text analytic problems in the zero-shot or few-shot setting. Specifically, we assess their capabilities on four representative tasks over five distinct financial textual datasets. The preliminary study shows that ChatGPT and GPT-4 struggle on tasks such as financial named entity recognition (NER) and sentiment analysis, where domain-specific knowledge is required, while they excel in numerical reasoning tasks. We report both the strengths and limitations of the current versions of ChatGPT and GPT-4, comparing them to the state-of-the-art finetuned models as well as pretrained domain-specific generative models. Our experiments provide qualitative studies, through which we hope to help understand the capability of the existing models and facilitate further improvements.

* 9 pages, 5 figures

Effectiveness of Data Augmentation for Prefix Tuning with Limited Data

Mar 05, 2023

Stephen Obadinma, Hongyu Guo, Xiaodan Zhu

Abstract: Recent work has demonstrated that tuning continuous prompts on large, frozen pretrained language models (i.e., prefix tuning or P-tuning) can yield performance that is comparable to or superior to fine-tuning. Nevertheless, the effectiveness of such methods under the context of data augmentation, which has been considered a common strategy to improve learning under low data regimes, has not been studied. In this paper, we examine several popular task-agnostic data augmentation techniques, i.e., EDA, Back Translation, and Mixup, when using prefix tuning under data scarcity. We show that data augmentation can be used to boost the performance of prefix tuning models, but the effectiveness of each technique varies and certain methods can lead to a notable degradation in performance, particularly when using larger models and on harder tasks. To help understand the above behaviour, we run experiments which reveal how prefix tuning generally presents a limited ability to separate the sentence embeddings from different classes of augmented data, and displays poorer performance on heavily altered data in particular. We also demonstrate that by adding a simple contrastive loss we can help mitigate such issues for prefix tuning, resulting in an improvement to augmented data performance.
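
For readers unfamiliar with EDA, one of the augmentation techniques named in the abstract above, the following is a minimal sketch of two of its word-level operations (random swap and random deletion). It is an illustration only and not the authors' implementation: the function names, default rates, and the fixed seed are assumptions made for this example, and EDA's synonym-based operations are omitted because they need an external thesaurus.

```python
import random


def random_swap(words, n_swaps, rng):
    """Return a copy of `words` with `n_swaps` random pairs of positions exchanged."""
    out = list(words)
    if len(out) < 2:
        return out  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)  # two distinct positions
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words, p, rng):
    """Drop each word independently with probability `p`, always keeping at least one."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def augment(sentence, n_swaps=1, p_delete=0.1, seed=0):
    """Produce one swapped and one deletion variant of `sentence` (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    words = sentence.split()
    return [
        " ".join(random_swap(words, n_swaps, rng)),
        " ".join(random_deletion(words, p_delete, rng)),
    ]
```

Calling `augment("book a flight to Toronto")` returns two perturbed variants of the sentence. Higher values of `n_swaps` or `p_delete` give the kind of heavily altered data on which the abstract reports poorer prefix-tuning performance.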

Bringing the State-of-the-Art to Customers: A Neural Agent Assistant Framework for Customer Service Support

Feb 07, 2023

Stephen Obadinma, Faiza Khan Khattak, Shirley Wang, Tania Sidhom, Elaine Lau, Sean Robertson, Jingcheng Niu, Winnie Au, Alif Munim, Karthik Raja K. Bhaskar (+16 more)

Abstract: Building Agent Assistants that can help improve customer service support requires inputs from industry users and their customers, as well as knowledge about state-of-the-art Natural Language Processing (NLP) technology. We combine expertise from academia and industry to bridge the gap and build task/domain-specific Neural Agent Assistants (NAA) with three high-level components for: (1) Intent Identification, (2) Context Retrieval, and (3) Response Generation. In this paper, we outline the pipeline of the NAA's core system and also present three case studies in which three industry partners successfully adapt the framework to find solutions to their unique challenges. Our findings suggest that a collaborative process is instrumental in spurring the development of emerging NLP models for Conversational AI tasks in industry. The full reference implementation code and results are available at https://github.com/VectorInstitute/NAA

* Camera Ready Version of Paper Published in EMNLP 2022 Industry Track

Parameter-Efficient Legal Domain Adaptation

Nov 04, 2022

Jonathan Li, Rohan Bhambhoria, Xiaodan Zhu

Abstract: Seeking legal advice is often expensive. Recent advancements in machine learning for solving complex problems can be leveraged to help make legal services more accessible to the public. However, real-life applications encounter significant challenges. State-of-the-art language models are growing increasingly large, making parameter-efficient learning increasingly important. Unfortunately, parameter-efficient methods perform poorly with small amounts of data, which are common in the legal domain (where data labelling costs are high). To address these challenges, we propose parameter-efficient legal domain adaptation, which uses vast unsupervised legal data from public legal forums to perform legal pre-training. This method exceeds or matches the few-shot performance of existing models such as LEGAL-BERT on various legal tasks while tuning only approximately 0.1% of model parameters. Additionally, we show that our method can achieve calibration comparable to existing methods across several tasks. To the best of our knowledge, this work is among the first to explore parameter-efficient methods of tuning language models in the legal domain.

* Accepted into the 2022 NLLP workshop
