Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiaodan Zhu

BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models

May 08, 2024

Chu Fei Luo, Ahmad Ghawanmeh, Xiaodan Zhu, Faiza Khan Khattak

Figure 1 for BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models

Figure 2 for BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models

Figure 3 for BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models

Figure 4 for BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models

Abstract:Modern large language models (LLMs) have a significant amount of world knowledge, which enables strong performance in commonsense reasoning and knowledge-intensive tasks when harnessed properly. The language model can also learn social biases, which has a significant potential for societal harm. There have been many mitigation strategies proposed for LLM safety, but it is unclear how effective they are for eliminating social biases. In this work, we propose a new methodology for attacking language models with knowledge graph augmented generation. We refactor natural language stereotypes into a knowledge graph, and use adversarial attacking strategies to induce biased responses from several open- and closed-source language models. We find our method increases bias in all models, even those trained with safety guardrails. This demonstrates the need for further research in AI safety, and further work in this new adversarial space.

Via

Access Paper or Ask Questions

Evaluating AI for Law: Bridging the Gap with Open-Source Solutions

Apr 18, 2024

Rohan Bhambhoria, Samuel Dahan, Jonathan Li, Xiaodan Zhu

Figure 1 for Evaluating AI for Law: Bridging the Gap with Open-Source Solutions

Figure 2 for Evaluating AI for Law: Bridging the Gap with Open-Source Solutions

Figure 3 for Evaluating AI for Law: Bridging the Gap with Open-Source Solutions

Figure 4 for Evaluating AI for Law: Bridging the Gap with Open-Source Solutions

Abstract:This study evaluates the performance of general-purpose AI, like ChatGPT, in legal question-answering tasks, highlighting significant risks to legal professionals and clients. It suggests leveraging foundational models enhanced by domain-specific knowledge to overcome these issues. The paper advocates for creating open-source legal AI systems to improve accuracy, transparency, and narrative diversity, addressing general AI's shortcomings in legal contexts.

Via

Access Paper or Ask Questions

Self-Consistent Decoding for More Factual Open Responses

Mar 01, 2024

Christopher Malon, Xiaodan Zhu

Abstract:Self-consistency has emerged as a powerful method for improving the accuracy of short answers generated by large language models. As previously defined, it only concerns the accuracy of a final answer parsed from generated text. In this work, we extend the idea to open response generation, by integrating voting into the decoding method. Each output sentence is selected from among multiple samples, conditioning on the previous selections, based on a simple token overlap score. We compare this "Sample & Select" method to greedy decoding, beam search, nucleus sampling, and the recently introduced hallucination avoiding decoders of DoLA, P-CRR, and S-CRR. We show that Sample & Select improves factuality by a 30% relative margin against these decoders in NLI-based evaluation on the subsets of CNN/DM and XSum used in the FRANK benchmark, while maintaining comparable ROUGE-1 F1 scores against reference summaries. We collect human verifications of the generated summaries, confirming the factual superiority of our method.

Via

Access Paper or Ask Questions

HGOT: Hierarchical Graph of Thoughts for Retrieval-Augmented In-Context Learning in Factuality Evaluation

Feb 14, 2024

Yihao Fang, Stephen W. Thomas, Xiaodan Zhu

Abstract:With the widespread adoption of large language models (LLMs) in numerous applications, the challenge of factuality and the propensity for hallucinations raises significant concerns. To address this issue, particularly in retrieval-augmented in-context learning, we introduce the hierarchical graph of thoughts (HGOT), a structured, multi-layered graph approach designed to enhance the retrieval of pertinent passages during in-context learning. The framework utilizes the emergent planning capabilities of LLMs, employing the divide-and-conquer strategy to break down complex queries into manageable sub-queries. It refines self-consistency majority voting for answer selection, which incorporates the recently proposed citation recall and precision metrics to assess the quality of thoughts, linking an answer's credibility intrinsically to the thought's quality. This methodology introduces a weighted system in majority voting, prioritizing answers based on the citation quality of their thoughts. Additionally, we propose a scoring mechanism for evaluating retrieved passages, considering factors such as citation frequency and quality, self-consistency confidence, and the retrieval module's ranking. Experiments reveal that HGOT outperforms other retrieval-augmented in-context learning methods, including Demonstrate-Search-Predict (DSP), ReAct, Self-Ask, and Retrieve-then-Read on different datasets by as much as $7\%$, demonstrating its efficacy in enhancing the factuality of LLMs.

Via

Access Paper or Ask Questions

Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs

Jan 18, 2024

Haritz Puerto, Martin Tutek, Somak Aditya, Xiaodan Zhu, Iryna Gurevych

Figure 1 for Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs

Figure 2 for Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs

Figure 3 for Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs

Figure 4 for Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs

Abstract:Reasoning is a fundamental component for achieving language understanding. Among the multiple types of reasoning, conditional reasoning, the ability to draw different conclusions depending on some condition, has been understudied in large language models (LLMs). Recent prompting methods, such as chain of thought, have significantly improved LLMs on reasoning tasks. Nevertheless, there is still little understanding of what triggers reasoning abilities in LLMs. We hypothesize that code prompts can trigger conditional reasoning in LLMs trained on text and code. We propose a chain of prompts that transforms a natural language problem into code and prompts the LLM with the generated code. Our experiments find that code prompts exhibit a performance boost between 2.6 and 7.7 points on GPT 3.5 across multiple datasets requiring conditional reasoning. We then conduct experiments to discover how code prompts elicit conditional reasoning abilities and through which features. We observe that prompts need to contain natural language text accompanied by high-quality code that closely represents the semantics of the instance text. Furthermore, we show that code prompts are more efficient, requiring fewer demonstrations, and that they trigger superior state tracking of variables or key entities.

* Code, prompt templates, prompts, and outputs are publicly available at https://github.com/UKPLab/arxiv2024-conditional-reasoning-llms

Via

Access Paper or Ask Questions

Calibration Attack: A Framework For Adversarial Attacks Targeting Calibration

Jan 05, 2024

Stephen Obadinma, Xiaodan Zhu, Hongyu Guo

Figure 1 for Calibration Attack: A Framework For Adversarial Attacks Targeting Calibration

Figure 2 for Calibration Attack: A Framework For Adversarial Attacks Targeting Calibration

Figure 3 for Calibration Attack: A Framework For Adversarial Attacks Targeting Calibration

Figure 4 for Calibration Attack: A Framework For Adversarial Attacks Targeting Calibration

Abstract:We introduce a new framework of adversarial attacks, named calibration attacks, in which the attacks are generated and organized to trap victim models to be miscalibrated without altering their original accuracy, hence seriously endangering the trustworthiness of the models and any decision-making based on their confidence scores. Specifically, we identify four novel forms of calibration attacks: underconfidence attacks, overconfidence attacks, maximum miscalibration attacks, and random confidence attacks, in both the black-box and white-box setups. We then test these new attacks on typical victim models with comprehensive datasets, demonstrating that even with a relatively low number of queries, the attacks can create significant calibration mistakes. We further provide detailed analyses to understand different aspects of calibration attacks. Building on that, we investigate the effectiveness of widely used adversarial defences and calibration methods against these types of attacks, which then inspires us to devise two novel defences against such calibration attacks.

Via

Access Paper or Ask Questions

Parameter-Efficient Methods for Metastases Detection from Clinical Notes

Oct 27, 2023

Maede Ashofteh Barabadi, Xiaodan Zhu, Wai Yip Chan, Amber L. Simpson, Richard K. G. Do

Figure 1 for Parameter-Efficient Methods for Metastases Detection from Clinical Notes

Figure 2 for Parameter-Efficient Methods for Metastases Detection from Clinical Notes

Abstract:Understanding the progression of cancer is crucial for defining treatments for patients. The objective of this study is to automate the detection of metastatic liver disease from free-style computed tomography (CT) radiology reports. Our research demonstrates that transferring knowledge using three approaches can improve model performance. First, we utilize generic language models (LMs), pretrained in a self-supervised manner. Second, we use a semi-supervised approach to train our model by automatically annotating a large unlabeled dataset; this approach substantially enhances the model's performance. Finally, we transfer knowledge from related tasks by designing a multi-task transfer learning methodology. We leverage the recent advancement of parameter-efficient LM adaptation strategies to improve performance and training efficiency. Our dataset consists of CT reports collected at Memorial Sloan Kettering Cancer Center (MSKCC) over the course of 12 years. 2,641 reports were manually annotated by domain experts; among them, 841 reports have been annotated for the presence of liver metastases. Our best model achieved an F1-score of 73.8%, a precision of 84%, and a recall of 65.8%.

* Barabadi, M. A., Zhu, X., Chan, W. Y., Simpson, A. L., & Do, R. K. G. (2023). Parameter-Efficient Methods for Metastases Detection fromClinical Notes. Proceedings of the Canadian Conference on Artificial Intelligence
* 6 pages, 1 figure, The 36th Canadian Conference on Artificial Intelligence

Via

Access Paper or Ask Questions

Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams

Oct 12, 2023

Ethan Callanan, Amarachi Mbakwe, Antony Papadimitriou, Yulong Pei, Mathieu Sibue, Xiaodan Zhu, Zhiqiang Ma, Xiaomo Liu, Sameena Shah

Figure 1 for Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams

Figure 2 for Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams

Figure 3 for Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams

Figure 4 for Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams

Abstract:Large Language Models (LLMs) have demonstrated remarkable performance on a wide range of Natural Language Processing (NLP) tasks, often matching or even beating state-of-the-art task-specific models. This study aims at assessing the financial reasoning capabilities of LLMs. We leverage mock exam questions of the Chartered Financial Analyst (CFA) Program to conduct a comprehensive evaluation of ChatGPT and GPT-4 in financial analysis, considering Zero-Shot (ZS), Chain-of-Thought (CoT), and Few-Shot (FS) scenarios. We present an in-depth analysis of the models' performance and limitations, and estimate whether they would have a chance at passing the CFA exams. Finally, we outline insights into potential strategies and improvements to enhance the applicability of LLMs in finance. In this perspective, we hope this work paves the way for future studies to continue enhancing LLMs for financial reasoning through rigorous evaluation.

Via

Access Paper or Ask Questions

Ensemble Distillation for Unsupervised Constituency Parsing

Oct 03, 2023

Behzad Shayegh, Yanshuai Cao, Xiaodan Zhu, Jackie C. K. Cheung, Lili Mou

Abstract:We investigate the unsupervised constituency parsing task, which organizes words and phrases of a sentence into a hierarchical structure without using linguistically annotated data. We observe that existing unsupervised parsers capture differing aspects of parsing structures, which can be leveraged to enhance unsupervised parsing performance. To this end, we propose a notion of "tree averaging," based on which we further propose a novel ensemble method for unsupervised parsing. To improve inference efficiency, we further distill the ensemble knowledge into a student model; such an ensemble-then-distill process is an effective approach to mitigate the over-smoothing problem existing in common multi-teacher distilling methods. Experiments show that our method surpasses all previous approaches, consistently demonstrating its effectiveness and robustness across various runs, with different ensemble components, and under domain-shift conditions.

Via

Access Paper or Ask Questions

ChatGPT as Data Augmentation for Compositional Generalization: A Case Study in Open Intent Detection

Aug 25, 2023

Yihao Fang, Xianzhi Li, Stephen W. Thomas, Xiaodan Zhu

Abstract:Open intent detection, a crucial aspect of natural language understanding, involves the identification of previously unseen intents in user-generated text. Despite the progress made in this field, challenges persist in handling new combinations of language components, which is essential for compositional generalization. In this paper, we present a case study exploring the use of ChatGPT as a data augmentation technique to enhance compositional generalization in open intent detection tasks. We begin by discussing the limitations of existing benchmarks in evaluating this problem, highlighting the need for constructing datasets for addressing compositional generalization in open intent detection tasks. By incorporating synthetic data generated by ChatGPT into the training process, we demonstrate that our approach can effectively improve model performance. Rigorous evaluation of multiple benchmarks reveals that our method outperforms existing techniques and significantly enhances open intent detection capabilities. Our findings underscore the potential of large language models like ChatGPT for data augmentation in natural language understanding tasks.

* Proceedings of the Joint Workshop of the 5th Financial Technology and Natural Language Processing (FinNLP) and 2nd Multimodal AI For Financial Forecasting (Muffin), Macao, August 20, 2023

Via

Access Paper or Ask Questions