Lu Cheng

Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology

Feb 24, 2025

Longchao Da, Xiaoou Liu, Jiaxin Dai, Lu Cheng, Yaqing Wang, Hua Wei

Abstract:Understanding the uncertainty in large language model (LLM) explanations is important for evaluating their faithfulness and reasoning consistency, and thus provides insights into the reliability of an LLM's output regarding a question. In this work, we propose a novel framework that quantifies uncertainty in LLM explanations through a reasoning topology perspective. By designing a structural elicitation strategy, we guide the LLMs to frame the explanations of an answer into a graph topology. This process decomposes the explanations into knowledge-related sub-questions and topology-based reasoning structures, which allows us to quantify uncertainty not only at the semantic level but also along the reasoning path. It also makes it easier to assess knowledge redundancy and provides interpretable insights into the reasoning process. Our method offers a systematic way to interpret LLM reasoning, analyze limitations, and provide guidance for enhancing robustness and faithfulness. This work pioneers the use of graph-structured uncertainty measurement in LLM explanations and demonstrates the potential of topology-based quantification.

* 15 pages, 6 figures

COPU: Conformal Prediction for Uncertainty Quantification in Natural Language Generation

Feb 18, 2025

Sean Wang, Yicheng Jiang, Yuxin Tang, Lu Cheng, Hanjie Chen

Abstract:Uncertainty Quantification (UQ) for Natural Language Generation (NLG) is crucial for assessing the performance of Large Language Models (LLMs), as it reveals confidence in predictions, identifies failure modes, and gauges output reliability. Conformal Prediction (CP), a model-agnostic method that generates prediction sets with a specified error rate, has been adopted for UQ in classification tasks, where the size of the prediction set indicates the model's uncertainty. However, when adapting CP to NLG, the sampling-based method for generating candidate outputs cannot guarantee the inclusion of the ground truth, limiting its applicability across a wide range of error rates. To address this, we propose COPU, a method that explicitly adds the ground truth to the candidate outputs and uses logit scores to measure nonconformity. Our experiments with six LLMs on four NLG tasks show that COPU outperforms baseline methods in calibrating error rates and empirical coverage rates, offering accurate UQ across a wide range of user-specified error rates.

Large Vision-Language Model Alignment and Misalignment: A Survey Through the Lens of Explainability

Jan 02, 2025

Dong Shu, Haiyan Zhao, Jingyu Hu, Weiru Liu, Lu Cheng, Mengnan Du

Abstract:Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in processing both visual and textual information. However, the critical challenge of alignment between visual and linguistic representations is not fully understood. This survey presents a comprehensive examination of alignment and misalignment in LVLMs through an explainability lens. We first examine the fundamentals of alignment, exploring its representational and behavioral aspects, training methodologies, and theoretical foundations. We then analyze misalignment phenomena across three semantic levels: object, attribute, and relational misalignment. Our investigation reveals that misalignment emerges from challenges at multiple levels: the data level, the model level, and the inference level. We provide a comprehensive review of existing mitigation strategies, categorizing them into parameter-frozen and parameter-tuning approaches. Finally, we outline promising future research directions, emphasizing the need for standardized evaluation protocols and in-depth explainability studies.

* 16 pages, 3 figures

Unveiling Performance Challenges of Large Language Models in Low-Resource Healthcare: A Demographic Fairness Perspective

Nov 30, 2024

Yue Zhou, Barbara Di Eugenio, Lu Cheng

Abstract:This paper studies the performance of large language models (LLMs), particularly regarding demographic fairness, in solving real-world healthcare tasks. We evaluate state-of-the-art LLMs with three prevalent learning frameworks across six diverse healthcare tasks and find significant challenges in applying LLMs to real-world healthcare tasks and persistent fairness issues across demographic groups. We also find that explicitly providing demographic information yields mixed results, while LLMs' ability to infer such details raises concerns about biased health predictions. Utilizing LLMs as autonomous agents with access to up-to-date guidelines does not guarantee performance improvement. We believe these findings reveal the critical limitations of LLMs in healthcare fairness and the urgent need for specialized research in this area.

* Accepted to the main conference of COLING 2025

From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge

Nov 25, 2024

Dawei Li, Bohan Jiang, Liangjie Huang, Alimohammad Beigi, Chengshuai Zhao, Zhen Tan, Amrita Bhattacharjee, Yuxuan Jiang, Canyu Chen, Tianhao Wu (+3 more)

Abstract:Assessment and evaluation have long been critical challenges in artificial intelligence (AI) and natural language processing (NLP). However, traditional methods, whether matching-based or embedding-based, often fall short of judging subtle attributes and delivering satisfactory results. Recent advancements in Large Language Models (LLMs) inspire the "LLM-as-a-judge" paradigm, where LLMs are leveraged to perform scoring, ranking, or selection across various tasks and applications. This paper provides a comprehensive survey of LLM-based judgment and assessment, offering an in-depth overview to advance this emerging field. We begin by giving detailed definitions from both input and output perspectives. Then we introduce a comprehensive taxonomy to explore LLM-as-a-judge from three dimensions: what to judge, how to judge and where to judge. Finally, we compile benchmarks for evaluating LLM-as-a-judge and highlight key challenges and promising directions, aiming to provide valuable insights and inspire future research in this promising research area. Paper list and more resources about LLM-as-a-judge can be found at https://github.com/llm-as-a-judge/Awesome-LLM-as-a-judge and https://llm-as-a-judge.github.io.

* 32 pages, 5 figures

Evaluating LLMs Capabilities Towards Understanding Social Dynamics

Nov 20, 2024

Anique Tahir, Lu Cheng, Manuel Sandoval, Yasin N. Silva, Deborah L. Hall, Huan Liu

Abstract:Social media discourse involves people from different backgrounds, beliefs, and motives. As a result, such discourse can often devolve into toxic interactions. Generative Models, such as Llama and ChatGPT, have recently exploded in popularity due to their capabilities in zero-shot question-answering. Because these models are increasingly being used to ask questions of social significance, a crucial research question is whether they can understand social media dynamics. This work provides a critical analysis of generative LLMs' ability to understand language and dynamics in social contexts, particularly considering cyberbullying and anti-cyberbullying (posts aimed at reducing cyberbullying) interactions. Specifically, we compare and contrast the capabilities of different large language models (LLMs) to understand three key aspects of social dynamics: language, directionality, and the occurrence of bullying/anti-bullying messages. We found that while fine-tuned LLMs exhibit promising results in some social media understanding tasks (understanding directionality), they presented mixed results in others (proper paraphrasing and bullying/anti-bullying detection). We also found that fine-tuning and prompt engineering mechanisms can have positive effects in some tasks. We believe that an understanding of LLMs' capabilities is crucial to design future models that can be effectively used in social applications.

* To appear in ASONAM 24 proceedings

Different Horses for Different Courses: Comparing Bias Mitigation Algorithms in ML

Nov 19, 2024

Prakhar Ganesh, Usman Gohar, Lu Cheng, Golnoosh Farnadi

Abstract:With fairness concerns gaining significant attention in Machine Learning (ML), several bias mitigation techniques have been proposed, often compared against each other to find the best method. These benchmarking efforts tend to use a common setup for evaluation under the assumption that providing a uniform environment ensures a fair comparison. However, bias mitigation techniques are sensitive to hyperparameter choices, random seeds, feature selection, etc., meaning that comparison on just one setting can unfairly favour certain algorithms. In this work, we show significant variance in fairness achieved by several algorithms and the influence of the learning pipeline on fairness scores. We highlight that most bias mitigation techniques can achieve comparable performance, given the freedom to perform hyperparameter optimization, suggesting that the choice of the evaluation parameters, rather than the mitigation technique itself, can sometimes create the perceived superiority of one method over another. We hope our work encourages future research on how various choices in the lifecycle of developing an algorithm impact fairness, and trends that guide the selection of appropriate algorithms.

* To appear at AFME@NeurIPS 2024

Towards Trustworthy Knowledge Graph Reasoning: An Uncertainty Aware Perspective

Oct 11, 2024

Bo Ni, Yu Wang, Lu Cheng, Erik Blasch, Tyler Derr

Abstract:Recently, Knowledge Graphs (KGs) have been successfully coupled with Large Language Models (LLMs) to mitigate their hallucinations and enhance their reasoning capability, such as in KG-based retrieval-augmented frameworks. However, current KG-LLM frameworks lack rigorous uncertainty estimation, limiting their reliable deployment in high-stakes applications. Directly incorporating uncertainty quantification into KG-LLM frameworks presents challenges due to their complex architectures and the intricate interactions between the knowledge graph and language model components. To address this gap, we propose a new trustworthy KG-LLM framework, Uncertainty Aware Knowledge-Graph Reasoning (UAG), which incorporates uncertainty quantification into the KG-LLM framework. We design an uncertainty-aware multi-step reasoning framework that leverages conformal prediction to provide a theoretical guarantee on the prediction set. To manage the error rate of the multi-step process, we additionally introduce an error rate control module to adjust the error rate within the individual components. Extensive experiments show that our proposed UAG can achieve any pre-defined coverage rate while reducing the prediction set/interval size by 40% on average over the baselines.

DemoShapley: Valuation of Demonstrations for In-Context Learning

Oct 10, 2024

Shan Xie, Man Luo, Chadly Daniel Stern, Mengnan Du, Lu Cheng

Abstract:Large language models (LLMs) leveraging in-context learning (ICL) have set new benchmarks in few-shot learning across various tasks without needing task-specific fine-tuning. However, extensive research has demonstrated that the effectiveness of ICL is significantly influenced by the selection and ordering of demonstrations. Considering the critical role of demonstration selection in ICL, we introduce DemoShapley, which is inspired by the Data Shapley valuation theorem. This approach assesses the influence of individual demonstration instances, distinguishing between those that contribute positively and those that may hinder performance. Our findings reveal that DemoShapley not only enhances model performance in terms of accuracy and fairness but also generalizes to queries from domains distinct from those of the in-context demonstrations, highlighting its versatility and effectiveness in optimizing ICL demonstration selection. Last but not least, DemoShapley demonstrates its ability to aid in identifying noisy data within the demonstration set.
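
For readers unfamiliar with Shapley-style valuation, the idea of scoring each demonstration by its average marginal contribution can be sketched with generic permutation sampling. This is an illustrative sketch only, not the DemoShapley algorithm from the paper: the function `monte_carlo_shapley`, the toy utility `toy_utility`, and the demonstration names are all hypothetical, and a real utility would measure model accuracy with the given demonstrations in the prompt.

```python
import random


def monte_carlo_shapley(items, utility, num_permutations=200, seed=0):
    """Estimate each item's Shapley value by averaging marginal gains over random orderings."""
    rng = random.Random(seed)
    totals = {item: 0.0 for item in items}
    for _ in range(num_permutations):
        order = list(items)
        rng.shuffle(order)
        subset = []
        previous = utility(subset)  # utility of the empty set
        for item in order:
            subset.append(item)
            current = utility(subset)
            totals[item] += current - previous  # marginal contribution of this item
            previous = current
    return {item: total / num_permutations for item, total in totals.items()}


def toy_utility(demos):
    """Hypothetical stand-in for 'accuracy with these demonstrations in the prompt'."""
    gains = {"helpful_a": 0.3, "helpful_b": 0.2, "noisy": -0.1}
    return 0.5 + sum(gains[d] for d in demos)


values = monte_carlo_shapley(["helpful_a", "helpful_b", "noisy"], toy_utility)
for name, value in sorted(values.items(), key=lambda kv: kv[1]):
    print(f"{name}: {value:+.3f}")
```

Because the toy utility is additive, each estimated value equals that demonstration's fixed gain, and the negative score flags the noisy demonstration. With a real, non-additive utility the estimates would vary with the number of sampled permutations.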

Conformal Prediction: A Data Perspective

Oct 09, 2024

Xiaofan Zhou, Baiting Chen, Yu Gui, Lu Cheng

Abstract:Conformal prediction (CP), a distribution-free uncertainty quantification (UQ) framework, reliably provides valid predictive inference for black-box models. CP constructs prediction sets that contain the true output with a specified probability. However, the diverse modalities of modern data science, along with increasing data and model complexity, challenge traditional CP methods. These developments have spurred novel approaches to address evolving scenarios. This survey reviews the foundational concepts of CP and recent advancements from a data-centric perspective, including applications to structured, unstructured, and dynamic data. We also discuss the challenges and opportunities CP faces in large-scale data and models.

* 35 pages, journal, survey
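
The abstract's statement that CP builds prediction sets containing the true output with a specified probability can be illustrated with textbook split conformal prediction for classification. The sketch below is a generic illustration, not code from the survey: the calibration probabilities, the labels, and the helper names `conformal_threshold` and `prediction_set` are made up for the example, and the nonconformity score is assumed to be one minus the probability assigned to a label.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the split-conformal quantile of the calibration nonconformity scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the finite-sample-corrected quantile
    if k > n:
        return math.inf  # too few calibration points: every label must be kept
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, threshold):
    """Keep every label whose nonconformity score 1 - p(label) is within the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= threshold}


# Hypothetical calibration data: probability the model assigned to the true label.
cal_true_probs = [0.9, 0.8, 0.75, 0.6, 0.95, 0.7, 0.85, 0.5, 0.65]
cal_scores = [1.0 - p for p in cal_true_probs]

# Target error rate of 25%, i.e. at least 75% marginal coverage under exchangeability.
q_hat = conformal_threshold(cal_scores, alpha=0.25)
print(prediction_set({"cat": 0.7, "dog": 0.25, "fox": 0.05}, q_hat))
```

A larger prediction set signals higher uncertainty for that input. The coverage guarantee is marginal and relies on the calibration and test points being exchangeable.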
