Title: Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models

Feb 08, 2024

Chirag Agarwal, Sree Harsha Tanneru, Himabindu Lakkaraju

Abstract: Large Language Models (LLMs) are deployed as powerful tools for several natural language processing (NLP) applications. Recent works show that modern LLMs can generate self-explanations (SEs), which elicit their intermediate reasoning steps for explaining their behavior. Self-explanations have seen widespread adoption owing to their conversational and plausible nature. However, there is little to no understanding of their faithfulness. In this work, we discuss the dichotomy between faithfulness and plausibility in SEs generated by LLMs. We argue that while LLMs are adept at generating plausible explanations -- seemingly logical and coherent to human users -- these explanations do not necessarily align with the reasoning processes of the LLMs, raising concerns about their faithfulness. We highlight that the current trend towards increasing the plausibility of explanations, primarily driven by the demand for user-friendly interfaces, may come at the cost of diminishing their faithfulness. We assert that the faithfulness of explanations is critical in LLMs employed for high-stakes decision-making. Moreover, we urge the community to identify the faithfulness requirements of real-world applications and ensure explanations meet those needs. Finally, we propose some directions for future work, emphasizing the need for novel methodologies and frameworks that can enhance the faithfulness of self-explanations without compromising their plausibility, essential for the transparent deployment of LLMs in diverse high-stakes domains.
