Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?

Nov 27, 2023
Kevin Liu, Stephen Casper, Dylan Hadfield-Menell, Jacob Andreas
