How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?

Feb 13, 2024
Ryan Liu, Theodore R. Sumers, Ishita Dasgupta, Thomas L. Griffiths
