Peter Hase

Counterfactual Simulation Training for Chain-of-Thought Faithfulness

Feb 24, 2026

The Truthfulness Spectrum Hypothesis

Feb 23, 2026

Unsupervised Elicitation of Language Models

Jun 11, 2025

Reasoning Models Don't Always Say What They Think

May 08, 2025

Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation

May 01, 2025

Teaching Models to Balance Resisting and Accepting Persuasion

Oct 18, 2024

System-1.x: Learning to Balance Fast and Slow Planning with Language Models

Jul 19, 2024

Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?

Jun 27, 2024

Are language models rational? The case of coherence norms and belief revision

Jun 05, 2024

LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models

May 31, 2024