Kshitij Sachan

Debating with More Persuasive LLMs Leads to More Truthful Answers

Feb 15, 2024
Via arXiv

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Jan 17, 2024
Via arXiv

AI Control: Improving Safety Despite Intentional Subversion

Dec 14, 2023
Via arXiv

Polysemanticity and Capacity in Neural Networks

Oct 04, 2022
Via arXiv