Anca Dragan

The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality

Dec 11, 2025

CTRL-Rec: Controlling Recommender Systems With Natural Language

Oct 14, 2025

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

Jul 15, 2025

Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL

May 23, 2025

AssistanceZero: Scalably Solving Assistance Games

Apr 09, 2025

An Approach to Technical AGI Safety and Security

Apr 02, 2025

Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following

Feb 08, 2025

Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning

Nov 07, 2024

Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations

Nov 07, 2024

Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback

Nov 04, 2024