Wojciech Zaremba

INRIA Saclay - Ile de France, CVN

Stress Testing Deliberative Alignment for Anti-Scheming Training

Sep 19, 2025

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

Jul 15, 2025

Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation

Mar 14, 2025

Trading Inference-Time Compute for Adversarial Robustness

Jan 31, 2025

OpenAI o1 System Card

Dec 21, 2024

GPT-4o System Card

Oct 25, 2024

Evaluating Large Language Models Trained on Code

Jul 14, 2021

A Generalizable Approach to Learning Optimizers

Jun 07, 2021

Asymmetric self-play for automatic goal discovery in robotic manipulation

Jan 13, 2021

Predicting Sim-to-Real Transfer with Probabilistic Dynamics Models

Sep 27, 2020