Idan Shenfeld

RL's Razor: Why Online Reinforcement Learning Forgets Less

Sep 04, 2025
Via arXiv

Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty

Jul 22, 2025
Via arXiv

LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users

Jul 03, 2025
Via arXiv

Language Model Personalization via Reward Factorization

Mar 08, 2025
Via arXiv

Theoretical Analysis of KL-regularized RLHF with Multiple Reference Models

Feb 03, 2025
Via arXiv

Learning How Hard to Think: Input-Adaptive Allocation of LM Computation

Oct 07, 2024
Via arXiv

From Imitation to Refinement -- Residual RL for Precise Visual Assembly

Jul 23, 2024
Via arXiv

Value Augmented Sampling for Language Model Alignment and Personalization

May 10, 2024
Via arXiv

JUICER: Data-Efficient Imitation Learning for Robotic Assembly

Apr 09, 2024
Via arXiv

Curiosity-driven Red-teaming for Large Language Models

Feb 29, 2024
Via arXiv