Viacheslav Sinii

F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare

Feb 06, 2026

Steering LLM Reasoning Through Bias-Only Adaptation

May 24, 2025

You Do Not Fully Utilize Transformer's Representation Capacity

Feb 13, 2025

The Differences Between Direct Alignment Algorithms are a Blur

Feb 03, 2025

XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning

Jun 13, 2024

In-Context Reinforcement Learning for Variable Action Spaces

Dec 20, 2023

Emergence of In-Context Reinforcement Learning from Noise Distillation

Dec 19, 2023

XLand-MiniGrid: Scalable Meta-Reinforcement Learning Environments in JAX

Dec 19, 2023