Alexey Gorbatovski

From Pixels to Digital Agents: An Empirical Study on the Taxonomy and Technological Trends of Reinforcement Learning Environments

Mar 25, 2026
Via arXiv

F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare

Feb 06, 2026
Via arXiv

Steering LLM Reasoning Through Bias-Only Adaptation

May 24, 2025
Via arXiv

The Differences Between Direct Alignment Algorithms are a Blur

Feb 03, 2025
Via arXiv

Learn Your Reference Model for Real Good Alignment

Apr 15, 2024
Via arXiv

Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Feb 16, 2024
Via arXiv

Reinforcement learning for question answering in programming domain using public community scoring as a human feedback

Jan 19, 2024
Via arXiv

Bayesian Networks for Named Entity Prediction in Programming Community Question Answering

Feb 26, 2023
Via arXiv