Yi-Chen Li

RMGAP: Benchmarking the Generalization of Reward Models across Diverse Preferences

May 03, 2026
Via arXiv

Off-Policy Value-Based Reinforcement Learning for Large Language Models

Mar 24, 2026
Via arXiv

Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Constraints

Mar 24, 2026
Via arXiv

Multi-agent In-context Coordination via Decentralized Memory Retrieval

Nov 13, 2025
Via arXiv

Controlling Large Language Model with Latent Actions

Mar 27, 2025
Via arXiv

Improving Sample Efficiency of Reinforcement Learning with Background Knowledge from Large Language Models

Jul 04, 2024
Via arXiv

Q-Adapter: Training Your LLM Adapter as a Residual Q-Function

Jul 04, 2024
Via arXiv

BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation

May 27, 2024
Via arXiv

Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning

May 27, 2024
Via arXiv

Disentangling Policy from Offline Task Representation Learning via Adversarial Data Augmentation

Mar 12, 2024
Via arXiv