Q-Learning


BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL

May 29, 2025
Via arXiv

On Global Convergence Rates for Federated Policy Gradient under Heterogeneous Environment

May 29, 2025
Via arXiv

Discriminative Policy Optimization for Token-Level Reward Models

May 29, 2025
Via arXiv

Learning to Charge More: A Theoretical Study of Collusion by Q-Learning Agents

May 28, 2025
Via arXiv

Normalizing Flows are Capable Models for RL

May 29, 2025
Via arXiv

Oryx: a Performant and Scalable Algorithm for Many-Agent Coordination in Offline MARL

May 28, 2025
Via arXiv

Scaling Offline RL via Efficient and Expressive Shortcut Models

May 28, 2025
Via arXiv

Enhanced DACER Algorithm with High Diffusion Efficiency

May 29, 2025
Via arXiv

A General-Purpose Theorem for High-Probability Bounds of Stochastic Approximation with Polyak Averaging

May 27, 2025
Via arXiv

Reinforcement Learning-based Sequential Route Recommendation for System-Optimal Traffic Assignment

May 27, 2025
Via arXiv