Target Policy Smoothing


LeanPO: Lean Preference Optimization for Likelihood Alignment in Video-LLMs

Jun 05, 2025
Via arXiv

Fundamental Limits of Game-Theoretic LLM Alignment: Smith Consistency and Preference Matching

May 27, 2025
Via arXiv

Differential Information: An Information-Theoretic Perspective on Preference Optimization

May 29, 2025
Via arXiv

End-to-End Multi-Task Policy Learning from NMPC for Quadruped Locomotion

May 13, 2025
Via arXiv

Trust-Region Twisted Policy Improvement

Apr 08, 2025
Via arXiv

TACO: General Acrobatic Flight Control via Target-and-Command-Oriented Reinforcement Learning

Mar 06, 2025
Via arXiv

Cooperative Bearing-Only Target Pursuit via Multiagent Reinforcement Learning: Design and Experiment

Mar 11, 2025
Via arXiv

SATA: Safe and Adaptive Torque-Based Locomotion Policies Inspired by Animal Learning

Feb 18, 2025
Via arXiv

Implicit Physics-aware Policy for Dynamic Manipulation of Rigid Objects via Soft Body Tools

Feb 08, 2025
Via arXiv

Towards Learning Scalable Agile Dynamic Motion Planning for Robosoccer Teams with Policy Optimization

Feb 08, 2025
Via arXiv