Mohammad Mahdi Salmani-Zarchi

MDP-GRPO: Stabilized Group Relative Policy Optimization for Multi-Constraint Instruction Following

Jun 04, 2026
Via arXiv