Jingbo Shang

OpenDeepThink: Parallel Reasoning via Bradley-Terry Aggregation

May 14, 2026
Via arXiv

FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale

May 14, 2026
Via arXiv

F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking

May 13, 2026
Via arXiv

BOOKMARKS: Efficient Active Storyline Memory for Role-playing

May 13, 2026
Via arXiv

ChipMATE: Multi-Agent Training via Reinforcement Learning for Enhanced RTL Generation

May 13, 2026
Via arXiv

Learning with Rare Success but Rich Feedback via Reflection-Enhanced Self-Distillation

May 12, 2026
Via arXiv

MASS-DPO: Multi-negative Active Sample Selection for Direct Policy Optimization

May 11, 2026
Via arXiv

Skill-R1: Agent Skill Evolution via Reinforcement Learning

May 10, 2026
Via arXiv

CocoaBench: Evaluating Unified Digital Agents in the Wild

Apr 14, 2026
Via arXiv

Simulating Organized Group Behavior: New Framework, Benchmark, and Analysis

Apr 10, 2026
Via arXiv