Deqing Wang

Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards

Feb 09, 2026
Via arXiv

Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Feb 09, 2026
Via arXiv

Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

Feb 09, 2026
Via arXiv

Real-Time Aligned Reward Model beyond Semantics

Jan 30, 2026
Via arXiv

Your Group-Relative Advantage Is Biased

Jan 13, 2026
Via arXiv

LLMBoost: Make Large Language Models Stronger with Boosting

Dec 26, 2025
Via arXiv

FLeW: Facet-Level and Adaptive Weighted Representation Learning of Scientific Documents

Sep 09, 2025
Via arXiv

CDC: Causal Domain Clustering for Multi-Domain Recommendation

Jul 09, 2025
Via arXiv

Hyperbolic Diffusion Recommender Model

Apr 02, 2025
Via arXiv

Bridging Social Psychology and LLM Reasoning: Conflict-Aware Meta-Review Generation via Cognitive Alignment

Mar 21, 2025
Via arXiv