Yue Wang

Zhongguancun Academy

Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO

May 28, 2025
Via arXiv

Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start

May 28, 2025
Via arXiv

Pessimism Principle Can Be Effective: Towards a Framework for Zero-Shot Transfer Reinforcement Learning

May 24, 2025
Via arXiv

U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding

May 23, 2025
Via arXiv

Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training

May 20, 2025
Via arXiv

DGRO: Enhancing LLM Reasoning via Exploration-Exploitation Control and Reward Variance Management

May 19, 2025
Via arXiv

A Finite-Sample Analysis of Distributionally Robust Average-Reward Reinforcement Learning

May 18, 2025
Via arXiv

MSCI: Addressing CLIP's Inherent Limitations for Compositional Zero-Shot Learning

May 15, 2025
Via arXiv

ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation

May 14, 2025
Via arXiv

HMCF: A Human-in-the-loop Multi-Robot Collaboration Framework Based on Large Language Models

May 01, 2025
Via arXiv