Yue Wang

Zhongguancun Academy

Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start

May 28, 2025

Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO

May 28, 2025

Pessimism Principle Can Be Effective: Towards a Framework for Zero-Shot Transfer Reinforcement Learning

May 24, 2025

U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding

May 23, 2025

Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training

May 20, 2025

DGRO: Enhancing LLM Reasoning via Exploration-Exploitation Control and Reward Variance Management

May 19, 2025

A Finite-Sample Analysis of Distributionally Robust Average-Reward Reinforcement Learning

May 18, 2025

MSCI: Addressing CLIP's Inherent Limitations for Compositional Zero-Shot Learning

May 15, 2025

ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation

May 14, 2025

HMCF: A Human-in-the-loop Multi-Robot Collaboration Framework Based on Large Language Models

May 01, 2025