
Yulan Hu

Learn More with Less: Uncertainty Consistency Guided Query Selection for RLVR

Jan 30, 2026

No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning

Jan 11, 2026

TravelBench: A Broader Real-World Benchmark for Multi-Turn and Tool-Using Travel Planning

Jan 05, 2026

AMAP Agentic Planning Technical Report

Dec 31, 2025

TravelBench: A Real-World Benchmark for Multi-Turn and Tool-Augmented Travel Planning

Dec 27, 2025

Towards Reward Fairness in RLHF: From a Resource Allocation Perspective

May 29, 2025

SPPD: Self-training with Process Preference Learning Using Dynamic Value Margin

Feb 19, 2025

Coarse-to-Fine Process Reward Modeling for Enhanced Mathematical Reasoning

Jan 23, 2025

Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models

Nov 25, 2024

GUNDAM: Aligning Large Language Models with Graph Understanding

Sep 30, 2024