Lu Wang

CSSE, Shenzhen University

Evaluation Framework for AI Systems in "the Wild"

Apr 23, 2025
Via arXiv

Process Reward Models That Think

Apr 23, 2025
Via arXiv

UFO2: The Desktop AgentOS

Apr 20, 2025
Via arXiv

CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives

Apr 15, 2025
Via arXiv

MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges?

Apr 13, 2025
Via arXiv

Detection-Friendly Nonuniformity Correction: A Union Framework for Infrared UAV Target Detection

Apr 05, 2025
Via arXiv

Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones?

Feb 26, 2025
Via arXiv

VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model

Feb 26, 2025
Via arXiv

Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance

Feb 24, 2025
Via arXiv

Unstructured Evidence Attribution for Long Context Query Focused Summarization

Feb 20, 2025
Via arXiv