Yesheng Liu

SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment

Jun 01, 2026
Via arXiv

AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery

Apr 28, 2026
Via arXiv

ToolWeaver: Weaving Collaborative Semantics for Scalable Tool Use in Large Language Models

Jan 29, 2026
Via arXiv

Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench

Oct 30, 2025
Via arXiv

FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation

Jun 10, 2025
Via arXiv