Bin Xu

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Jul 02, 2025
Via arXiv

VerIF: Verification Engineering for Reinforcement Learning in Instruction Following

Jun 11, 2025
Via arXiv

From Swath to Full-Disc: Advancing Precipitation Retrieval with Multimodal Knowledge Expansion

Jun 08, 2025
Via arXiv

S2R-Bench: A Sim-to-Real Evaluation Benchmark for Autonomous Driving

May 24, 2025
Via arXiv

BLAST: Balanced Sampling Time Series Corpus for Universal Forecasting Models

May 23, 2025
Via arXiv

AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios

May 22, 2025
Via arXiv

EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios

May 22, 2025
Via arXiv

DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models

May 13, 2025
Via arXiv

LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning

May 04, 2025
Via arXiv

An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes

Apr 21, 2025
Via arXiv