Meng Cao

ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding

May 29, 2025
Via arXiv

Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation

May 29, 2025
Via arXiv

Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs

May 28, 2025
Via arXiv

Ground-R1: Incentivizing Grounded Visual Reasoning via Reinforcement Learning

May 26, 2025
Via arXiv

SCAR: Shapley Credit Assignment for More Efficient RLHF

May 26, 2025
Via arXiv

Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, or True Temporal Understanding?

May 20, 2025
Via arXiv

MR. Judge: Multimodal Reasoner as a Judge

May 19, 2025
Via arXiv

StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

May 08, 2025
Via arXiv

BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese

May 01, 2025
Via arXiv

IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs

Apr 21, 2025
Via arXiv