Wenxuan Huang

Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning

Jun 12, 2025

MT$^{3}$: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning

May 26, 2025

CompBench: Benchmarking Complex Instruction-guided Image Editing

May 18, 2025

Large Language Model Enhancers for Graph Neural Networks: An Analysis from the Perspective of Causal Mechanism Identification

May 15, 2025

LLM Enhancers for GNNs: An Analysis from the Perspective of Causal Mechanism Identification

May 13, 2025

ReactDance: Progressive-Granular Representation for Long-Term Coherent Reactive Dance Generation

May 08, 2025

TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation

Apr 24, 2025

VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning

Apr 10, 2025

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

Mar 11, 2025

LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition?

Mar 10, 2025