Shuo Yang

LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization

May 30, 2025

Chain-of-Thought for Autonomous Driving: A Comprehensive Survey and Future Prospects

May 26, 2025

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

May 24, 2025

L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models

May 23, 2025

MDVT: Enhancing Multimodal Recommendation with Model-Agnostic Multimodal-Driven Virtual Triplets

May 22, 2025

RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction

May 18, 2025

METOR: A Unified Framework for Mutual Enhancement of Objects and Relationships in Open-vocabulary Video Visual Relationship Detection

May 10, 2025

Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving

May 06, 2025

UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation

Mar 19, 2025

WorldModelBench: Judging Video Generation Models As World Models

Feb 28, 2025