Ruibin Yuan

A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression

Apr 21, 2026
Via arXiv

Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing

Apr 12, 2026
Via arXiv

CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction

Mar 04, 2026
Via arXiv

Voices of Civilizations: A Multilingual QA Benchmark for Global Music Understanding

Feb 28, 2026
Via arXiv

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing

Jan 14, 2026
Via arXiv

AutoMV: An Automatic Multi-Agent System for Music Video Generation

Dec 13, 2025
Via arXiv

Hollywood Town: Long-Video Generation via Cross-Modal Multi-Agent Orchestration

Oct 25, 2025
Via arXiv

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

May 21, 2025
Via arXiv

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

May 19, 2025
Via arXiv

SongEval: A Benchmark Dataset for Song Aesthetics Evaluation

May 16, 2025
Via arXiv