Di Zhang

TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs

Mar 13, 2025
Via arXiv

Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding

Mar 12, 2025
Via arXiv

ExGes: Expressive Human Motion Retrieval and Modulation for Audio-Driven Gesture Synthesis

Mar 09, 2025
Via arXiv

RectifiedHR: Enable Efficient High-Resolution Image Generation via Energy Rectification

Mar 04, 2025
Via arXiv

HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models

Feb 28, 2025
Via arXiv

Be a Multitude to Itself: A Prompt Evolution Framework for Red Teaming

Feb 22, 2025
Via arXiv

SPPD: Self-training with Process Preference Learning Using Dynamic Value Margin

Feb 19, 2025
Via arXiv

FlexDuo: A Pluggable System for Enabling Full-Duplex Capabilities in Speech Dialogue Systems

Feb 19, 2025
Via arXiv

DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs

Feb 18, 2025
Via arXiv

iMOVE: Instance-Motion-Aware Video Understanding

Feb 18, 2025
Via arXiv