Qi Dai

StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation

Aug 11, 2025
Via arXiv

Phi-Ground Tech Report: Advancing Perception in GUI Grounding

Jul 31, 2025
Via arXiv

ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning

May 21, 2025
Via arXiv

JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers

May 01, 2025
Via arXiv

Subject-driven Video Generation via Disentangled Identity and Motion

Apr 23, 2025
Via arXiv

Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions

Apr 16, 2025
Via arXiv

MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance

Mar 20, 2025
Via arXiv

HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard

Mar 18, 2025
Via arXiv

HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models

Mar 14, 2025
Via arXiv

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Mar 03, 2025
Via arXiv