Qi Dai

Microsoft Research Asia

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

Apr 16, 2026
Via arXiv

AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation

Apr 09, 2026
Via arXiv

DynaVid: Learning to Generate Highly Dynamic Videos using Synthetic Motion Data

Apr 02, 2026
Via arXiv

BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation

Mar 26, 2026
Via arXiv

FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance

Mar 12, 2026
Via arXiv

High-Fidelity Text-to-Image Generation from Pre-Trained Vision-Language Models via Distribution-Conditioned Diffusion Decoding

Mar 11, 2026
Via arXiv

Towards On-Policy SFT: Distribution Discriminant Theory and its Applications in LLM Training

Feb 12, 2026
Via arXiv

ArcFlow: Unleashing 2-Step Text-to-Image Generation via High-Precision Non-Linear Flow Distillation

Feb 09, 2026
Via arXiv

RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents

Feb 02, 2026
Via arXiv

SimRPD: Optimizing Recruitment Proactive Dialogue Agents through Simulator-Based Data Evaluation and Selection

Jan 08, 2026
Via arXiv