Qi Dai

Microsoft Research Asia

Adaptive Inference-Time Scaling via Early-Step Latent Verification for Image Editing

Jun 13, 2026, via arXiv

Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

Jun 10, 2026, via arXiv

A Comprehensive Ecosystem for Open-Domain Customized Video Generation

Jun 10, 2026, via arXiv

Decomposed On-Policy Distillation for Vision-Language Reasoning: Steering Gradients for Visual Grounding

May 30, 2026, via arXiv

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

May 25, 2026, via arXiv

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

May 22, 2026, via arXiv

PDCR: Perception-Decomposed Confidence Reward for Vision-Language Reasoning

May 13, 2026, via arXiv

Covering Human Action Space for Computer Use: Data Synthesis and Benchmark

May 12, 2026, via arXiv

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

Apr 16, 2026, via arXiv

AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation

Apr 09, 2026, via arXiv