Wu Liu

A Paradigm Shift: Fully End-to-End Training for Temporal Sentence Grounding in Videos

Apr 03, 2026
Via arXiv

EMS: Multi-Agent Voting via Efficient Majority-then-Stopping

Apr 03, 2026
Via arXiv

Rel-Zero: Harnessing Patch-Pair Invariance for Robust Zero-Watermarking Against AI Editing

Mar 18, 2026
Via arXiv

GUI-Eyes: Tool-Augmented Perception for Visual Grounding in GUI Agents

Jan 14, 2026
Via arXiv

Region-Constraint In-Context Generation for Instructional Video Editing

Dec 19, 2025
Via arXiv

MotionPro: A Precise Motion Controller for Image-to-Video Generation

May 26, 2025
Via arXiv

HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation

Mar 31, 2025
Via arXiv

OmniPrism: Learning Disentangled Visual Concept for Image Generation

Dec 16, 2024
Via arXiv

LMAgent: A Large-scale Multimodal Agents Society for Multi-user Simulation

Dec 13, 2024
Via arXiv

T-SVG: Text-Driven Stereoscopic Video Generation

Dec 12, 2024
Via arXiv