Cong Wang

Zhejiang University, Hangzhou, China

GTA: Advancing Image-to-3D World Generation via Geometry Then Appearance Video Diffusion

May 13, 2026

What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis

May 05, 2026

Embody4D: A Generalist 4D World Model for Embodied AI

May 03, 2026

Denoise and Align: Diffusion-Driven Foreground Knowledge Prompting for Open-Vocabulary Temporal Action Detection

Apr 20, 2026

HazardArena: Evaluating Semantic Safety in Vision-Language-Action Models

Apr 14, 2026

VersaVogue: Visual Expert Orchestration and Preference Alignment for Unified Fashion Synthesis

Apr 08, 2026

See the Forest for the Trees: Loosely Speculative Decoding via Visual-Semantic Guidance for Efficient Inference of Video LLMs

Apr 07, 2026

ParallelVLM: Lossless Video-LLM Acceleration with Visual Alignment Aware Parallel Speculative Decoding

Mar 23, 2026

Training-Free Sparse Attention for Fast Video Generation via Offline Layer-Wise Sparsity Profiling and Online Bidirectional Co-Clustering

Mar 19, 2026

PhysVideo: Physically Plausible Video Generation with Cross-View Geometry Guidance

Mar 19, 2026