Chengyu Bai

VEGA: Visual Encoder Grounding Alignment for Spatially-Aware Vision-Language-Action Models

May 11, 2026
Via arXiv

Grounded Forcing: Bridging Time-Independent Semantics and Proximal Dynamics in Autoregressive Video Synthesis

Apr 08, 2026
Via arXiv

ConceptWeaver: Weaving Disentangled Concepts with Flow

Mar 30, 2026
Via arXiv

AEGPO: Adaptive Entropy-Guided Policy Optimization for Diffusion Models

Feb 06, 2026
Via arXiv

StreamKV: Streaming Video Question-Answering with Segment-based KV Cache Retrieval and Compression

Nov 10, 2025
Via arXiv

WoW: Towards a World omniscient World model Through Embodied Interaction

Sep 26, 2025
Via arXiv

EmbodiedOcc++: Boosting Embodied 3D Occupancy Prediction with Plane Regularization and Uncertainty Sampler

Apr 13, 2025
Via arXiv