Xiaojun Chang

A1: A Fully Transparent Open-Source, Adaptive and Efficient Truncated Vision-Language-Action Model

Apr 07, 2026
Via arXiv

LatentPilot: Scene-Aware Vision-and-Language Navigation by Dreaming Ahead with Latent Visual Reasoning

Mar 31, 2026
Via arXiv

FACE-net: Factual Calibration and Emotion Augmentation for Retrieval-enhanced Emotional Video Captioning

Mar 18, 2026
Via arXiv

Beyond Dense Futures: World Models as Structured Planners for Robotic Manipulation

Mar 13, 2026
Via arXiv

GeoSense: Internalizing Geometric Necessity Perception for Multimodal Reasoning

Mar 11, 2026
Via arXiv

Implicit Geometry Representations for Vision-and-Language Navigation from Web Videos

Mar 10, 2026
Via arXiv

See, Plan, Rewind: Progress-Aware Vision-Language-Action Models for Robust Robotic Manipulation

Mar 10, 2026
Via arXiv

Suppressing Prior-Comparison Hallucinations in Radiology Report Generation via Semantically Decoupled Latent Steering

Feb 27, 2026
Via arXiv

Order from Chaos: Physical World Understanding from Glitchy Gameplay Videos

Jan 23, 2026
Via arXiv

Measuring Social Bias in Vision-Language Models with Face-Only Counterfactuals from Real Photos

Jan 11, 2026
Via arXiv