Xiaojun Chang

FACE-net: Factual Calibration and Emotion Augmentation for Retrieval-enhanced Emotional Video Captioning

Mar 18, 2026
Via arXiv

Beyond Dense Futures: World Models as Structured Planners for Robotic Manipulation

Mar 13, 2026
Via arXiv

GeoSense: Internalizing Geometric Necessity Perception for Multimodal Reasoning

Mar 11, 2026
Via arXiv

See, Plan, Rewind: Progress-Aware Vision-Language-Action Models for Robust Robotic Manipulation

Mar 10, 2026
Via arXiv

Implicit Geometry Representations for Vision-and-Language Navigation from Web Videos

Mar 10, 2026
Via arXiv

Suppressing Prior-Comparison Hallucinations in Radiology Report Generation via Semantically Decoupled Latent Steering

Feb 27, 2026
Via arXiv

Order from Chaos: Physical World Understanding from Glitchy Gameplay Videos

Jan 23, 2026
Via arXiv

Measuring Social Bias in Vision-Language Models with Face-Only Counterfactuals from Real Photos

Jan 11, 2026
Via arXiv

Parallel Diffusion Solver via Residual Dirichlet Policy Optimization

Dec 28, 2025
Via arXiv

CARE What Fails: Contrastive Anchored-REflection for Verifiable Multimodal

Dec 22, 2025
Via arXiv