Feng Zheng

A1: A Fully Transparent Open-Source, Adaptive and Efficient Truncated Vision-Language-Action Model

Apr 07, 2026
Via arXiv

Structured Causal Video Reasoning via Multi-Objective Alignment

Apr 06, 2026
Via arXiv

Scalable Object Relation Encoding for Better 3D Spatial Reasoning in Large Language Models

Mar 25, 2026
Via arXiv

Learning Trajectory-Aware Multimodal Large Language Models for Video Reasoning Segmentation

Mar 23, 2026
Via arXiv

Show Me When and Where: Towards Referring Video Object Segmentation in the Wild

Mar 15, 2026
Via arXiv

AD-Copilot: A Vision-Language Assistant for Industrial Anomaly Detection via Visual In-context Comparison

Mar 14, 2026
Via arXiv

Mastering Negation: Boosting Grounding Models via Grouped Opposition-Based Learning

Mar 13, 2026
Via arXiv

RADAR: Closed-Loop Robotic Data Generation via Semantic Planning and Autonomous Causal Environment Reset

Mar 12, 2026
Via arXiv

ST4VLA: Spatially Guided Training for Vision-Language-Action Models

Feb 10, 2026
Via arXiv

ConsistentRFT: Reducing Visual Hallucinations in Flow-based Reinforcement Fine-Tuning

Feb 03, 2026
Via arXiv