Hao Shi

VL2Spike: Spike-driven Distillation from VLMs for Low-Power Visual Perception in Embodied AI

Jun 14, 2026
Via arXiv

ReasonAlloc: Hierarchical Decoding-Time KV Cache Budget Allocation for Reasoning Models

Jun 09, 2026
Via arXiv

MemoryVLA++: Temporal Modeling via Memory and Imagination in Vision-Language-Action Models

Jun 08, 2026
Via arXiv

Physics-Driven Semantic Scattering Structure Understanding of Aircraft Target in SAR Images

Jun 05, 2026
Via arXiv

Seeing Together: Multi-Robot Cooperative Egocentric Spatial Reasoning with Multimodal Large Language Models

May 19, 2026
Via arXiv

EgoEV-HandPose: Egocentric 3D Hand Pose Estimation and Gesture Recognition with Stereo Event Cameras

May 12, 2026
Via arXiv

E-VLA: Event-Augmented Vision-Language-Action Model for Dark and Blurred Scenes

Apr 06, 2026
Via arXiv

Speech-Worthy Alignment for Japanese SpeechLLMs via Direct Preference Optimization

Mar 13, 2026
Via arXiv

O3N: Omnidirectional Open-Vocabulary Occupancy Prediction

Mar 12, 2026
Via arXiv

Streaming Translation and Transcription Through Speech-to-Text Causal Alignment

Mar 12, 2026
Via arXiv