Qibin Hou

StableVLA: Towards Robust Vision-Language-Action Models without Extra Data

May 18, 2026
Via arXiv

Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation

Apr 28, 2026
Via arXiv

TS-Attn: Temporal-wise Separable Attention for Multi-Event Video Generation

Apr 21, 2026
Via arXiv

Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought

Mar 24, 2026
Via arXiv

Mixture of Style Experts for Diverse Image Stylization

Mar 17, 2026
Via arXiv

Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions

Feb 13, 2026
Via arXiv

GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics

Feb 13, 2026
Via arXiv

InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery

Feb 09, 2026
Via arXiv

Trust but Verify: Adaptive Conditioning for Reference-Based Diffusion Super-Resolution via Implicit Reference Correlation Modeling

Feb 02, 2026
Via arXiv

OmniSegmentor: A Flexible Multi-Modal Learning Framework for Semantic Segmentation

Sep 18, 2025
Via arXiv