Picture for Zhenbo Luo

Zhenbo Luo

Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

Add code
Mar 12, 2026
Viaarxiv icon

IMTBench: A Multi-Scenario Cross-Modal Collaborative Evaluation Benchmark for In-Image Machine Translation

Add code
Mar 11, 2026
Viaarxiv icon

EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models

Add code
Feb 27, 2026
Viaarxiv icon

MSJoE: Jointly Evolving MLLM and Sampler for Efficient Long-Form Video Understanding

Add code
Feb 26, 2026
Viaarxiv icon

ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding

Add code
Feb 26, 2026
Viaarxiv icon

Visual Para-Thinker: Divide-and-Conquer Reasoning for Visual Comprehension

Add code
Feb 10, 2026
Viaarxiv icon

GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving

Add code
Feb 09, 2026
Viaarxiv icon

Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy Distillation

Add code
Feb 03, 2026
Viaarxiv icon

Restoring Exploration after Post-Training: Latent Exploration Decoding for Large Reasoning Models

Add code
Feb 02, 2026
Viaarxiv icon

GAIA: A Data Flywheel System for Training GUI Test-Time Scaling Critic Models

Add code
Jan 26, 2026
Viaarxiv icon