
Zhenbo Luo

Listening with Time: Precise Temporal Awareness for Long-Form Audio Understanding

Apr 24, 2026

Doc-V*: Coarse-to-Fine Interactive Visual Reasoning for Multi-Page Document VQA

Apr 15, 2026

OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering

Apr 09, 2026

Q-Mask: Query-driven Causal Masks for Text Anchoring in OCR-Oriented Vision-Language Models

Mar 31, 2026

Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

Mar 12, 2026

IMTBench: A Multi-Scenario Cross-Modal Collaborative Evaluation Benchmark for In-Image Machine Translation

Mar 11, 2026

EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models

Feb 27, 2026

MSJoE: Jointly Evolving MLLM and Sampler for Efficient Long-Form Video Understanding

Feb 26, 2026

ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding

Feb 26, 2026

Visual Para-Thinker: Divide-and-Conquer Reasoning for Visual Comprehension

Feb 10, 2026