Zhibo Yang

BabyVision: Visual Reasoning Beyond Language

Jan 10, 2026
Via arXiv

Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking

Jan 08, 2026
Via arXiv

DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding

Aug 12, 2025
Via arXiv

MAER-Nav: Bidirectional Motion Learning Through Mirror-Augmented Experience Replay for Robot Navigation

Mar 31, 2025
Via arXiv

Generative Compositor for Few-Shot Visual Information Extraction

Mar 21, 2025
Via arXiv

Enhancing Deep Reinforcement Learning-based Robot Navigation Generalization through Scenario Augmentation

Mar 03, 2025
Via arXiv

Beyond Visibility Limits: A DRL-Based Navigation Strategy for Unexpected Obstacles

Mar 03, 2025
Via arXiv

OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models

Feb 22, 2025
Via arXiv

Qwen2.5-VL Technical Report

Feb 19, 2025
Via arXiv

SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild

Jan 07, 2025
Via arXiv