Penghao Wu

From Pixels to Words -- Towards Native One-Vision Models at Scale

May 27, 2026
Via arXiv

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

May 12, 2026
Via arXiv

Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning

Jun 16, 2025
Via arXiv

GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior

Jun 09, 2025
Via arXiv

Streamline Without Sacrifice -- Squeeze out Computation Redundancy in LMM

May 21, 2025
Via arXiv

Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos

Jan 23, 2025
Via arXiv

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Jun 24, 2024
Via arXiv

Generalized Predictive Model for Autonomous Driving

Mar 14, 2024
Via arXiv

V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs

Dec 26, 2023
Via arXiv

End-to-end Autonomous Driving: Challenges and Frontiers

Jun 29, 2023
Via arXiv