Hongyuan Zhang

VPWEM: Non-Markovian Visuomotor Policy with Working and Episodic Memory

Mar 05, 2026

Crab⁺: A Scalable and Unified Audio-Visual Scene Understanding Model with Explicit Cooperation

Mar 04, 2026

AHAP: Reconstructing Arbitrary Humans from Arbitrary Perspectives with Geometric Priors

Feb 27, 2026

ERNIE 5.0 Technical Report

Feb 04, 2026

MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning

Dec 29, 2025

ViewMask-1-to-3: Multi-View Consistent Image Generation via Multimodal Diffusion Models

Dec 16, 2025

Is Your VLM for Autonomous Driving Safety-Ready? A Comprehensive Benchmark for Evaluating External and In-Cabin Risks

Nov 19, 2025

GRPO-RM: Fine-Tuning Representation Models via GRPO-Driven Reinforcement Learning

Nov 19, 2025

Explore How to Inject Beneficial Noise in MLLMs

Nov 17, 2025

Rectified Noise: A Generative Model Using Positive-incentive Noise

Nov 12, 2025