Minghao Zhu

Beyond Semantic Manipulation: Token-Space Attacks on Reward Models

Apr 03, 2026

GroundVTS: Visual Token Sampling in Multimodal Large Language Models for Video Temporal Grounding

Apr 02, 2026

RynnBrain: Open Embodied Foundation Models

Feb 13, 2026

SignRAG: A Retrieval-Augmented System for Scalable Zero-Shot Road Sign Recognition

Dec 14, 2025

An Uncertainty-Weighted Decision Transformer for Navigation in Dense, Complex Driving Scenarios

Sep 16, 2025

CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation

Feb 03, 2025

MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer

Oct 14, 2024

Efficient Text-driven Motion Generation via Latent Consistency Training

May 05, 2024

CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge

Feb 24, 2024

Training Adversarial yet Safe Agent to Characterize Safety Performance of Highly Automated Vehicles

Feb 02, 2024