Zhiheng Ma

ABot-PhysWorld: Interactive World Foundation Model for Robotic Manipulation with Physics Alignment

Mar 24, 2026
Via arXiv

HumanOmni-Speaker: Identifying Who said What and When

Mar 23, 2026
Via arXiv

Trajectory-Diversity-Driven Robust Vision-and-Language Navigation

Mar 16, 2026
Via arXiv

Neural Implicit Action Fields: From Discrete Waypoints to Continuous Functions for Vision-Language-Action Models

Mar 02, 2026
Via arXiv

ReMoT: Reinforcement Learning with Motion Contrast Triplets

Feb 28, 2026
Via arXiv

ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning

Feb 11, 2026
Via arXiv

P2L-CA: An Effective Parameter Tuning Framework for Rehearsal-Free Multi-Label Class-Incremental Learning

Jan 19, 2026
Via arXiv

DD-Ranking: Rethinking the Evaluation of Dataset Distillation

May 19, 2025
Via arXiv

CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization

May 06, 2025
Via arXiv

ComprehendEdit: A Comprehensive Dataset and Evaluation Framework for Multimodal Knowledge Editing

Dec 17, 2024
Via arXiv