Guangming Wang

Towards the Vision-Sound-Language-Action Paradigm: The HEAR Framework for Sound-Centric Manipulation

Mar 17, 2026
Via arXiv

OGScene3D: Incremental Open-Vocabulary 3D Gaussian Scene Graph Mapping for Scene Understanding

Mar 17, 2026
Via arXiv

RegFormer++: An Efficient Large-Scale 3D LiDAR Point Registration Network with Projection-Aware 2D Transformer

Mar 15, 2026
Via arXiv

ActionReasoning: Robot Action Reasoning in 3D Space with LLM for Robotic Brick Stacking

Feb 24, 2026
Via arXiv

Towards Mitigating Modality Bias in Vision-Language Models for Temporal Action Localization

Jan 28, 2026
Via arXiv

Reloc-VGGT: Visual Re-localization with Geometry Grounded Transformer

Dec 26, 2025
Via arXiv

Enabling Ultra-Fast Cardiovascular Imaging Across Heterogeneous Clinical Environments with a Generalist Foundation Model and Multimodal Database

Dec 25, 2025
Via arXiv

InfraDiffusion: zero-shot depth map restoration with diffusion models and prompted segmentation from sparse infrastructure point clouds

Sep 03, 2025
Via arXiv

ERMV: Editing 4D Robotic Multi-view images to enhance embodied agents

Jul 23, 2025
Via arXiv

MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking

Apr 09, 2025
Via arXiv