Title: MK-Pose: Category-Level Object Pose Estimation via Multimodal-Based Keypoint Learning

Jul 09, 2025

Yifan Yang, Peili Song, Enfan Lan, Dong Liu, Jingtai Liu

Abstract: Category-level object pose estimation, which predicts the pose of objects within a known category without prior knowledge of individual instances, is essential in applications like warehouse automation and manufacturing. Existing methods relying on RGB images or point cloud data often struggle with object occlusion and generalization across different instances and categories. This paper proposes a multimodal-based keypoint learning framework (MK-Pose) that integrates RGB images, point clouds, and category-level textual descriptions. The model uses a self-supervised keypoint detection module enhanced with attention-based query generation, soft heatmap matching, and graph-based relational modeling. Additionally, a graph-enhanced feature fusion module is designed to integrate local geometric information and global context. MK-Pose is evaluated on the CAMERA25 and REAL275 datasets, and is further tested for cross-dataset capability on the HouseCat6D dataset. The results demonstrate that MK-Pose outperforms existing state-of-the-art methods in both IoU and average precision without shape priors. Code will be released at https://github.com/yangyifanYYF/MK-Pose.
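
As a rough illustration of the IoU metric mentioned in the abstract, the sketch below computes the intersection-over-union of two 3D bounding boxes. It is a simplified sketch, not the paper's evaluation code: it assumes axis-aligned boxes given as (min corner, max corner), whereas pose benchmarks typically compare oriented boxes. The function name `aabb_iou_3d` and the example boxes are hypothetical.

```python
def box_volume(box):
    """Volume of an axis-aligned box given as (min_xyz, max_xyz)."""
    lo, hi = box
    vol = 1.0
    for l, h in zip(lo, hi):
        vol *= max(0.0, h - l)
    return vol


def aabb_iou_3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes (assumption: no rotation)."""
    (a_min, a_max), (b_min, b_max) = box_a, box_b
    inter = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(a_min, a_max, b_min, b_max):
        # Overlap length on this axis; zero if the boxes are disjoint here.
        inter *= max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))
    union = box_volume(box_a) + box_volume(box_b) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical predicted and ground-truth boxes, shifted by half their width.
pred = ((0.0, 0.0, 0.0), (2.0, 2.0, 2.0))
gt = ((1.0, 0.0, 0.0), (3.0, 2.0, 2.0))
iou = aabb_iou_3d(pred, gt)
```

Here the boxes share half their volume, so the intersection is 4, the union is 12, and the IoU is 1/3. A prediction is usually counted as correct when its IoU exceeds a chosen threshold.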
