Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Topic:Keypoint Detection

What is Keypoint Detection? Keypoint detection is essential for analyzing and interpreting images in computer vision. It involves simultaneously detecting and localizing interesting points in an image. Keypoints, also known as interest points, are spatial locations or points in the image that define what is interesting or what stands out. They are invariant to image rotation, shrinkage, translation, distortion, etc. Keypoint examples include body joints, facial landmarks, or any other salient points in objects. Keypoints have uses in problems such as pose estimation, object detection and tracking, facial analysis, and augmented reality.

TopoMaskV2: Enhanced Instance-Mask-Based Formulation for the Road Topology Problem

Sep 17, 2024

M. Esat Kalfaoglu, Halil Ibrahim Ozturk, Ozsel Kilinc, Alptekin Temizel

Figure 1 for TopoMaskV2: Enhanced Instance-Mask-Based Formulation for the Road Topology Problem

Figure 2 for TopoMaskV2: Enhanced Instance-Mask-Based Formulation for the Road Topology Problem

Figure 3 for TopoMaskV2: Enhanced Instance-Mask-Based Formulation for the Road Topology Problem

Figure 4 for TopoMaskV2: Enhanced Instance-Mask-Based Formulation for the Road Topology Problem

Abstract:Recently, the centerline has become a popular representation of lanes due to its advantages in solving the road topology problem. To enhance centerline prediction, we have developed a new approach called TopoMask. Unlike previous methods that rely on keypoints or parametric methods, TopoMask utilizes an instance-mask-based formulation coupled with a masked-attention-based transformer architecture. We introduce a quad-direction label representation to enrich the mask instances with flow information and design a corresponding post-processing technique for mask-to-centerline conversion. Additionally, we demonstrate that the instance-mask formulation provides complementary information to parametric Bezier regressions, and fusing both outputs leads to improved detection and topology performance. Moreover, we analyze the shortcomings of the pillar assumption in the Lift Splat technique and adapt a multi-height bin configuration. Experimental results show that TopoMask achieves state-of-the-art performance in the OpenLane-V2 dataset, increasing from 44.1 to 49.4 for Subset-A and 44.7 to 51.8 for Subset-B in the V1.1 OLS baseline.

* Accepted to ECCV 2024 2nd Workshop on Vision-Centric Autonomous Driving (VCAD). TopoMaskV2 includes significant architectural improvements and extensive ablation studies over the original TopoMask, which received an innovation award in the OpenLane Topology Challenge 2023

Via

Access Paper or Ask Questions

Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation

Mar 28, 2024

Xiao Lin, Wenfei Yang, Yuan Gao, Tianzhu Zhang

Figure 1 for Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation

Figure 2 for Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation

Figure 3 for Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation

Figure 4 for Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation

Abstract:Category-level 6D object pose estimation aims to estimate the rotation, translation and size of unseen instances within specific categories. In this area, dense correspondence-based methods have achieved leading performance. However, they do not explicitly consider the local and global geometric information of different instances, resulting in poor generalization ability to unseen instances with significant shape variations. To deal with this problem, we propose a novel Instance-Adaptive and Geometric-Aware Keypoint Learning method for category-level 6D object pose estimation (AG-Pose), which includes two key designs: (1) The first design is an Instance-Adaptive Keypoint Detection module, which can adaptively detect a set of sparse keypoints for various instances to represent their geometric structures. (2) The second design is a Geometric-Aware Feature Aggregation module, which can efficiently integrate the local and global geometric information into keypoint features. These two modules can work together to establish robust keypoint-level correspondences for unseen instances, thus enhancing the generalization ability of the model.Experimental results on CAMERA25 and REAL275 datasets show that the proposed AG-Pose outperforms state-of-the-art methods by a large margin without category-specific shape priors.

* Accepted to CVPR2024

Via

Access Paper or Ask Questions

RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation

Mar 27, 2024

Yang Tian, Jiyao Zhang, Guowei Huang, Bin Wang, Ping Wang, Jiangmiao Pang, Hao Dong

Figure 1 for RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation

Figure 2 for RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation

Figure 3 for RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation

Figure 4 for RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation

Abstract:Estimating robot pose and joint angles is significant in advanced robotics, enabling applications like robot collaboration and online hand-eye calibration.However, the introduction of unknown joint angles makes prediction more complex than simple robot pose estimation, due to its higher dimensionality.Previous methods either regress 3D keypoints directly or utilise a render&compare strategy. These approaches often falter in terms of performance or efficiency and grapple with the cross-camera gap problem.This paper presents a novel framework that bifurcates the high-dimensional prediction task into two manageable subtasks: 2D keypoints detection and lifting 2D keypoints to 3D. This separation promises enhanced performance without sacrificing the efficiency innate to keypoint-based techniques.A vital component of our method is the lifting of 2D keypoints to 3D keypoints. Common deterministic regression methods may falter when faced with uncertainties from 2D detection errors or self-occlusions.Leveraging the robust modeling potential of diffusion models, we reframe this issue as a conditional 3D keypoints generation task. To bolster cross-camera adaptability, we introduce theNormalised Camera Coordinate Space (NCCS), ensuring alignment of estimated 2D keypoints across varying camera intrinsics.Experimental results demonstrate that the proposed method outperforms the state-of-the-art render\&compare method and achieves higher inference speed.Furthermore, the tests accentuate our method's robust cross-camera generalisation capabilities.We intend to release both the dataset and code in https://nimolty.github.io/Robokeygen/

* Accepted by ICRA 2024

Via

Access Paper or Ask Questions

Using joint angles based on the international biomechanical standards for human action recognition and related tasks

Jun 25, 2024

Kevin Schlegel, Lei Jiang, Hao Ni

Figure 1 for Using joint angles based on the international biomechanical standards for human action recognition and related tasks

Figure 2 for Using joint angles based on the international biomechanical standards for human action recognition and related tasks

Figure 3 for Using joint angles based on the international biomechanical standards for human action recognition and related tasks

Figure 4 for Using joint angles based on the international biomechanical standards for human action recognition and related tasks

Abstract:Keypoint data has received a considerable amount of attention in machine learning for tasks like action detection and recognition. However, human experts in movement such as doctors, physiotherapists, sports scientists and coaches use a notion of joint angles standardised by the International Society of Biomechanics to precisely and efficiently communicate static body poses and movements. In this paper, we introduce the basic biomechanical notions and show how they can be used to convert common keypoint data into joint angles that uniquely describe the given pose and have various desirable mathematical properties, such as independence of both the camera viewpoint and the person performing the action. We experimentally demonstrate that the joint angle representation of keypoint data is suitable for machine learning applications and can in some cases bring an immediate performance gain. The use of joint angles as a human meaningful representation of kinematic data is in particular promising for applications where interpretability and dialog with human experts is important, such as many sports and medical applications. To facilitate further research in this direction, we will release a python package to convert keypoint data into joint angles as outlined in this paper.

Via

Access Paper or Ask Questions

Temporally-consistent 3D Reconstruction of Birds

Aug 24, 2024

Johannes Hägerlind, Jonas Hentati-Sundberg, Bastian Wandt

Abstract:This paper deals with 3D reconstruction of seabirds which recently came into focus of environmental scientists as valuable bio-indicators for environmental change. Such 3D information is beneficial for analyzing the bird's behavior and physiological shape, for example by tracking motion, shape, and appearance changes. From a computer vision perspective birds are especially challenging due to their rapid and oftentimes non-rigid motions. We propose an approach to reconstruct the 3D pose and shape from monocular videos of a specific breed of seabird - the common murre. Our approach comprises a full pipeline of detection, tracking, segmentation, and temporally consistent 3D reconstruction. Additionally, we propose a temporal loss that extends current single-image 3D bird pose estimators to the temporal domain. Moreover, we provide a real-world dataset of 10000 frames of video observations on average capture nine birds simultaneously, comprising a large variety of motions and interactions, including a smaller test set with bird-specific keypoint labels. Using our temporal optimization, we achieve state-of-the-art performance for the challenging sequences in our dataset.

Via

Access Paper or Ask Questions

SD-Net: Symmetric-Aware Keypoint Prediction and Domain Adaptation for 6D Pose Estimation In Bin-picking Scenarios

Mar 14, 2024

Ding-Tao Huang, En-Te Lin, Lipeng Chen, Li-Fu Liu, Long Zeng

Figure 1 for SD-Net: Symmetric-Aware Keypoint Prediction and Domain Adaptation for 6D Pose Estimation In Bin-picking Scenarios

Figure 2 for SD-Net: Symmetric-Aware Keypoint Prediction and Domain Adaptation for 6D Pose Estimation In Bin-picking Scenarios

Figure 3 for SD-Net: Symmetric-Aware Keypoint Prediction and Domain Adaptation for 6D Pose Estimation In Bin-picking Scenarios

Figure 4 for SD-Net: Symmetric-Aware Keypoint Prediction and Domain Adaptation for 6D Pose Estimation In Bin-picking Scenarios

Abstract:Despite the success in 6D pose estimation in bin-picking scenarios, existing methods still struggle to produce accurate prediction results for symmetry objects and real world scenarios. The primary bottlenecks include 1) the ambiguity keypoints caused by object symmetries; 2) the domain gap between real and synthetic data. To circumvent these problem, we propose a new 6D pose estimation network with symmetric-aware keypoint prediction and self-training domain adaptation (SD-Net). SD-Net builds on pointwise keypoint regression and deep hough voting to perform reliable detection keypoint under clutter and occlusion. Specifically, at the keypoint prediction stage, we designe a robust 3D keypoints selection strategy considering the symmetry class of objects and equivalent keypoints, which facilitate locating 3D keypoints even in highly occluded scenes. Additionally, we build an effective filtering algorithm on predicted keypoint to dynamically eliminate multiple ambiguity and outlier keypoint candidates. At the domain adaptation stage, we propose the self-training framework using a student-teacher training scheme. To carefully distinguish reliable predictions, we harnesses a tailored heuristics for 3D geometry pseudo labelling based on semi-chamfer distance. On public Sil'eane dataset, SD-Net achieves state-of-the-art results, obtaining an average precision of 96%. Testing learning and generalization abilities on public Parametric datasets, SD-Net is 8% higher than the state-of-the-art method. The code is available at https://github.com/dingthuang/SD-Net.

Via

Access Paper or Ask Questions

Personalized Topology-Informed 12-Lead ECG Electrode Localization from Incomplete Cardiac MRIs for Efficient Cardiac Digital Twins

Aug 25, 2024

Lei Li, Hannah Smith, Yilin Lyu, Julia Camps, Blanca Rodriguez, Abhirup Banerjee, Vicente Grau

Abstract:Cardiac digital twins (CDTs) offer personalized \textit{in-silico} cardiac representations for the inference of multi-scale properties tied to cardiac mechanisms. The creation of CDTs requires precise information about the electrode position on the torso, especially for the personalized electrocardiogram (ECG) calibration. However, current studies commonly rely on additional acquisition of torso imaging and manual/semi-automatic methods for ECG electrode localization. In this study, we propose a novel and efficient topology-informed model to fully automatically extract personalized ECG electrode locations from 2D clinically standard cardiac MRIs. Specifically, we obtain the sparse torso contours from the cardiac MRIs and then localize the electrodes from the contours. Cardiac MRIs aim at imaging of the heart instead of the torso, leading to incomplete torso geometry within the imaging. To tackle the missing topology, we incorporate the electrodes as a subset of the keypoints, which can be explicitly aligned with the 3D torso topology. The experimental results demonstrate that the proposed model outperforms the time-consuming conventional method in terms of accuracy (Euclidean distance: $1.24 \pm 0.293$ cm vs. $1.48 \pm 0.362$ cm) and efficiency ($2$~s vs. $30$-$35$~min). We further demonstrate the effectiveness of using the detected electrodes for \textit{in-silico} ECG simulation, highlighting their potential for creating accurate and efficient CDT models. The code will be released publicly after the manuscript is accepted for publication.

* 12 pages

Via

Access Paper or Ask Questions

MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception

Jun 15, 2024

M. Mahbubur Rahman, Ryoma Yataka, Sorachi Kato, Pu Perry Wang, Peizhao Li, Adriano Cardace, Petros Boufounos

Figure 1 for MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception

Figure 2 for MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception

Figure 3 for MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception

Figure 4 for MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception

Abstract:Compared with an extensive list of automotive radar datasets that support autonomous driving, indoor radar datasets are scarce at a smaller scale in the format of low-resolution radar point clouds and usually under an open-space single-room setting. In this paper, we scale up indoor radar data collection using multi-view high-resolution radar heatmap in a multi-day, multi-room, and multi-subject setting, with an emphasis on the diversity of environment and subjects. Referred to as the millimeter-wave multi-view radar (MMVR) dataset, it consists of $345$K multi-view radar frames collected from $25$ human subjects over $6$ different rooms, $446$K annotated bounding boxes/segmentation instances, and $7.59$ million annotated keypoints to support three major perception tasks of object detection, pose estimation, and instance segmentation, respectively. For each task, we report performance benchmarks under two protocols: a single subject in an open space and multiple subjects in several cluttered rooms with two data splits: random split and cross-environment split over $395$ 1-min data segments. We anticipate that MMVR facilitates indoor radar perception development for indoor vehicle (robot/humanoid) navigation, building energy management, and elderly care for better efficiency, user experience, and safety.

* 26 pages, 25 figures, 9 tables

Via

Access Paper or Ask Questions

Self-supervised 3D Patient Modeling with Multi-modal Attentive Fusion

Mar 05, 2024

Meng Zheng, Benjamin Planche, Xuan Gong, Fan Yang, Terrence Chen, Ziyan Wu

Abstract:3D patient body modeling is critical to the success of automated patient positioning for smart medical scanning and operating rooms. Existing CNN-based end-to-end patient modeling solutions typically require a) customized network designs demanding large amount of relevant training data, covering extensive realistic clinical scenarios (e.g., patient covered by sheets), which leads to suboptimal generalizability in practical deployment, b) expensive 3D human model annotations, i.e., requiring huge amount of manual effort, resulting in systems that scale poorly. To address these issues, we propose a generic modularized 3D patient modeling method consists of (a) a multi-modal keypoint detection module with attentive fusion for 2D patient joint localization, to learn complementary cross-modality patient body information, leading to improved keypoint localization robustness and generalizability in a wide variety of imaging (e.g., CT, MRI etc.) and clinical scenarios (e.g., heavy occlusions); and (b) a self-supervised 3D mesh regression module which does not require expensive 3D mesh parameter annotations to train, bringing immediate cost benefits for clinical deployment. We demonstrate the efficacy of the proposed method by extensive patient positioning experiments on both public and clinical data. Our evaluation results achieve superior patient positioning performance across various imaging modalities in real clinical scenarios.

* MICCAI 2022

Via

Access Paper or Ask Questions

3D Extended Object Tracking by Fusing Roadside Sparse Radar Point Clouds and Pixel Keypoints

Apr 27, 2024

Jiayin Deng, Zhiqun Hu, Yuxuan Xia, Zhaoming Lu, Xiangming Wen

Figure 1 for 3D Extended Object Tracking by Fusing Roadside Sparse Radar Point Clouds and Pixel Keypoints

Figure 2 for 3D Extended Object Tracking by Fusing Roadside Sparse Radar Point Clouds and Pixel Keypoints

Figure 3 for 3D Extended Object Tracking by Fusing Roadside Sparse Radar Point Clouds and Pixel Keypoints

Figure 4 for 3D Extended Object Tracking by Fusing Roadside Sparse Radar Point Clouds and Pixel Keypoints

Abstract:Roadside perception is a key component in intelligent transportation systems. In this paper, we present a novel three-dimensional (3D) extended object tracking (EOT) method, which simultaneously estimates the object kinematics and extent state, in roadside perception using both the radar and camera data. Because of the influence of sensor viewing angle and limited angle resolution, radar measurements from objects are sparse and non-uniformly distributed, leading to inaccuracies in object extent and position estimation. To address this problem, we present a novel spherical Gaussian function weighted Gaussian mixture model. This model assumes that radar measurements originate from a series of probabilistic weighted radar reflectors on the vehicle's extent. Additionally, we utilize visual detection of vehicle keypoints to provide additional information on the positions of radar reflectors. Since keypoints may not always correspond to radar reflectors, we propose an elastic skeleton fusion mechanism, which constructs a virtual force to establish the relationship between the radar reflectors on the vehicle and its extent. Furthermore, to better describe the kinematic state of the vehicle and constrain its extent state, we develop a new 3D constant turn rate and velocity motion model, considering the complex 3D motion of the vehicle relative to the roadside sensor. Finally, we apply variational Bayesian approximation to the intractable measurement update step to enable recursive Bayesian estimation of the object's state. Simulation results using the Carla simulator and experimental results on the nuScenes dataset demonstrate the effectiveness and superiority of the proposed method in comparison to several state-of-the-art 3D EOT methods.

Via

Access Paper or Ask Questions

Topic:Keypoint Detection

Papers and Code