Mathieu Salzmann

CVLab EPFL Switzerland

Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions

Mar 31, 2022

Van Nguyen Nguyen, Yinlin Hu, Yang Xiao, Mathieu Salzmann, Vincent Lepetit

Abstract: We present a method that can recognize new objects and estimate their 3D pose in RGB images even under partial occlusions. Our method requires neither a training phase on these objects nor real images depicting them, only their CAD models. It relies on a small set of training objects to learn local object representations, which allow us to locally match the input image to a set of "templates", rendered images of the CAD models for the new objects. In contrast with state-of-the-art methods, the new objects to which our method is applied can be very different from the training objects. As a result, we are the first to show generalization without retraining on the LINEMOD and Occlusion-LINEMOD datasets. Our analysis of the failure modes of previous template-based approaches further confirms the benefits of local features for template matching. We outperform the state-of-the-art template matching methods on the LINEMOD, Occlusion-LINEMOD and T-LESS datasets. Our source code and data are publicly available at https://github.com/nv-nguyen/template-pose

* CVPR 2022
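
The local matching idea described in this abstract can be illustrated with a toy sketch. It is not taken from the released code. It assumes that each image is summarized as a grid of local descriptors (hand-written lists here, where the paper learns them), that per-location similarity is cosine similarity, and that matches weaker than a hypothetical threshold `tau` are treated as occluded. All function names and the threshold value are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def local_similarity(query, template, mask, tau=0.2):
    """Average per-location similarity inside the template's object mask.

    query, template: lists of local descriptors, one per grid location.
    mask: list of 0/1 flags marking locations covered by the rendered object.
    tau: hypothetical threshold; weaker matches are treated as occluded and
         contribute zero instead of a negative score.
    """
    total, count = 0.0, 0
    for q, t, m in zip(query, template, mask):
        if not m:
            continue  # background locations of the template are ignored
        s = cosine(q, t)
        total += s if s >= tau else 0.0
        count += 1
    return total / count if count else 0.0

def best_template(query, templates, masks, tau=0.2):
    """Return the index of the best-scoring template and all scores."""
    scores = [local_similarity(query, t, m, tau) for t, m in zip(templates, masks)]
    return max(range(len(scores)), key=scores.__getitem__), scores

# Toy data: 4 grid locations with 2-D descriptors.
template_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
template_b = [[0.0, 1.0], [1.0, 0.0], [1.0, -1.0], [0.0, 1.0]]
masks = [[1, 1, 1, 0], [1, 1, 1, 0]]
# The query agrees with template_a except at location 2, which is "occluded".
query = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [0.5, 0.5]]

index, scores = best_template(query, [template_a, template_b], masks)
```

In this toy case the occluded location lowers the score of `template_a` but does not prevent it from being selected, which is the behavior a global descriptor would not guarantee.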

Learning-based Point Cloud Registration for 6D Object Pose Estimation in the Real World

Mar 29, 2022

Zheng Dang, Lizhou Wang, Yu Guo, Mathieu Salzmann

Abstract: In this work, we tackle the task of estimating the 6D pose of an object from point cloud data. While recent learning-based approaches to this task have shown great success on synthetic datasets, we have observed them to fail in the presence of real-world data. We thus analyze the causes of these failures, which we trace back to the difference between the feature distributions of the source and target point clouds, and the sensitivity of the widely-used SVD-based loss function to the range of rotation between the two point clouds. We address the first challenge by introducing a new normalization strategy, Match Normalization, and the second via the use of a loss function based on the negative log likelihood of point correspondences. Our two contributions are general and can be applied to many existing learning-based 3D object registration frameworks, which we illustrate by implementing them in two such frameworks, DCP and IDAM. Our experiments on the real-scene TUD-L, LINEMOD and Occluded-LINEMOD datasets demonstrate the benefits of our strategies. They allow, for the first time, learning-based 3D object registration methods to achieve meaningful results on real-world data. We therefore expect them to be key to the future development of point cloud registration methods.

* 25 pages
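
A loss based on the negative log likelihood of point correspondences can be sketched as follows. This is a minimal illustration, not the authors' implementation: the score matrix, the row-wise softmax, the handling of unmatched points, and the function names are all assumptions, and the paper's exact formulation may differ.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    z = sum(exps)
    return [e / z for e in exps]

def correspondence_nll(scores, gt_index):
    """Mean negative log likelihood of the ground-truth matches.

    scores[i][j]: hypothetical matching score between source point i and
                  target point j (higher means a more likely match).
    gt_index[i]:  index of the true target match for source point i, or
                  None when the point has no match (e.g. it is occluded).
    """
    losses = []
    for row, j in zip(scores, gt_index):
        if j is None:
            continue  # unmatched points are simply skipped in this sketch
        probs = softmax(row)
        losses.append(-math.log(probs[j]))
    return sum(losses) / len(losses) if losses else 0.0

uniform = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
confident = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
gt = [0, 1]

loss_uniform = correspondence_nll(uniform, gt)      # equals log(3)
loss_confident = correspondence_nll(confident, gt)  # smaller than log(3)
```

In this sketch the loss is computed directly on the correspondence scores, so no SVD appears in the loss computation.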

Perspective Flow Aggregation for Data-Limited 6D Object Pose Estimation

Mar 18, 2022

Yinlin Hu, Pascal Fua, Mathieu Salzmann

Abstract: Most recent 6D object pose estimation methods, including unsupervised ones, require many real training images. Unfortunately, for some applications, such as those in space or deep under water, acquiring real images, even unannotated, is virtually impossible. In this paper, we propose a method that can be trained solely on synthetic images, or optionally using a few additional real ones. Given a rough pose estimate obtained from a first network, it uses a second network to predict a dense 2D correspondence field between the image rendered using the rough pose and the real image, and infers the required pose correction. This approach is much less sensitive to the domain shift between synthetic and real images than state-of-the-art methods. It performs on par with methods that require annotated real images for training when not using any, and outperforms them considerably when using as few as twenty real images.

Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects

Mar 16, 2022

Chen Zhao, Yinlin Hu, Mathieu Salzmann

Abstract: In this paper, we tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images. This task contrasts with the one considered by most existing deep learning methods, which typically assume that the testing objects have been observed during training. To handle unseen objects, we follow a retrieval-based strategy and prevent the network from learning object-specific features by computing multi-scale local similarities between the query image and synthetically-generated reference images. We then introduce an adaptive fusion module that robustly aggregates the local similarities into a global similarity score for each image pair. Furthermore, we speed up the retrieval process by developing a fast clustering-based retrieval strategy. Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to unseen objects than previous works.

Robust Binary Models by Pruning Randomly-initialized Networks

Feb 03, 2022

Chen Liu, Ziqi Zhao, Sabine Süsstrunk, Mathieu Salzmann

Abstract: We propose ways to obtain robust models against adversarial attacks from randomly-initialized binary networks. Unlike adversarial training, which learns the model parameters, we instead learn the structure of the robust model by pruning a randomly-initialized binary network. Our method confirms the strong lottery ticket hypothesis in the presence of adversarial attacks. Compared to the results obtained in a non-adversarial setting, we additionally improve the performance and compression of the model by 1) using an adaptive pruning strategy for different layers, and 2) using a different initialization scheme such that all model parameters are initialized either to +1 or -1. Our extensive experiments demonstrate that our approach not only outperforms the state of the art for robust binary networks but also achieves comparable or even better performance than full-precision network training methods.
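
The idea of learning structure rather than weights can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's algorithm: the top-k score-based mask, the keep ratio, and all function names are hypothetical, and the scores are random placeholders where a real method would learn them.

```python
import random

def random_binary_weights(n, rng):
    """Initialize n weights to +1 or -1; they are never updated afterwards."""
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def top_k_mask(scores, keep_ratio):
    """Binary mask keeping the highest-scoring fraction of the weights."""
    k = max(1, round(keep_ratio * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(scores))]

def masked_layer(x, weights, mask, n_out):
    """Linear layer whose effective weights are weights * mask."""
    n_in = len(x)
    return [
        sum(x[i] * weights[o * n_in + i] * mask[o * n_in + i] for i in range(n_in))
        for o in range(n_out)
    ]

rng = random.Random(0)
n_in, n_out = 4, 3
weights = random_binary_weights(n_in * n_out, rng)
# Only these scores would be trained (for example on adversarial examples);
# here they are random placeholders.
scores = [rng.random() for _ in range(n_in * n_out)]
# Hypothetical keep ratio; an adaptive strategy would choose one per layer.
mask = top_k_mask(scores, keep_ratio=0.5)
output = masked_layer([1.0, 2.0, 3.0, 4.0], weights, mask, n_out)
```

Because every surviving weight is +1 or -1, the pruned layer only needs the mask and one sign bit per kept weight.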

On the Impact of Hard Adversarial Instances on Overfitting in Adversarial Training

Dec 14, 2021

Chen Liu, Zhichao Huang, Mathieu Salzmann, Tong Zhang, Sabine Süsstrunk

Abstract: Adversarial training is a popular method to robustify models against adversarial attacks. However, it exhibits much more severe overfitting than training on clean inputs. In this work, we investigate this phenomenon from the perspective of training instances, i.e., training input-target pairs. Based on a quantitative metric measuring instances' difficulty, we analyze the model's behavior on training instances of different difficulty levels. This lets us show that the decay in generalization performance of adversarial training is a result of the model's attempt to fit hard adversarial instances. We theoretically verify our observations for both linear and general nonlinear models, proving that models trained on hard instances have worse generalization performance than ones trained on easy instances. Furthermore, we prove that the difference in the generalization gap between models trained on instances of different difficulty levels increases with the size of the adversarial budget. Finally, we conduct case studies on methods mitigating adversarial overfitting in several scenarios. Our analysis shows that methods successfully mitigating adversarial overfitting all avoid fitting hard adversarial instances, while ones fitting hard adversarial instances do not achieve true robustness.

Adversarial Parametric Pose Prior

Dec 08, 2021

Andrey Davydov, Anastasia Remizova, Victor Constantin, Sina Honari, Mathieu Salzmann, Pascal Fua

Abstract: The Skinned Multi-Person Linear (SMPL) model can represent a human body by mapping pose and shape parameters to body meshes. This has been shown to facilitate inferring 3D human pose and shape from images via different learning models. However, not all pose and shape parameter values yield physically-plausible or even realistic body meshes. In other words, SMPL is under-constrained and may thus lead to invalid results when used to reconstruct humans from images, either by directly optimizing its parameters, or by learning a mapping from the image to these parameters. In this paper, we therefore learn a prior that restricts the SMPL parameters to values that produce realistic poses via adversarial training. We show that our learned prior covers the diversity of the real-data distribution, facilitates optimization for 3D reconstruction from 2D keypoints, and yields better pose estimates when used for regression from images. We find that the prior based on a spherical distribution yields the best results. Furthermore, in all these tasks, it outperforms the state-of-the-art VAE-based approach to constraining the SMPL parameters.

Dyadic Human Motion Prediction

Dec 01, 2021

Isinsu Katircioglu, Costa Georgantas, Mathieu Salzmann, Pascal Fua

Abstract: Prior work on human motion forecasting has mostly focused on predicting the future motion of a single subject, in isolation, from their past pose sequence. In the presence of closely interacting people, however, this strategy fails to account for the dependencies between the different subjects' motions. In this paper, we therefore introduce a motion prediction framework that explicitly reasons about the interactions of two observed subjects. Specifically, we achieve this by introducing a pairwise attention mechanism that models the mutual dependencies in the motion history of the two subjects. This allows us to preserve the long-term motion dynamics in a more realistic way and more robustly predict unusual and fast-paced movements, such as the ones occurring in a dance scenario. To evaluate this, and because no existing motion prediction datasets depict two closely-interacting subjects, we introduce the LindyHop600K dance dataset. Our results show that our approach outperforms the state-of-the-art single-person motion prediction techniques.
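
One way to picture a pairwise attention mechanism over two motion histories is the following sketch. It assumes plain scaled dot-product attention with no learned projections, and represents each pose as a short feature vector. The function names and the toy data are hypothetical, and the architecture in the paper may differ.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    z = sum(exps)
    return [e / z for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention of one sequence over another."""
    d = len(queries[0])
    width = len(values[0])
    out = []
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(logits)
        out.append([sum(wi * v[c] for wi, v in zip(w, values)) for c in range(width)])
    return out

def pairwise_attention(history_a, history_b):
    """Each subject attends to the other subject's motion history.

    Learned query/key/value projections are omitted in this sketch.
    """
    a_from_b = cross_attention(history_a, history_b, history_b)
    b_from_a = cross_attention(history_b, history_a, history_a)
    return a_from_b, b_from_a

# Toy histories: 3 past frames per subject, 2 features per frame.
history_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
history_b = [[0.0, 2.0], [2.0, 0.0], [1.0, 1.0]]
a_ctx, b_ctx = pairwise_attention(history_a, history_b)
```

Each row of `a_ctx` is a weighted average of subject B's frames, so the context used to predict A's future motion depends on what B was doing, and vice versa.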

What Stops Learning-based 3D Registration from Working in the Real World?

Nov 19, 2021

Zheng Dang, Lizhou Wang, Junning Qiu, Minglei Lu, Mathieu Salzmann

Abstract: Much progress has been made on the task of learning-based 3D point cloud registration, with existing methods yielding outstanding results on standard benchmarks, such as ModelNet40, even in the partial-to-partial matching scenario. Unfortunately, these methods still struggle in the presence of real data. In this work, we identify the sources of these failures, analyze the reasons behind them, and propose solutions to tackle them. We summarise our findings into a set of guidelines and demonstrate their effectiveness by applying them to two baseline methods, DCP and IDAM. In short, our guidelines improve both their training convergence and testing accuracy. Ultimately, this translates to a best-practice 3D registration network (BPNet), constituting the first learning-based method able to handle previously-unseen objects in real-world data. Despite being trained only on synthetic data, our model generalizes to real data without any fine-tuning, reaching an accuracy of up to 67% on point clouds of unseen objects obtained with a commercial sensor.

Temporally-Consistent Surface Reconstruction using Metrically-Consistent Atlases

Nov 12, 2021

Jan Bednarik, Noam Aigerman, Vladimir G. Kim, Siddhartha Chaudhuri, Shaifali Parashar, Mathieu Salzmann, Pascal Fua

Abstract: We propose a method for unsupervised reconstruction of a temporally-consistent sequence of surfaces from a sequence of time-evolving point clouds. It yields dense and semantically meaningful correspondences between frames. We represent the reconstructed surfaces as atlases computed by a neural network, which enables us to establish correspondences between frames. The key to making these correspondences semantically meaningful is to guarantee that the metric tensors computed at corresponding points are as similar as possible. We have devised an optimization strategy that makes our method robust to noise and global motions, without a priori correspondences or pre-alignment steps. As a result, our approach outperforms state-of-the-art ones on several challenging datasets. The code is available at https://github.com/bednarikjan/temporally_coherent_surface_reconstruction.

* 21 pages. arXiv admin note: substantial text overlap with arXiv:2104.06950
