Mathieu Salzmann

CVLab EPFL Switzerland

DrapeNet: Generating Garments and Draping them with Self-Supervision

Dec 07, 2022

Luca De Luigi, Ren Li, Benoît Guillard, Mathieu Salzmann, Pascal Fua

Abstract:Recent approaches to drape garments quickly over arbitrary human bodies leverage self-supervision to eliminate the need for large training sets. However, they are designed to train one network per clothing item, which severely limits their generalization abilities. In our work, we rely on self-supervision to train a single network to drape multiple garments. This is achieved by predicting a 3D deformation field conditioned on the latent codes of a generative network, which models garments as unsigned distance fields. Our pipeline can generate and drape previously unseen garments of any topology, whose shape can be edited by manipulating their latent codes. Being fully differentiable, our formulation makes it possible to recover accurate 3D models of garments from partial observations -- images or 3D scans -- via gradient descent. Our code will be made publicly available.

Finer-Grained Correlations: Location Priors for Unseen Object Pose Estimation

Nov 29, 2022

Chen Zhao, Yinlin Hu, Mathieu Salzmann

Abstract:We present a new method which provides object location priors for previously unseen object 6D pose estimation. Existing approaches build upon a template matching strategy and convolve a set of reference images with the query. Unfortunately, their performance is affected by the object scale mismatches between the references and the query. To address this issue, we present a finer-grained correlation estimation module, which handles the object scale mismatches by computing correlations with adjustable receptive fields. We also propose to decouple the correlations into scale-robust and scale-aware representations to estimate the object location and size, respectively. Our method achieves state-of-the-art unseen object localization and 6D pose estimation results on LINEMOD and GenMOP. We further construct a challenging synthetic dataset, where the results highlight the better robustness of our method to varying backgrounds, illuminations, and object sizes, as well as to the reference-query domain gap.

Contact-aware Human Motion Forecasting

Oct 08, 2022

Wei Mao, Miaomiao Liu, Richard Hartley, Mathieu Salzmann

Abstract:In this paper, we tackle the task of scene-aware 3D human motion forecasting, which consists of predicting future human poses given a 3D scene and a past human motion. A key challenge of this task is to ensure consistency between the human and the scene, accounting for human-scene interactions. Previous attempts to do so model such interactions only implicitly, and thus tend to produce artifacts such as "ghost motion" because of the lack of explicit constraints between the local poses and the global motion. Here, by contrast, we propose to explicitly model the human-scene contacts. To this end, we introduce distance-based contact maps that capture the contact relationships between every joint and every 3D scene point at each time instant. We then develop a two-stage pipeline that first predicts the future contact maps from the past ones and the scene point cloud, and then forecasts the future human poses by conditioning them on the predicted contact maps. During training, we explicitly encourage consistency between the global motion and the local poses via a prior defined using the contact maps and future poses. Our approach outperforms the state-of-the-art human motion forecasting and human synthesis methods on both synthetic and real datasets. Our code is available at https://github.com/wei-mao-2019/ContAwareMotionPred.

* Accepted to NeurIPS 2022
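A minimal sketch of what a distance-based contact map could look like, for illustration only. The abstract says the maps relate every joint to every 3D scene point at each time instant, but it does not give the formula. The Gaussian squashing of distances, the `sigma` parameter, and the function names `contact_map` and `contact_sequence` are assumptions made here, not the authors' definitions.

```python
import math


def contact_map(joints, scene_points, sigma=0.2):
    """Return a J x N grid of contact scores for one time instant.

    Assumption: Euclidean distances are squashed with a Gaussian kernel,
    so a score near 1 means "touching" and a score near 0 means "far".
    The source only states that the maps are distance-based.
    """
    rows = []
    for joint in joints:
        row = []
        for point in scene_points:
            d = math.dist(joint, point)  # Euclidean distance in 3D
            row.append(math.exp(-0.5 * (d / sigma) ** 2))
        rows.append(row)
    return rows


def contact_sequence(pose_sequence, scene_points, sigma=0.2):
    """Build one contact map per frame of a pose sequence."""
    return [contact_map(joints, scene_points, sigma) for joints in pose_sequence]


# Toy data (hypothetical): three scene points on the floor plane z = 0,
# one joint resting on the floor and one joint 1 m above it.
scene = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frame = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
cmap = contact_map(frame, scene)
```

In this toy setup the floor joint gets a high score for the scene point it sits on, while the raised joint gets scores close to zero everywhere. A two-stage pipeline as described in the abstract would predict such maps for future frames first and condition the pose forecast on them.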

Perspective Aware Road Obstacle Detection

Oct 04, 2022

Krzysztof Lis, Sina Honari, Pascal Fua, Mathieu Salzmann

Abstract:While road obstacle detection techniques have become increasingly effective, they typically ignore the fact that, in practice, the apparent size of the obstacles decreases as their distance to the vehicle increases. In this paper, we account for this by computing a scale map encoding the apparent size of a hypothetical object at every image location. We then leverage this perspective map to (i) generate training data by injecting synthetic objects onto the road in a more realistic fashion than existing methods; and (ii) incorporate perspective information in the decoding part of the detection network to guide the obstacle detector. Our results on standard benchmarks show that, together, these two strategies significantly boost the obstacle detection performance, allowing our approach to consistently outperform state-of-the-art methods in terms of instance-level obstacle detection.
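To make the idea of a scale map concrete, here is a rough sketch under a simplified camera model. It is not the paper's procedure, which the abstract does not spell out. The assumptions are a pinhole camera with no pitch or roll, a perfectly flat road, and a known horizon row. Under those assumptions a road point imaged `v - horizon_row` pixels below the horizon lies at depth `f * h / (v - horizon_row)`, so an object of real size `W` standing there appears `W * (v - horizon_row) / h` pixels wide, and the focal length `f` cancels. The function name and parameters are hypothetical.

```python
def perspective_scale_map(height, width, horizon_row, camera_height_m,
                          object_size_m=1.0):
    """Apparent size in pixels of a hypothetical object at every pixel.

    Assumptions (not from the source): flat road, pinhole camera with its
    optical axis parallel to the road, horizon at image row `horizon_row`.
    Rows at or above the horizon get a scale of 0 because no road point
    projects there.
    """
    scale_map = []
    for v in range(height):
        rows_below_horizon = v - horizon_row
        if rows_below_horizon > 0:
            scale = object_size_m * rows_below_horizon / camera_height_m
        else:
            scale = 0.0
        scale_map.append([scale] * width)  # constant along each image row
    return scale_map


# Toy example: a 6 x 4 image, horizon at row 2, camera 1.5 m above the road.
smap = perspective_scale_map(6, 4, horizon_row=2, camera_height_m=1.5)
```

The map grows linearly toward the bottom of the image, which matches the observation in the abstract that obstacles look smaller as they get farther from the vehicle. Such a map could then be used to size injected synthetic objects or be fed to a detector as an extra input channel.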

3D Pose Based Feedback for Physical Exercises

Aug 05, 2022

Ziyi Zhao, Sena Kiciroglu, Hugues Vinzant, Yuan Cheng, Isinsu Katircioglu, Mathieu Salzmann, Pascal Fua

Abstract:Unsupervised self-rehabilitation exercises and physical training can cause serious injuries if performed incorrectly. We introduce a learning-based framework that identifies the mistakes made by a user and proposes corrective measures for easier and safer individual training. Our framework does not rely on hard-coded, heuristic rules. Instead, it learns them from data, which facilitates its adaptation to specific user needs. To this end, we use a Graph Convolutional Network (GCN) architecture acting on the user's pose sequence to model the relationship between the body joints trajectories. To evaluate our approach, we introduce a dataset with 3 different physical exercises. Our approach yields 90.9% mistake identification accuracy and successfully corrects 94.2% of the mistakes.

* Video: https://youtu.be/W3kyyeHe0SI

Fast Adversarial Training with Adaptive Step Size

Jun 06, 2022

Zhichao Huang, Yanbo Fan, Chen Liu, Weizhong Zhang, Yong Zhang, Mathieu Salzmann, Sabine Süsstrunk, Jue Wang

Abstract:While adversarial training and its variants have been shown to be the most effective algorithms to defend against adversarial attacks, their extremely slow training process makes them hard to scale to large datasets like ImageNet. The key idea of recent works to accelerate adversarial training is to substitute multi-step attacks (e.g., PGD) with single-step attacks (e.g., FGSM). However, these single-step methods suffer from catastrophic overfitting, where the accuracy against PGD attacks suddenly drops to nearly 0% during training, destroying the robustness of the networks. In this work, we study the phenomenon from the perspective of training instances. We show that catastrophic overfitting is instance-dependent and that fitting instances with larger gradient norm is more likely to cause catastrophic overfitting. Based on our findings, we propose a simple but effective method, Adversarial Training with Adaptive Step size (ATAS). ATAS learns an instance-wise adaptive step size that is inversely proportional to its gradient norm. The theoretical analysis shows that ATAS converges faster than the commonly adopted non-adaptive counterparts. Empirically, ATAS consistently mitigates catastrophic overfitting and achieves higher robust accuracy on CIFAR10, CIFAR100 and ImageNet when evaluated on various adversarial budgets.
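The core idea of a step size that shrinks as the gradient norm grows can be sketched as follows. This is a simplification for illustration, not the ATAS algorithm itself: the abstract says the step size is learned per instance, whereas this sketch computes it directly from the current gradient. The names `adaptive_step_size`, `single_step_perturbation`, `base_step`, and `eps` are hypothetical.

```python
import math


def adaptive_step_size(grad, base_step=1.0, eps=1e-8):
    """Step size inversely proportional to the L2 norm of the gradient.

    `eps` avoids division by zero; both defaults are illustrative.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    return base_step / (norm + eps)


def single_step_perturbation(grad, budget, base_step=1.0):
    """FGSM-style sign step with an adaptive size, clipped to an L-inf budget."""
    step = adaptive_step_size(grad, base_step)
    delta = []
    for g in grad:
        sign = (g > 0) - (g < 0)  # -1, 0, or +1
        delta.append(max(-budget, min(budget, step * sign)))
    return delta


# An instance with a large input gradient takes a smaller step
# than an instance with a small input gradient.
step_large = adaptive_step_size([3.0, -4.0])   # gradient norm 5
step_small = adaptive_step_size([0.3, -0.4])   # gradient norm 0.5
delta = single_step_perturbation([3.0, -4.0], budget=0.1)
```

Instances with large gradients, which the abstract identifies as the ones most likely to trigger catastrophic overfitting, therefore receive smaller perturbation steps.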

Weakly-supervised Action Transition Learning for Stochastic Human Motion Prediction

May 31, 2022

Wei Mao, Miaomiao Liu, Mathieu Salzmann

Abstract:We introduce the task of action-driven stochastic human motion prediction, which aims to predict multiple plausible future motions given a sequence of action labels and a short motion history. This differs from existing works, which predict motions that either do not respect any specific action category, or follow a single action label. In particular, addressing this task requires tackling two challenges: The transitions between the different actions must be smooth; the length of the predicted motion depends on the action sequence and varies significantly across samples. As we cannot realistically expect training data to cover sufficiently diverse action transitions and motion lengths, we propose an effective training strategy consisting of combining multiple motions from different actions and introducing a weak form of supervision to encourage smooth transitions. We then design a VAE-based model conditioned on both the observed motion and the action label sequence, allowing us to generate multiple plausible future motions of varying length. We illustrate the generality of our approach by exploring its use with two different temporal encoding models, namely RNNs and Transformers. Our approach outperforms baseline models constructed by adapting state-of-the-art single action-conditioned motion generation methods and stochastic human motion prediction approaches to our new task of action-driven stochastic motion prediction. Our code is available at https://github.com/wei-mao-2019/WAT.

* CVPR 2022 (Oral)

Knowledge Distillation for 6D Pose Estimation by Keypoint Distribution Alignment

May 30, 2022

Shuxuan Guo, Yinlin Hu, Jose M. Alvarez, Mathieu Salzmann

Abstract:Knowledge distillation facilitates the training of a compact student network by using a deep teacher one. While this has achieved great success in many tasks, it remains completely unstudied for image-based 6D object pose estimation. In this work, we introduce the first knowledge distillation method for 6D pose estimation. Specifically, we follow a standard approach to 6D pose estimation, consisting of predicting the 2D image locations of object keypoints. In this context, we observe that the compact student network struggles to predict precise 2D keypoint locations. Therefore, instead of training the student with keypoint-to-keypoint supervision, we introduce a strategy based on optimal transport theory that distills the teacher's keypoint distribution into the student network, facilitating its training. Our experiments on several benchmarks show that our distillation method yields state-of-the-art results with different compact student models.
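As an illustration of comparing two keypoint distributions rather than matching keypoints one to one, the sketch below computes an entropy-regularized optimal transport cost with Sinkhorn iterations. Sinkhorn is a standard solver chosen here for brevity. The abstract does not say which solver, ground cost, or weighting the authors use, so the squared Euclidean cost, uniform weights, `reg`, and the function name `sinkhorn_cost` are all assumptions.

```python
import math


def sinkhorn_cost(teacher_pts, student_pts, reg=0.1, n_iters=200):
    """Entropy-regularized OT cost between two sets of 2D keypoints.

    Assumptions: uniform mass on every keypoint, squared Euclidean ground
    cost, coordinates normalized to a small range so exp() stays stable.
    """
    n, m = len(teacher_pts), len(student_pts)
    a = [1.0 / n] * n  # teacher marginal
    b = [1.0 / m] * m  # student marginal
    cost = [[(tx - sx) ** 2 + (ty - sy) ** 2 for (sx, sy) in student_pts]
            for (tx, ty) in teacher_pts]
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]

    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]

    # Transport plan P[i][j] = u[i] * K[i][j] * v[j]; return <P, C>.
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


teacher = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
student_close = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
student_far = [(2.0, 2.0), (3.0, 2.0), (2.0, 3.0)]

cost_close = sinkhorn_cost(teacher, student_close)
cost_far = sinkhorn_cost(teacher, student_far)
```

Because the cost depends only on how mass must move between the two point sets, it does not require the student to output its keypoints in the same order or number as the teacher. In a training loop this quantity would be computed with a differentiable tensor library so that gradients reach the student.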

MulT: An End-to-End Multitask Learning Transformer

May 17, 2022

Deblina Bhattacharjee, Tong Zhang, Sabine Süsstrunk, Mathieu Salzmann

Abstract:We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks, including depth estimation, semantic segmentation, reshading, surface normal estimation, 2D keypoint detection, and edge detection. Based on the Swin transformer model, our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads. At the heart of our approach is a shared attention mechanism modeling the dependencies across the tasks. We evaluate our model on several multitask benchmarks, showing that our MulT framework outperforms both the state-of-the-art multitask convolutional neural network models and all the respective single-task transformer models. Our experiments further highlight the benefits of sharing attention across all the tasks, and demonstrate that our MulT model is robust and generalizes well to new domains. Our project website is at https://ivrl.github.io/MulT/.

* Accepted to CVPR 2022

Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy

Apr 13, 2022

Tong Zhang, Congpei Qiu, Wei Ke, Sabine Süsstrunk, Mathieu Salzmann

Abstract:Self-supervised learning (SSL) methods aim to learn view-invariant representations by maximizing the similarity between the features extracted from different crops of the same image regardless of cropping size and content. In essence, this strategy ignores the fact that two crops may truly contain different image information, e.g., background and small objects, and thus tends to restrain the diversity of the learned representations. In this work, we address this issue by introducing a new self-supervised learning strategy, LoGo, that explicitly reasons about Local and Global crops. To achieve view invariance, LoGo encourages similarity between global crops from the same image, as well as between a global and a local crop. However, to correctly encode the fact that the content of smaller crops may differ entirely, LoGo promotes two local crops to have dissimilar representations, while being close to global crops. Our LoGo strategy can easily be applied to existing SSL methods. Our extensive experiments on a variety of datasets and using different self-supervised learning frameworks validate its superiority over existing approaches. Noticeably, we achieve better results than supervised models on transfer learning when using only 1/10 of the data.

* Accepted to CVPR 2022
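The three relationships described in the abstract (global crops pulled together, local crops pulled toward global crops, local crops pushed apart from each other) can be written as a toy objective. This is only a sketch: the abstract does not give the actual LoGo loss, and the use of cosine similarity, the equal weighting, the `repel_weight` parameter, and the function names are assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def logo_style_loss(global_feats, local_feats, repel_weight=1.0):
    """Toy loss over embeddings of two global crops and several local crops.

    Lower is better. The terms mirror the abstract's description, not the
    authors' exact formulation.
    """
    g1, g2 = global_feats
    # Global-global: encourage similarity.
    attract_gg = 1.0 - cosine(g1, g2)
    # Global-local: encourage every local crop to stay close to global crops.
    gl_pairs = [(g, l) for g in global_feats for l in local_feats]
    attract_gl = sum(1.0 - cosine(g, l) for g, l in gl_pairs) / len(gl_pairs)
    # Local-local: discourage similarity between different local crops.
    ll_pairs = list(combinations(local_feats, 2))
    repel_ll = sum(cosine(a, b) for a, b in ll_pairs) / len(ll_pairs)
    return attract_gg + attract_gl + repel_weight * repel_ll


globals_ = [[1.0, 0.0], [1.0, 0.0]]
locals_diverse = [[1.0, 1.0], [1.0, -1.0]]    # distinct but near the globals
locals_collapsed = [[1.0, 0.0], [1.0, 0.0]]   # identical to each other

loss_diverse = logo_style_loss(globals_, locals_diverse)
loss_collapsed = logo_style_loss(globals_, locals_collapsed)
```

With these toy embeddings, the diverse local crops yield a lower loss than the collapsed ones, which reflects the stated goal of keeping local representations dissimilar from each other while remaining close to the global crops.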
