Mathieu Salzmann

CVLab EPFL Switzerland

Learning Variations in Human Motion via Mix-and-Match Perturbation

Aug 02, 2019

Mohammad Sadegh Aliakbarian, Fatemeh Sadat Saleh, Mathieu Salzmann, Lars Petersson, Stephen Gould, Amirhossein Habibian

Abstract: Human motion prediction is a stochastic process: Given an observed sequence of poses, multiple future motions are plausible. Existing approaches to modeling this stochasticity typically combine a random noise vector with information about the previous poses. This combination, however, is done in a deterministic manner, which gives the network the flexibility to learn to ignore the random noise. In this paper, we introduce an approach to stochastically combine the root of variations with previous pose information, which forces the model to take the noise into account. We exploit this idea for motion prediction by incorporating it into a recurrent encoder-decoder network with a conditional variational autoencoder block that learns to exploit the perturbations. Our experiments demonstrate that our model yields high-quality pose sequences that are much more diverse than those from state-of-the-art stochastic motion prediction techniques.
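
To make the idea of a stochastic combination concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that mixing means picking a random subset of vector positions to fill from the noise vector while the remaining positions keep the pose-conditioning values. The function name `mix_and_match`, the `ratio` parameter, and the toy vectors are all hypothetical.

```python
import random


def mix_and_match(condition, noise, ratio, rng):
    """Fill a random subset of positions from `noise`, the rest from `condition`.

    Assumption: `ratio` is the fraction of positions taken from the noise.
    """
    if len(condition) != len(noise):
        raise ValueError("condition and noise must have the same length")
    n = len(condition)
    k = round(ratio * n)  # number of positions taken from the noise vector
    noise_idx = set(rng.sample(range(n), k))  # positions change on every call
    return [noise[i] if i in noise_idx else condition[i] for i in range(n)]


rng = random.Random(0)
h = [1.0] * 8                                 # toy stand-in for encoded past poses
z = [rng.gauss(0.0, 1.0) for _ in range(8)]   # toy noise vector
mixed = mix_and_match(h, z, ratio=0.25, rng=rng)
```

Because the chosen positions differ between calls, a downstream layer cannot learn one fixed rule for discarding the noise entries, which is the intuition the abstract describes.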

Self-supervised Training of Proposal-based Segmentation via Background Prediction

Jul 18, 2019

Isinsu Katircioglu, Helge Rhodin, Victor Constantin, Jörg Spörri, Mathieu Salzmann, Pascal Fua

Abstract: While supervised object detection methods achieve impressive accuracy, they generalize poorly to images whose appearance significantly differs from the data they have been trained on. To address this in scenarios where annotating data is prohibitively expensive, we introduce a self-supervised approach to object detection and segmentation, able to work with monocular images captured with a moving camera. At the heart of our approach lies the observation that segmentation and background reconstruction are linked tasks, and the idea that, because we observe a structured scene, background regions can be re-synthesized from their surroundings, whereas regions depicting the object cannot. We therefore encode this intuition as a self-supervised loss function that we exploit to train a proposal-based segmentation network. To account for the discrete nature of object proposals, we develop a Monte Carlo-based training strategy that allows us to explore the large space of object proposals. Our experiments demonstrate that our approach yields accurate detections and segmentations in images that visually depart from those of standard benchmarks, outperforming existing self-supervised methods and approaching weakly supervised ones that exploit large annotated datasets.

Backpropagation-Friendly Eigendecomposition

Jun 27, 2019

Wei Wang, Zheng Dang, Yinlin Hu, Pascal Fua, Mathieu Salzmann

Abstract: Eigendecomposition (ED) is widely used in deep networks. However, the backpropagation of its results tends to be numerically unstable, whether using ED directly or approximating it with the Power Iteration (PI) method, particularly when dealing with large matrices. While this can be mitigated by partitioning the data into small, arbitrary groups, doing so has no theoretical basis and makes it impossible to exploit the power of ED to the full. In this paper, we introduce a numerically stable and differentiable approach to leveraging eigenvectors in deep networks. It can handle large matrices without requiring them to be split. We demonstrate the better robustness of our approach over standard ED and PI for ZCA whitening, an alternative to batch normalization, and for PCA denoising, which we introduce as a new normalization strategy for deep networks, aiming to further denoise the network's features.
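
For readers unfamiliar with the Power Iteration baseline mentioned in this abstract, the following is a minimal pure-Python sketch of the textbook algorithm for the dominant eigenpair of a symmetric matrix. It is not the paper's proposed method. The helper names, the fixed start vector, and the iteration count are illustrative assumptions.

```python
import math


def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def power_iteration(matrix, num_iters=100):
    """Estimate the dominant eigenvalue and eigenvector of a symmetric matrix.

    Assumption: the start vector e_1 is not orthogonal to the dominant
    eigenvector; otherwise the iteration cannot converge to it.
    """
    v = [1.0] + [0.0] * (len(matrix) - 1)
    for _ in range(num_iters):
        w = matvec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalize to avoid overflow
    # Rayleigh quotient v^T M v (v has unit length).
    eigenvalue = sum(a * b for a, b in zip(v, matvec(matrix, v)))
    return eigenvalue, v


# The eigenvalues of this toy matrix are 3 and 1.
eigenvalue, eigenvector = power_iteration([[2.0, 1.0], [1.0, 2.0]])
```

The repeated multiply-and-normalize loop is what gradients must pass through when this baseline is used inside a network.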

Recurrent U-Net for Resource-Constrained Segmentation

Jun 11, 2019

Wei Wang, Kaicheng Yu, Joachim Hugonot, Pascal Fua, Mathieu Salzmann

Abstract: State-of-the-art segmentation methods rely on very deep networks that are not always easy to train without very large training datasets and tend to be relatively slow to run on standard GPUs. In this paper, we introduce a novel recurrent U-Net architecture that preserves the compactness of the original U-Net, while substantially increasing its performance to the point where it outperforms the state of the art on several benchmarks. We demonstrate its effectiveness for several tasks, including hand segmentation, retina vessel segmentation, and road segmentation. We also introduce a large-scale dataset for hand segmentation.

* arXiv admin note: substantial text overlap with arXiv:1811.10914

Detecting the Unexpected via Image Resynthesis

Apr 17, 2019

Krzysztof Lis, Krishna Nakka, Pascal Fua, Mathieu Salzmann

Abstract: Classical semantic segmentation methods, including the recent deep learning ones, assume that all classes observed at test time have been seen during training. In this paper, we tackle the more realistic scenario where unexpected objects of unknown classes can appear at test time. The main trends in this area either leverage the notion of prediction uncertainty to flag the regions with low confidence as unknown, or rely on autoencoders and highlight poorly-decoded regions. Having observed that, in both cases, the detected regions typically do not correspond to unexpected objects, we introduce a drastically different strategy: it relies on the intuition that the network will produce spurious labels in regions depicting unexpected objects. Therefore, resynthesizing the image from the resulting semantic map will yield significant appearance differences with respect to the input image. In other words, we translate the problem of detecting unknown classes to one of identifying poorly-resynthesized image regions. We show that this outperforms both uncertainty- and autoencoder-based methods.
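
The last step, identifying poorly-resynthesized regions, can be sketched as follows. The abstract does not say how the input and the resynthesized image are compared, so this sketch uses a plain per-pixel absolute difference on grayscale values with a fixed threshold as a stand-in assumption. The function name `discrepancy_mask` and the toy images are hypothetical.

```python
def discrepancy_mask(image, resynthesized, threshold):
    """Flag pixels whose resynthesized value departs from the input.

    Both images are lists of rows of grayscale values in [0, 1].
    """
    if len(image) != len(resynthesized):
        raise ValueError("images must have the same height")
    mask = []
    for row_in, row_out in zip(image, resynthesized):
        if len(row_in) != len(row_out):
            raise ValueError("images must have the same width")
        mask.append([abs(a - b) > threshold for a, b in zip(row_in, row_out)])
    return mask


# Toy example: the right-hand column is reconstructed badly.
image = [[0.2, 0.2, 0.9], [0.2, 0.2, 0.9]]
resynthesized = [[0.2, 0.25, 0.3], [0.2, 0.2, 0.35]]
mask = discrepancy_mask(image, resynthesized, threshold=0.3)
```

In this toy case only the third column is flagged, mirroring the idea that regions the segmentation network mislabels are the ones that resynthesis fails to reproduce.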

Neural Scene Decomposition for Multi-Person Motion Capture

Mar 13, 2019

Helge Rhodin, Victor Constantin, Isinsu Katircioglu, Mathieu Salzmann, Pascal Fua

Abstract: Learning general image representations has proven key to the success of many computer vision tasks. For example, many approaches to image understanding problems rely on deep networks that were initially trained on ImageNet, mostly because the learned features are a valuable starting point to learn from limited labeled data. However, when it comes to 3D motion capture of multiple people, these features are only of limited use. In this paper, we therefore propose an approach to learning features that are useful for this purpose. To this end, we introduce a self-supervised approach to learning what we call a neural scene decomposition (NSD) that can be exploited for 3D pose estimation. NSD comprises three layers of abstraction to represent human subjects: spatial layout in terms of bounding-boxes and relative depth; a 2D shape representation in terms of an instance segmentation mask; and subject-specific appearance and 3D pose information. By exploiting self-supervision coming from multiview data, our NSD model can be trained end-to-end without any 2D or 3D supervision. In contrast to previous approaches, it works for multiple persons and full-frame images. Because it encodes 3D geometry, NSD can then be effectively leveraged to train a 3D pose estimation network from small amounts of annotated data.

* CVPR 2019

Overcoming Multi-Model Forgetting

Mar 02, 2019

Yassine Benyahia, Kaicheng Yu, Kamil Bennani-Smires, Martin Jaggi, Anthony Davison, Mathieu Salzmann, Claudiu Musat

Abstract: We identify a phenomenon, which we refer to as multi-model forgetting, that occurs when sequentially training multiple deep networks with partially-shared parameters; the performance of previously-trained models degrades as one optimizes a subsequent one, due to the overwriting of shared parameters. To overcome this, we introduce a statistically-justified weight plasticity loss that regularizes the learning of a model's shared parameters according to their importance for the previous models, and demonstrate its effectiveness when training two models sequentially and for neural architecture search. Adding weight plasticity in neural architecture search preserves the best models to the end of the search and yields improved results in both natural language processing and computer vision tasks.
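
The abstract does not give the form of the weight plasticity loss. As a rough illustration of regularizing shared parameters by their importance, the sketch below uses a generic importance-weighted quadratic penalty (in the style of elastic weight consolidation). This form, the function name, and all numbers are assumptions, not the paper's loss.

```python
def importance_weighted_penalty(params, anchor_params, importance, strength):
    """Penalize moving shared parameters away from their previous values.

    Assumed form: 0.5 * strength * sum_i importance_i * (theta_i - anchor_i)^2,
    where `anchor_params` are the values after training the earlier model.
    """
    if not (len(params) == len(anchor_params) == len(importance)):
        raise ValueError("all inputs must have the same length")
    return 0.5 * strength * sum(
        w * (p - a) ** 2 for p, a, w in zip(params, anchor_params, importance)
    )


# Toy example with two shared parameters; only the first one has moved.
penalty = importance_weighted_penalty(
    params=[1.0, 2.0],
    anchor_params=[0.0, 2.0],
    importance=[4.0, 1.0],
    strength=0.5,
)
task_loss = 0.3                      # hypothetical loss of the second model
total_loss = task_loss + penalty     # objective used when training the second model
```

A parameter with high importance for the earlier model is expensive to change, while unimportant parameters remain free to adapt to the new model.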

Evaluating the Search Phase of Neural Architecture Search

Feb 21, 2019

Christian Sciuto, Kaicheng Yu, Martin Jaggi, Claudiu Musat, Mathieu Salzmann

Abstract: Neural Architecture Search (NAS) aims to facilitate the design of deep networks for new tasks. Existing techniques rely on two stages: searching over the architecture space and validating the best architecture. Evaluating NAS algorithms is currently solely done by comparing their results on the downstream task. While intuitive, this fails to explicitly evaluate the effectiveness of their search strategies. In this paper, we extend the NAS evaluation procedure to include the search phase. To this end, we compare the quality of the solutions obtained by NAS search policies with that of random architecture selection. We find that: (i) On average, the random policy outperforms state-of-the-art NAS algorithms; and (ii) The results and candidate rankings of NAS algorithms do not reflect the true performance of the candidate architectures. While our former finding illustrates the fact that the NAS search space has been sufficiently constrained so that random solutions yield good results, we trace the latter back to the weight sharing strategy used by state-of-the-art NAS methods. In contrast with common belief, weight sharing negatively impacts the training of good architectures, thus reducing the effectiveness of the search process. We believe that following our evaluation framework will be key to designing NAS strategies that truly discover superior architectures.

* We find that a random policy in NAS works amazingly well and propose an evaluation framework for a fair comparison. 8 pages
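
One way to check whether candidate rankings reflect true performance is a rank correlation between the scores used during search and the scores obtained after training each candidate on its own. The abstract does not name a metric; Kendall's tau is assumed here as a common choice, and the score lists are made-up numbers for illustration only.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall rank correlation (tau-a) between two score lists.

    Returns 1.0 for identical rankings and -1.0 for reversed rankings.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total_pairs = len(scores_a) * (len(scores_a) - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical accuracies for four candidate architectures.
search_time_scores = [0.71, 0.68, 0.74, 0.70]   # estimated during the search
standalone_scores = [0.93, 0.94, 0.92, 0.95]    # after training each from scratch
tau = kendall_tau(search_time_scores, standalone_scores)
```

A value near 1 would mean the search ranks candidates the way stand-alone training does; values near 0 or below would indicate that the search-time ranking carries little information.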

Interpretable BoW Networks for Adversarial Example Detection

Jan 08, 2019

Krishna Kanth Nakka, Mathieu Salzmann

Abstract: The standard approach to providing interpretability to deep convolutional neural networks (CNNs) consists of visualizing either their feature maps, or the image regions that contribute the most to the prediction. In this paper, we introduce an alternative strategy to interpret the results of a CNN. To this end, we leverage a bag-of-visual-words (BoW) representation within the network and associate a visual and semantic meaning to the corresponding codebook elements via the use of a generative adversarial network. The reason behind the prediction for a new sample can then be interpreted by looking at the visual representation of the most highly activated codeword. We then propose to exploit our interpretable BoW networks for adversarial example detection. To this end, we build upon the intuition that, while adversarial samples look very similar to real images, to produce incorrect predictions, they should activate codewords with a significantly different visual representation. We therefore cast the adversarial example detection problem as that of comparing the input image with the most highly activated visual codeword. As evidenced by our experiments, this allows us to outperform the state-of-the-art adversarial example detection methods on standard benchmarks, independently of the attack strategy.

Segmentation-driven 6D Object Pose Estimation

Jan 08, 2019

Yinlin Hu, Joachim Hugonot, Pascal Fua, Mathieu Salzmann

Abstract: The most recent trend in estimating the 6D pose of rigid objects has been to train deep networks to either directly regress the pose from the image or to predict the 2D locations of 3D keypoints, from which the pose can be obtained using a PnP algorithm. In both cases, the object is treated as a global entity, and a single pose estimate is computed. As a consequence, the resulting techniques can be vulnerable to large occlusions. In this paper, we introduce a segmentation-driven 6D pose estimation framework where each visible part of the objects contributes a local pose prediction in the form of 2D keypoint locations. We then use a predicted measure of confidence to combine these pose candidates into a robust set of 3D-to-2D correspondences, from which a reliable pose estimate can be obtained. We outperform the state of the art on the challenging Occluded-LINEMOD and YCB-Video datasets, which is evidence that our approach deals well with multiple poorly-textured objects occluding each other. Furthermore, it relies on a simple enough architecture to achieve real-time performance.
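
The confidence-based combination step can be illustrated with a small sketch. The abstract does not state the exact selection rule, so keeping the `top_k` most confident 2D candidates per keypoint is an assumption, as are the function name and the toy data. The retained 2D points, paired with the known 3D keypoints, would then go to a PnP solver, which is not shown.

```python
def select_confident_candidates(candidates, top_k):
    """Keep the most confident 2D predictions for each 3D keypoint.

    `candidates` maps a keypoint id to a list of ((x, y), confidence) pairs,
    one pair per local prediction from a visible object part.
    """
    selected = {}
    for keypoint_id, predictions in candidates.items():
        ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
        selected[keypoint_id] = [xy for xy, _ in ranked[:top_k]]
    return selected


# Toy example: one keypoint, three local predictions, one of them an
# outlier with low confidence (e.g. from an occluded region).
candidates = {
    0: [((10.0, 12.0), 0.9), ((40.0, 5.0), 0.1), ((11.0, 12.5), 0.8)],
}
correspondences = select_confident_candidates(candidates, top_k=2)
```

Dropping low-confidence predictions before pose fitting is what lets occluded parts be ignored instead of corrupting a single global estimate.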
