Julien Mairal

LJK

Recurrent Kernel Networks

Jun 07, 2019

Dexiong Chen, Laurent Jacob, Julien Mairal

Abstract: Substring kernels are classical tools for representing biological sequences or text. However, when large amounts of annotated data are available, models that allow end-to-end training, such as neural networks, are often preferred. Links between recurrent neural networks (RNNs) and substring kernels have recently been drawn, by formally showing that RNNs with specific activation functions were points in a reproducing kernel Hilbert space (RKHS). In this paper, we revisit this link by generalizing convolutional kernel networks (originally related to a relaxation of the mismatch kernel) to model gaps in sequences. This results in a new type of recurrent neural network which can be trained end-to-end with backpropagation, or without supervision by using kernel approximation techniques. We experimentally show that our approach is well suited to biological sequences, where it outperforms existing methods for protein classification tasks.
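
For readers unfamiliar with substring kernels, the following is a minimal sketch of the simplest member of that family, the k-spectrum kernel, which compares two sequences by counting shared contiguous substrings of length k. It is not the recurrent kernel network from the paper and models neither mismatches nor gaps; the function names `kmer_counts` and `spectrum_kernel` are illustrative assumptions.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Inner product of the k-mer count vectors of x and y."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Counter returns 0 for k-mers absent from y, so only shared k-mers contribute.
    return sum(count * cy[kmer] for kmer, count in cx.items())
```

Two sequences with no k-mer in common get a kernel value of 0, which is the rigidity that mismatch and gap-aware kernels are meant to relax.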

A Generic Acceleration Framework for Stochastic Composite Optimization

Jun 03, 2019

Andrei Kulunchakov, Julien Mairal

Abstract: In this paper, we introduce various mechanisms to obtain accelerated first-order stochastic optimization algorithms when the objective function is convex or strongly convex. Specifically, we extend the Catalyst approach originally designed for deterministic objectives to the stochastic setting. Given an optimization method with mild convergence guarantees for strongly convex problems, the challenge is to accelerate convergence to a noise-dominated region, and then achieve convergence with an optimal worst-case complexity depending on the noise variance of the gradients. A side contribution of our work is also a generic analysis that can handle inexact proximal operators, providing new insights about the robustness of stochastic algorithms when the proximal operator cannot be exactly computed.

On the Inductive Bias of Neural Tangent Kernels

May 29, 2019

Alberto Bietti, Julien Mairal

Abstract: State-of-the-art neural networks are heavily over-parameterized, making the optimization algorithm a crucial ingredient for learning predictive models with good generalization properties. A recent line of work has shown that in a certain over-parameterized regime, the learning dynamics of gradient descent are governed by a certain kernel obtained at initialization, called the neural tangent kernel. We study the inductive bias of learning in such a regime by analyzing this kernel and the corresponding function space (RKHS). In particular, we study smoothness, approximation, and stability properties of functions with finite norm, including stability to image deformations in the case of convolutional networks.
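
To make "a kernel obtained at initialization" concrete, here is a minimal sketch of the empirical (finite-width) tangent kernel of a toy two-layer ReLU network f(x) = a · relu(Wx) / sqrt(m), where the kernel value is the inner product of parameter gradients at two inputs. The architecture, the 1/sqrt(m) scaling, and the name `ntk_gram` are assumptions made for illustration; the paper analyzes the infinite-width limit, which this code does not compute.

```python
import numpy as np


def ntk_gram(X, W, a):
    """Empirical NTK Gram matrix of f(x) = a . relu(W x) / sqrt(m).

    X: (n, d) inputs, W: (m, d) hidden weights, a: (m,) output weights.
    """
    m = W.shape[0]
    pre = X @ W.T                       # (n, m) pre-activations
    act = np.maximum(pre, 0.0)          # relu(W x)
    gate = (pre > 0).astype(X.dtype)    # derivative of relu
    # Gradient w.r.t. a is act / sqrt(m), so its contribution is act @ act.T / m.
    k_a = act @ act.T / m
    # Gradient w.r.t. row j of W is a_j * gate_j(x) * x / sqrt(m).
    k_w = (X @ X.T) * ((gate * a**2) @ gate.T) / m
    return k_a + k_w
```

Because the result is a Gram matrix of gradient vectors, it is symmetric and positive semi-definite for any choice of `W` and `a`.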

Estimate Sequences for Variance-Reduced Stochastic Composite Optimization

May 07, 2019

Andrei Kulunchakov, Julien Mairal

Abstract: In this paper, we propose a unified view of gradient-based algorithms for stochastic convex composite optimization by extending the concept of estimate sequence introduced by Nesterov. This point of view covers the stochastic gradient descent method and variants of the SAGA and SVRG approaches, and has several advantages: (i) we provide a generic proof of convergence for the aforementioned methods; (ii) we show that this SVRG variant is adaptive to strong convexity; (iii) we naturally obtain new algorithms with the same guarantees; (iv) we derive generic strategies to make these algorithms robust to stochastic noise, which is useful when data is corrupted by small random perturbations. Finally, we show that this viewpoint is useful to obtain new accelerated algorithms in the sense of Nesterov.

* International Conference on Machine Learning (ICML), Jun 2019, Long Beach, United States
* short version of preprint arXiv:1901.08788

Leveraging Large-Scale Uncurated Data for Unsupervised Pre-training of Visual Features

May 03, 2019

Mathilde Caron, Piotr Bojanowski, Julien Mairal, Armand Joulin

Abstract: Pre-training general-purpose visual features with convolutional neural networks without relying on annotations is a challenging and important task. Most recent efforts in unsupervised feature learning have focused on either small or highly curated datasets like ImageNet, whereas using uncurated raw datasets was found to decrease the feature quality when evaluated on a transfer task. Our goal is to bridge the performance gap between unsupervised methods trained on curated data, which are costly to obtain, and massive raw datasets that are easily available. To that effect, we propose a new unsupervised approach which leverages self-supervision and clustering to capture complementary statistics from large-scale data. We validate our approach on 96 million images from YFCC100M, achieving state-of-the-art results among unsupervised methods on standard benchmarks, which confirms the potential of unsupervised learning when only uncurated data are available. We also show that pre-training a supervised VGG-16 with our method achieves 74.6% top-1 accuracy on the validation set of ImageNet classification, which is an improvement of +0.7% over the same network trained from scratch.

Diversity with Cooperation: Ensemble Methods for Few-Shot Classification

Mar 27, 2019

Nikita Dvornik, Cordelia Schmid, Julien Mairal

Abstract: Few-shot classification consists of learning a predictive model that is able to effectively adapt to a new class, given only a few annotated samples. To solve this challenging problem, meta-learning has become a popular paradigm that advocates the ability to "learn to adapt". Recent works have shown, however, that simple learning strategies without meta-learning could be competitive. In this paper, we go a step further and show that by addressing the fundamental high-variance issue of few-shot learning classifiers, it is possible to significantly outperform current meta-learning techniques. Our approach consists of designing an ensemble of deep networks to leverage the variance of the classifiers, and introducing new strategies to encourage the networks to cooperate, while encouraging prediction diversity. Evaluation is conducted on the mini-ImageNet and CUB datasets, where we show that even a single network obtained by distillation yields state-of-the-art results.

Estimate Sequences for Stochastic Composite Optimization: Variance Reduction, Acceleration, and Robustness to Noise

Jan 25, 2019

Andrei Kulunchakov, Julien Mairal

Abstract: In this paper, we propose a unified view of gradient-based algorithms for stochastic convex composite optimization. By extending the concept of estimate sequence introduced by Nesterov, we interpret a large class of stochastic optimization methods as procedures that iteratively minimize a surrogate of the objective. This point of view covers stochastic gradient descent (SGD), the variance-reduction approaches SAGA, SVRG, MISO, their proximal variants, and has several advantages: (i) we provide a simple generic proof of convergence for all of the aforementioned methods; (ii) we naturally obtain new algorithms with the same guarantees; (iii) we derive generic strategies to make these algorithms robust to stochastic noise, which is useful when data is corrupted by small random perturbations. Finally, we show that this viewpoint is useful to obtain accelerated algorithms.
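
As a reference point for one of the variance-reduction methods named above, here is a minimal sketch of the textbook SVRG update on a smooth least-squares objective. It is not the estimate-sequence variant analyzed in the paper and has no proximal term or acceleration; the function name `svrg_least_squares`, the fixed step size, and the choice of one pass over the data per epoch are assumptions for illustration.

```python
import numpy as np


def svrg_least_squares(A, b, step, epochs=30, seed=0):
    """Textbook SVRG for min_x (1 / 2n) * ||A x - b||^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        # Snapshot point and its full gradient, recomputed once per epoch.
        anchor = x.copy()
        full_grad = A.T @ (A @ anchor - b) / n
        for _ in range(n):
            i = rng.integers(n)
            g_x = A[i] * (A[i] @ x - b[i])            # sample gradient at x
            g_anchor = A[i] * (A[i] @ anchor - b[i])  # same sample at the snapshot
            # Unbiased gradient estimate whose variance shrinks near the optimum.
            x = x - step * (g_x - g_anchor + full_grad)
    return x
```

The correction term `g_anchor - full_grad` has zero mean over the sampled index, which is what keeps the estimate unbiased while reducing its variance relative to plain SGD.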

Extracting Universal Representations of Cognition across Brain-Imaging Studies

Oct 19, 2018

Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux

Abstract: We show in this paper how to extract shared brain representations that predict mental processes across many cognitive neuroimaging studies. Focused cognitive-neuroimaging experiments study precise mental processes with carefully designed cognitive paradigms; however, the cost of imaging limits their statistical power. On the other hand, large-scale databasing efforts considerably increase sample sizes, but cannot ask precise cognitive questions. To address this tension, we develop new methods that turn the heterogeneous cognitive information held in different task-fMRI studies into common (universal) cognitive models. Our approach does not assume any prior knowledge of the commonalities shared by the studies in the corpus; those are inferred during model training. The method uses deep-learning techniques to extract representations (task-optimized networks) that form a set of basis cognitive dimensions relevant to the psychological manipulations. In this sense, it forms a novel kind of functional atlas, optimized to capture mental state across many functional-imaging experiments. As it bridges information on the neural support of mental processes, this representation improves decoding performance for 80% of the 35 widely different functional imaging studies that we consider. Our approach opens new ways of extracting information from brain maps, increasing statistical power even for focused cognitive neuroimaging studies, in particular for those with few subjects.

Group Invariance, Stability to Deformations, and Complexity of Deep Convolutional Representations

Oct 10, 2018

Alberto Bietti, Julien Mairal

Abstract: The success of deep convolutional architectures is often attributed in part to their ability to learn multiscale and invariant representations of natural signals. However, a precise study of these properties and how they affect learning guarantees is still missing. In this paper, we consider deep convolutional representations of signals; we study their invariance to translations and to more general groups of transformations, their stability to the action of diffeomorphisms, and their ability to preserve signal information. This analysis is carried out by introducing a multilayer kernel based on convolutional kernel networks and by studying the geometry induced by the kernel mapping. We then characterize the corresponding reproducing kernel Hilbert space (RKHS), showing that it contains a large class of convolutional neural networks with homogeneous activation functions. This analysis allows us to separate data representation from learning, and to provide a canonical measure of model complexity, the RKHS norm, which controls both stability and generalization of any learned model. In addition to models in the constructed RKHS, our stability analysis also applies to convolutional networks with generic activations such as rectified linear units, and we discuss its relationship with recent generalization bounds based on spectral norms.

Unsupervised Learning of Artistic Styles with Archetypal Style Analysis

Oct 02, 2018

Daan Wynen, Cordelia Schmid, Julien Mairal

Abstract: In this paper, we introduce an unsupervised learning approach to automatically discover, summarize, and manipulate artistic styles from large collections of paintings. Our method is based on archetypal analysis, which is an unsupervised learning technique akin to sparse coding with a geometric interpretation. When applied to deep image representations from a collection of artworks, it learns a dictionary of archetypal styles, which can be easily visualized. After training the model, the style of a new image, which is characterized by local statistics of deep visual features, is approximated by a sparse convex combination of archetypes. This enables us to interpret which archetypal styles are present in the input image, and in which proportion. Finally, our approach allows us to manipulate the coefficients of the latent archetypal decomposition, and achieve various special effects such as style enhancement, transfer, and interpolation between multiple archetypes.
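
The "convex combination of archetypes" step can be illustrated with a minimal sketch: given a fixed set of archetypes, find non-negative weights summing to one that best reconstruct a new vector. The solver below is generic projected gradient descent onto the probability simplex, chosen for brevity and not taken from the paper; the names `project_simplex`, `convex_weights`, `Z` (archetypes as rows), and `x` (the vector to decompose) are assumptions.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)


def convex_weights(x, Z, iters=500):
    """Minimize 0.5 * ||Z.T @ w - x||^2 over the simplex by projected gradient."""
    k = Z.shape[0]
    w = np.full(k, 1.0 / k)                   # start from uniform weights
    step = 1.0 / np.linalg.norm(Z @ Z.T, 2)   # 1 / Lipschitz constant of the gradient
    for _ in range(iters):
        grad = Z @ (Z.T @ w - x)
        w = project_simplex(w - step * grad)
    return w
```

The returned weights can be read as proportions: entry j says how much of archetype j is present in `x`, and the simplex constraint tends to set many entries exactly to zero.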

* Accepted at NIPS 2018, Montréal, Canada
