Julien Mairal

LJK

Efficient Kernel UCB for Contextual Bandits

Feb 11, 2022

Houssam Zenati, Alberto Bietti, Eustache Diemert, Julien Mairal, Matthieu Martin, Pierre Gaillard

Abstract: In this paper, we tackle the computational efficiency of kernelized UCB algorithms in contextual bandits. While standard methods require O(CT^3) complexity, where T is the horizon and the constant C is related to optimizing the UCB rule, we propose an efficient contextual algorithm for large-scale problems. Specifically, our method relies on incremental Nyström approximations of the joint kernel embedding of contexts and actions. This allows us to achieve a complexity of O(CTm^2), where m is the number of Nyström points. To recover the same regret as the standard kernelized UCB algorithm, m needs to be of the order of the effective dimension of the problem, which is at most O(\sqrt{T}) and nearly constant in some cases.

* To appear at AISTATS 2022
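To make the complexity claim concrete, the following is a rough arithmetic sketch that compares the two stated orders of growth, O(CT^3) and O(CTm^2), in the worst case where m is about \sqrt{T}. The function names, the choice T = 10,000, and setting C = 1 are illustrative assumptions; constants hidden by the O-notation are ignored.

```python
import math


def kernel_ucb_cost(T: int, C: float = 1.0) -> float:
    """Order-of-magnitude cost of standard kernelized UCB: C * T^3."""
    return C * T ** 3


def nystrom_ucb_cost(T: int, m: int, C: float = 1.0) -> float:
    """Order-of-magnitude cost with m Nystrom points: C * T * m^2."""
    return C * T * m ** 2


T = 10_000            # horizon (illustrative value, not from the paper)
m = math.isqrt(T)     # worst case from the abstract: m on the order of sqrt(T)

standard = kernel_ucb_cost(T)
efficient = nystrom_ucb_cost(T, m)
speedup = standard / efficient   # equals T when m = sqrt(T)

print(f"T={T}, m={m}: {standard:.0e} vs {efficient:.0e} operations")
print(f"ratio: {speedup:.0f}")
```

With m = \sqrt{T}, the cost CTm^2 reduces to CT^2, so the ratio between the two bounds grows linearly with the horizon T.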

Self-Supervised Models are Continual Learners

Dec 08, 2021

Enrico Fini, Victor G. Turrisi da Costa, Xavier Alameda-Pineda, Elisa Ricci, Karteek Alahari, Julien Mairal

Abstract: Self-supervised models have been shown to produce comparable or better visual representations than their supervised counterparts when trained offline on unlabeled data at scale. However, their efficacy is catastrophically reduced in a Continual Learning (CL) scenario where data is presented to the model sequentially. In this paper, we show that self-supervised loss functions can be seamlessly converted into distillation mechanisms for CL by adding a predictor network that maps the current state of the representations to their past state. This enables us to devise a framework for Continual self-supervised visual representation Learning that (i) significantly improves the quality of the learned representations, (ii) is compatible with several state-of-the-art self-supervised objectives, and (iii) needs little to no hyperparameter tuning. We demonstrate the effectiveness of our approach empirically by training six popular self-supervised models in various CL settings.

Amortized Implicit Differentiation for Stochastic Bilevel Optimization

Nov 30, 2021

Michael Arbel, Julien Mairal

Abstract: We study a class of algorithms for solving bilevel optimization problems in both stochastic and deterministic settings when the inner-level objective is strongly convex. Specifically, we consider algorithms based on inexact implicit differentiation and we exploit a warm-start strategy to amortize the estimation of the exact gradient. We then introduce a unified theoretical framework inspired by the study of singularly perturbed systems (Habets, 1974) to analyze such amortized algorithms. By using this framework, our analysis shows these algorithms to match the computational complexity of oracle methods that have access to an unbiased estimate of the gradient, thus outperforming many existing results for bilevel optimization. We illustrate these findings on synthetic experiments and demonstrate the efficiency of these algorithms on hyper-parameter optimization experiments involving several thousands of variables.

A Trainable Spectral-Spatial Sparse Coding Model for Hyperspectral Image Restoration

Nov 18, 2021

Théo Bodrito, Alexandre Zouaoui, Jocelyn Chanussot, Julien Mairal

Abstract: Hyperspectral imaging offers new perspectives for diverse applications, ranging from environmental monitoring using airborne or satellite remote sensing to precision farming, food safety, planetary exploration, and astrophysics. Unfortunately, the spectral diversity of information comes at the expense of various sources of degradation, and the lack of accurate ground-truth "clean" hyperspectral signals acquired on the spot makes restoration tasks challenging. In particular, training deep neural networks for restoration is difficult, in contrast to traditional RGB imaging problems where deep models tend to shine. In this paper, we advocate instead for a hybrid approach based on sparse coding principles that retains the interpretability of classical techniques encoding domain knowledge with handcrafted image priors, while allowing model parameters to be trained end-to-end without massive amounts of data. We show on various denoising benchmarks that our method is computationally efficient and significantly outperforms the state of the art.

* 2021 Conference on Neural Information Processing Systems, Dec 2021, Sydney, Australia

Beyond Tikhonov: Faster Learning with Self-Concordant Losses via Iterative Regularization

Jun 16, 2021

Gaspard Beugnot, Julien Mairal, Alessandro Rudi

Abstract: The theory of spectral filtering is a remarkable tool to understand the statistical properties of learning with kernels. For least squares, it allows one to derive various regularization schemes that yield faster convergence rates of the excess risk than with Tikhonov regularization. This is typically achieved by leveraging classical assumptions called source and capacity conditions, which characterize the difficulty of the learning task. In order to understand estimators derived from other loss functions, Marteau-Ferey et al. have extended the theory of Tikhonov regularization to generalized self-concordant (GSC) loss functions, which include, e.g., the logistic loss. In this paper, we go a step further and show that fast and optimal rates can be achieved for GSC by using the iterated Tikhonov regularization scheme, which is intrinsically related to the proximal point method in optimization, and overcomes the limitation of the classical Tikhonov regularization.
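The link between iterated Tikhonov regularization and the proximal point method can be sketched on a single spectral component of a least-squares problem. This is a minimal illustration under simplifying assumptions (a scalar quadratic with curvature `s` and linear term `c`, zero initialization, hypothetical function names), not the GSC setting analyzed in the paper.

```python
def iterated_tikhonov(s: float, c: float, lam: float, steps: int) -> float:
    """Run `steps` proximal-point updates on one spectral component.

    Each update minimizes 0.5*s*x^2 - c*x + 0.5*lam*(x - x_prev)^2,
    whose minimizer is (c + lam * x_prev) / (s + lam).
    """
    x = 0.0
    for _ in range(steps):
        x = (c + lam * x) / (s + lam)
    return x


def filter_closed_form(s: float, c: float, lam: float, steps: int) -> float:
    """Closed form of the same iterate: (c/s) * (1 - (lam/(s+lam))^steps)."""
    return (c / s) * (1.0 - (lam / (s + lam)) ** steps)


# Illustrative values: a small eigenvalue s and a comparatively large lam.
s, c, lam = 0.05, 1.0, 1.0

tikhonov = iterated_tikhonov(s, c, lam, steps=1)   # one step is plain Tikhonov
iterated = iterated_tikhonov(s, c, lam, steps=5)   # five proximal-point steps
unregularized = c / s                              # least-squares solution

print(tikhonov, iterated, unregularized)
```

In this toy case a single step coincides with classical Tikhonov regularization, while further steps with the same `lam` move the estimate closer to the unregularized solution, which is one way to see how iteration reduces the regularization bias.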

Residual Reinforcement Learning from Demonstrations

Jun 15, 2021

Minttu Alakuijala, Gabriel Dulac-Arnold, Julien Mairal, Jean Ponce, Cordelia Schmid

Abstract: Residual reinforcement learning (RL) has been proposed as a way to solve challenging robotic tasks by adapting control actions from a conventional feedback controller to maximize a reward signal. We extend the residual formulation to learn from visual inputs and sparse rewards using demonstrations. Learning from images, proprioceptive inputs and a sparse task-completion reward relaxes the requirement of accessing full state features, such as object and target positions. In addition, replacing the base controller with a policy learned from demonstrations removes the dependency on a hand-engineered controller in favour of a dataset of demonstrations, which can be provided by non-experts. Our experimental evaluation on simulated manipulation tasks on a 6-DoF UR5 arm and a 28-DoF dexterous hand demonstrates that residual RL from demonstrations is able to generalize to unseen environment conditions more flexibly than either behavioral cloning or RL fine-tuning, and is capable of solving high-dimensional, sparse-reward tasks out of reach for RL from scratch.
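The residual formulation is usually understood as adding a learned correction to the action proposed by a base policy. The sketch below shows only that composition; the function names, the `scale` parameter, and the toy stand-in policies are hypothetical, and the paper's actual policies are networks trained from images and demonstrations.

```python
from typing import Callable, List, Sequence

Policy = Callable[[Sequence[float]], List[float]]


def make_residual_policy(base: Policy, residual: Policy, scale: float = 1.0) -> Policy:
    """Compose a base policy with a residual policy that corrects its actions."""
    def act(obs: Sequence[float]) -> List[float]:
        base_action = base(obs)      # e.g. a policy learned from demonstrations
        correction = residual(obs)   # e.g. a policy trained with RL
        return [a + scale * d for a, d in zip(base_action, correction)]
    return act


# Toy stand-ins for the two policies (assumptions for illustration only).
def base_policy(obs: Sequence[float]) -> List[float]:
    return [0.5 * o for o in obs]


def residual_policy(obs: Sequence[float]) -> List[float]:
    return [0.25 for _ in obs]


policy = make_residual_policy(base_policy, residual_policy)
action = policy([1.0, -2.0])
print(action)
```

Keeping the base policy fixed and training only the residual is one common design choice; whether the paper does exactly this is not stated in the abstract.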

GraphiT: Encoding Graph Structure in Transformers

Jun 10, 2021

Grégoire Mialon, Dexiong Chen, Margot Selosse, Julien Mairal

Abstract: We show that viewing graphs as sets of node features and incorporating structural and positional information into a transformer architecture can outperform representations learned with classical graph neural networks (GNNs). Our model, GraphiT, encodes such information by (i) leveraging relative positional encoding strategies in self-attention scores based on positive definite kernels on graphs, and (ii) enumerating and encoding local sub-structures such as paths of short length. We thoroughly evaluate these two ideas on many classification and regression tasks, demonstrating the effectiveness of each of them independently, as well as their combination. In addition to performing well on standard benchmarks, our model also admits natural visualization mechanisms for interpreting graph motifs explaining the predictions, making it a potentially strong candidate for scientific applications where interpretation is important. Code available at https://github.com/inria-thoth/GraphiT.

NTIRE 2021 Challenge on Burst Super-Resolution: Methods and Results

Jun 07, 2021

Goutam Bhat, Martin Danelljan, Radu Timofte, Kazutoshi Akita, Wooyeong Cho, Haoqiang Fan, Lanpeng Jia, Daeshik Kim, Bruno Lecouat, Youwei Li (+18 more)

Abstract: This paper reviews the NTIRE 2021 challenge on burst super-resolution. Given a noisy RAW burst as input, the task in the challenge was to generate a clean RGB image with 4 times higher resolution. The challenge contained two tracks: Track 1 evaluated on synthetically generated data, and Track 2 used real-world bursts from a mobile camera. In the final testing phase, 6 teams submitted results using a diverse set of solutions. The top-performing methods set a new state of the art for the burst super-resolution task.

* NTIRE 2021 Burst Super-Resolution challenge report

Emerging Properties in Self-Supervised Vision Transformers

May 24, 2021

Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin

Abstract: In this paper, we question whether self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the fact that adapting self-supervised methods to this architecture works particularly well, we make the following observations: first, self-supervised ViT features contain explicit information about the semantic segmentation of an image, which does not emerge as clearly with supervised ViTs, nor with convnets. Second, these features are also excellent k-NN classifiers, reaching 78.3% top-1 on ImageNet with a small ViT. Our study also underlines the importance of the momentum encoder, multi-crop training, and the use of small patches with ViTs. We implement our findings into a simple self-supervised method, called DINO, which we interpret as a form of self-distillation with no labels. We show the synergy between DINO and ViTs by achieving 80.1% top-1 on ImageNet in linear evaluation with ViT-Base.

* 21 pages
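A momentum encoder is commonly implemented as an exponential moving average (EMA) of the student's parameters. The sketch below illustrates only that update rule on flat lists of numbers; the function name, the momentum value, and the toy parameters are assumptions for illustration, and the abstract does not specify these details.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float], momentum: float) -> List[float]:
    """Move each teacher parameter a fraction (1 - momentum) toward the student."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


# Toy parameters; in practice these would be network weights.
teacher = [0.0, 1.0]
student = [1.0, 1.0]

# A small momentum is used here only so the effect is visible in a few steps;
# values close to 1 are typical in practice.
for _ in range(3):
    teacher = ema_update(teacher, student, momentum=0.5)

print(teacher)
```

The teacher receives no gradient in this scheme: it changes only through the averaging step, so it tracks a smoothed version of the student.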

Aliasing is your Ally: End-to-End Super-Resolution from Raw Image Bursts

Apr 13, 2021

Bruno Lecouat, Jean Ponce, Julien Mairal

Abstract: This presentation addresses the problem of reconstructing a high-resolution image from multiple lower-resolution snapshots captured from slightly different viewpoints in space and time. Key challenges for solving this problem include (i) aligning the input pictures with sub-pixel accuracy, (ii) handling raw (noisy) images for maximal faithfulness to native camera data, and (iii) designing/learning an image prior (regularizer) well suited to the task. We address these three challenges with a hybrid algorithm building on the insight from Wronski et al. that aliasing is an ally in this setting, with parameters that can be learned end to end, while retaining the interpretability of classical approaches to inverse problems. The effectiveness of our approach is demonstrated on synthetic and real image bursts, setting a new state of the art on several benchmarks and delivering excellent qualitative results on real raw bursts captured by smartphones and prosumer cameras.
