Julien Mairal

LJK

Semi-supervised learning made simple with self-supervised clustering

Jun 13, 2023

Enrico Fini, Pietro Astolfi, Karteek Alahari, Xavier Alameda-Pineda, Julien Mairal, Moin Nabi, Elisa Ricci

Abstract:Self-supervised learning models have been shown to learn rich visual representations without requiring human annotations. However, in many real-world scenarios, labels are partially available, motivating a recent line of work on semi-supervised methods inspired by self-supervised principles. In this paper, we propose a conceptually simple yet empirically powerful approach to turn clustering-based self-supervised methods such as SwAV or DINO into semi-supervised learners. More precisely, we introduce a multi-task framework that merges a supervised objective using ground-truth labels and a self-supervised objective relying on clustering assignments into a single cross-entropy loss. This approach may be interpreted as constraining the cluster centroids to be class prototypes. Despite its simplicity, we provide empirical evidence that our approach is highly effective and achieves state-of-the-art performance on CIFAR100 and ImageNet.

* Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) 3187-3197
* CVPR 2023 - Code available at https://github.com/pietroastolfi/suave-daino
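
The idea of a single cross-entropy loss shared by labeled and unlabeled samples can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`batch_loss`, `cluster_targets`) are hypothetical, and the soft cluster assignments are assumed to be provided by some clustering step outside this snippet. Labeled samples use a one-hot target, unlabeled samples use their soft assignment, and both go through the same cross-entropy over the prototype logits.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, logits):
    # Cross-entropy between a target distribution and softmax(logits).
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

def one_hot(label, num_classes):
    return [1.0 if k == label else 0.0 for k in range(num_classes)]

def batch_loss(logits_batch, labels, cluster_targets):
    """Average one cross-entropy over labeled and unlabeled samples.

    labels[i] is an int for a labeled sample and None otherwise.
    cluster_targets[i] is a soft assignment over prototypes for unlabeled
    samples (assumed input, produced elsewhere by a clustering step).
    """
    num_classes = len(logits_batch[0])
    total = 0.0
    for logits, label, soft in zip(logits_batch, labels, cluster_targets):
        # Same loss for both cases; only the target distribution differs.
        target = one_hot(label, num_classes) if label is not None else soft
        total += cross_entropy(target, logits)
    return total / len(logits_batch)
```

Because the targets for labeled samples index the same outputs as the cluster assignments, each output unit plays the role of both a cluster centroid and a class prototype in this sketch.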

Self-Attention in Colors: Another Take on Encoding Graph Structure in Transformers

Apr 21, 2023

Romain Menegaux, Emmanuel Jehanno, Margot Selosse, Julien Mairal

Abstract:We introduce a novel self-attention mechanism, Chromatic Self-Attention (CSA), which extends the notion of attention scores to attention _filters_ that independently modulate the feature channels. We showcase CSA in a fully-attentional graph Transformer, the Chromatic Graph Transformer (CGT), which integrates both graph structural information and edge features, completely bypassing the need for local message-passing components. Our method flexibly encodes graph structure through node-node interactions, by enriching the original edge features with a relative positional encoding scheme. We propose a new scheme based on random walks that encodes both structural and positional information, and show how to incorporate higher-order topological information, such as rings in molecular graphs. Our approach achieves state-of-the-art results on the ZINC benchmark dataset, while providing a flexible framework for encoding graph structure and incorporating higher-order topology.

DINOv2: Learning Robust Visual Features without Supervision

Apr 14, 2023

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby (+16 more)

Abstract:The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pretraining methods, especially self-supervised methods, can produce such features if trained on enough curated data from diverse sources. We revisit existing approaches and combine different techniques to scale our pretraining in terms of data and model size. Most of the technical contributions aim at accelerating and stabilizing the training at scale. In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature. In terms of models, we train a ViT model (Dosovitskiy et al., 2020) with 1B parameters and distill it into a series of smaller models that surpass the best available all-purpose features, OpenCLIP (Ilharco et al., 2021) on most of the benchmarks at image and pixel levels.

Sequential Counterfactual Risk Minimization

Feb 23, 2023

Houssam Zenati, Eustache Diemert, Matthieu Martin, Julien Mairal, Pierre Gaillard

Abstract:Counterfactual Risk Minimization (CRM) is a framework for dealing with the logged bandit feedback problem, where the goal is to improve a logging policy using offline data. In this paper, we explore the case where it is possible to deploy learned policies multiple times and acquire new data. We extend the CRM principle and its theory to this scenario, which we call "Sequential Counterfactual Risk Minimization (SCRM)." We introduce a novel counterfactual estimator and identify conditions that can improve the performance of CRM in terms of excess risk and regret rates, by using an analysis similar to restart strategies in accelerated optimization methods. We also provide an empirical evaluation of our method in both discrete and continuous action settings, and demonstrate the benefits of multiple deployments of CRM.

Learning Reward Functions for Robotic Manipulation by Observing Humans

Nov 16, 2022

Minttu Alakuijala, Gabriel Dulac-Arnold, Julien Mairal, Jean Ponce, Cordelia Schmid

Abstract:Observing a human demonstrator manipulate objects provides a rich, scalable and inexpensive source of data for learning robotic policies. However, transferring skills from human videos to a robotic manipulator poses several challenges, not least a difference in action and observation spaces. In this work, we use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies. Thanks to the diversity of this training data, the learned reward function sufficiently generalizes to image observations from a previously unseen robot embodiment and environment to provide a meaningful prior for directed exploration in reinforcement learning. The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective. By conditioning the function on a goal image, we are able to reuse one model across a variety of tasks. Unlike prior work on leveraging human videos to teach robots, our method, Human Offline Learned Distances (HOLD) requires neither a priori data from the robot environment, nor a set of task-specific human demonstrations, nor a predefined notion of correspondence across morphologies, yet it is able to accelerate training of several manipulation tasks on a simulated robot arm compared to using only a sparse reward obtained from task completion.
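
A goal-conditioned reward defined as a distance in an embedding space can be sketched as follows. This is only an illustration of the general idea stated in the abstract: the `embed` function below is a hypothetical stand-in for a learned encoder (here it just averages each image row), and `goal_reward` is an assumed name, not part of any released code.

```python
import math

def embed(image):
    # Hypothetical stand-in for a learned encoder: one feature per image row.
    # A real system would use a network trained with a time-contrastive objective.
    return [sum(row) / len(row) for row in image]

def goal_reward(observation, goal, encoder=embed):
    # Reward is the negative Euclidean distance between the embeddings of the
    # current observation and of the goal image; it is 0 when they coincide.
    z_obs = encoder(observation)
    z_goal = encoder(goal)
    return -math.dist(z_obs, z_goal)
```

Since the goal image is an argument rather than being baked into the function, the same encoder can be reused across tasks by swapping the goal.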

Entropic Descent Archetypal Analysis for Blind Hyperspectral Unmixing

Sep 26, 2022

Alexandre Zouaoui, Gedeon Muhawenayo, Behnood Rasti, Jocelyn Chanussot, Julien Mairal

Abstract:In this paper, we introduce a new algorithm based on archetypal analysis for blind hyperspectral unmixing, assuming linear mixing of endmembers. Archetypal analysis is a natural formulation for this task. This method does not require the presence of pure pixels (i.e., pixels containing a single material) but instead represents endmembers as convex combinations of a few pixels present in the original hyperspectral image. Our approach leverages an entropic gradient descent strategy, which (i) provides better solutions for hyperspectral unmixing than traditional archetypal analysis algorithms, and (ii) leads to efficient GPU implementations. Since running a single instance of our algorithm is fast, we also propose an ensembling mechanism along with an appropriate model selection procedure that make our method robust to hyper-parameter choices while keeping the computational complexity reasonable. By using six standard real datasets, we show that our approach outperforms state-of-the-art matrix factorization and recent deep learning methods. We also provide an open-source PyTorch implementation: https://github.com/inria-thoth/EDAA.
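
An entropic (exponentiated-gradient) descent step on the probability simplex can be sketched as below. This is a generic textbook update, not the EDAA algorithm itself: it only estimates the abundance vector of a single pixel for a fixed, assumed endmember matrix `E`, whereas the paper also learns the endmembers. All names (`entropic_descent`, `E`, `x`, `step`) are illustrative assumptions.

```python
import math

def residual(E, a, x):
    # E is a list of rows (one per spectral band), a the abundances, x the pixel.
    return [sum(E[b][k] * a[k] for k in range(len(a))) - x[b]
            for b in range(len(x))]

def objective(E, a, x):
    # Least-squares data fit under the linear mixing model: 0.5 * ||E a - x||^2.
    return 0.5 * sum(r * r for r in residual(E, a, x))

def entropic_descent(E, x, step=0.5, iters=200):
    k = len(E[0])
    a = [1.0 / k] * k  # start at the center of the simplex
    for _ in range(iters):
        r = residual(E, a, x)
        # Gradient of the objective with respect to a: E^T (E a - x).
        grad = [sum(E[b][j] * r[b] for b in range(len(x))) for j in range(k)]
        # Multiplicative update followed by renormalization keeps a on the
        # simplex (non-negative, summing to one) without any projection.
        w = [a[j] * math.exp(-step * grad[j]) for j in range(k)]
        total = sum(w)
        a = [wj / total for wj in w]
    return a
```

The update uses only elementwise operations and a normalization, which is one reason this family of methods maps well onto GPUs.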

High Dynamic Range and Super-Resolution from Raw Image Bursts

Jul 29, 2022

Bruno Lecouat, Thomas Eboli, Jean Ponce, Julien Mairal

Abstract:Photographs captured by smartphones and mid-range cameras have limited spatial resolution and dynamic range, with noisy response in underexposed regions and color artefacts in saturated areas. This paper introduces the first approach (to the best of our knowledge) to the reconstruction of high-resolution, high-dynamic range color images from raw photographic bursts captured by a handheld camera with exposure bracketing. This method uses a physically-accurate model of image formation to combine an iterative optimization algorithm for solving the corresponding inverse problem with a learned image representation for robust alignment and a learned natural image prior. The proposed algorithm is fast, with low memory requirements compared to state-of-the-art learning-based approaches to image restoration, and features that are learned end to end from synthetic yet realistic data. Extensive experiments demonstrate its excellent performance with super-resolution factors of up to $\times 4$ on real photographs taken in the wild with hand-held cameras, and high robustness to low-light conditions, noise, camera shake, and moderate object motion.

* Accepted to SIGGRAPH 2022 Technical Papers program

Self Supervised Learning for Few Shot Hyperspectral Image Classification

Jun 24, 2022

Nassim Ait Ali Braham, Lichao Mou, Jocelyn Chanussot, Julien Mairal, Xiao Xiang Zhu

Abstract:Deep learning has proven to be a very effective approach for Hyperspectral Image (HSI) classification. However, deep neural networks require large annotated datasets to generalize well. This limits the applicability of deep learning for HSI classification, where manually labelling thousands of pixels for every scene is impractical. In this paper, we propose to leverage Self Supervised Learning (SSL) for HSI classification. We show that by pre-training an encoder on unlabeled pixels using Barlow-Twins, a state-of-the-art SSL algorithm, we can obtain accurate models with a handful of labels. Experimental results demonstrate that this approach significantly outperforms vanilla supervised learning.

* Accepted at IGARSS 2022

On the Benefits of Large Learning Rates for Kernel Methods

Feb 28, 2022

Gaspard Beugnot, Julien Mairal, Alessandro Rudi

Abstract:This paper studies an intriguing phenomenon related to the good generalization performance of estimators obtained by using large learning rates within gradient descent algorithms. Although this phenomenon was first observed in the deep learning literature, we show that it can be precisely characterized in the context of kernel methods, even though the resulting optimization problem is convex. Specifically, we consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution on the Hessian's eigenvectors. This extends an intuition described by Nakkiran (2020) on a two-dimensional toy problem to realistic learning scenarios such as kernel ridge regression. While large learning rates may be proven beneficial as soon as there is a mismatch between the train and test objectives, we further explain why this benefit already occurs in classification tasks without assuming any particular mismatch between train and test data distributions.

* 23 pages, 5 figures
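
How the learning rate shapes the spectral content of an early-stopped solution can be checked on a toy quadratic. The sketch below is not taken from the paper: it assumes a finite-dimensional objective whose Hessian is already diagonal, with gradient descent started at zero. Under those assumptions, after `steps` iterations the component along an eigenvector with eigenvalue `lam` equals the optimum's component times `1 - (1 - lr * lam) ** steps`.

```python
def gradient_descent_diagonal(eigenvalues, target, lr, steps):
    # Minimizes 0.5 * sum_i lam_i * (w_i - target_i)^2 from w = 0.
    # The Hessian is diagonal, so coordinates are its eigen-directions.
    w = [0.0] * len(eigenvalues)
    for _ in range(steps):
        w = [wi - lr * lam * (wi - ti)
             for wi, lam, ti in zip(w, eigenvalues, target)]
    return w

def spectral_filter(lam, lr, steps):
    # Fraction of the optimum recovered along an eigenvalue-lam direction
    # after a fixed number of steps (closed form for the quadratic case).
    return 1.0 - (1.0 - lr * lam) ** steps
```

For a fixed step budget, a larger learning rate (still below `2 / max(eigenvalues)` for stability) recovers a larger share of the small-eigenvalue directions, so the two early-stopped solutions differ in their spectral decomposition rather than only in how far they have progressed.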

The Spectral Bias of Polynomial Neural Networks

Feb 27, 2022

Moulik Choraria, Leello Tadesse Dadi, Grigorios Chrysos, Julien Mairal, Volkan Cevher

Abstract:Polynomial neural networks (PNNs) have been recently shown to be particularly effective at image generation and face recognition, where high-frequency information is critical. Previous studies have revealed that neural networks demonstrate a $\textit{spectral bias}$ towards low-frequency functions, which yields faster learning of low-frequency components during training. Inspired by such studies, we conduct a spectral analysis of the Neural Tangent Kernel (NTK) of PNNs. We find that the $\Pi$-Net family, i.e., a recently proposed parametrization of PNNs, speeds up the learning of the higher frequencies. We verify the theoretical bias through extensive experiments. We expect our analysis to provide novel insights into designing architectures and learning frameworks by incorporating multiplicative interactions via polynomials.

* Accepted at the International Conference on Learning Representations (ICLR) 2022
