Jean Ponce

DI-ENS, CDS

VICRegL: Self-Supervised Learning of Local Visual Features

Oct 04, 2022

Adrien Bardes, Jean Ponce, Yann LeCun

Abstract: Most recent self-supervised methods for learning image representations focus on either producing a global feature with invariance properties or producing a set of local features. The former works best for classification tasks, while the latter is best for detection and segmentation tasks. This paper explores the fundamental trade-off between learning local and global features. A new method called VICRegL is proposed that learns good global and local features simultaneously, yielding excellent performance on detection and segmentation tasks while maintaining good performance on classification tasks. Concretely, two identical branches of a standard convolutional net architecture are fed two differently distorted versions of the same image. The VICReg criterion is applied to pairs of global feature vectors. Simultaneously, the VICReg criterion is applied to pairs of local feature vectors occurring before the last pooling layer. Two local feature vectors are attracted to each other if their L2 distance is below a threshold or if their relative locations are consistent with a known geometric transformation between the two input images. We demonstrate strong performance on linear classification and segmentation transfer tasks. Code and pretrained models are publicly available at: https://github.com/facebookresearch/VICRegL
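The two matching rules for local features can be sketched as follows. This is a minimal pure-Python illustration, not the released VICRegL code: the function names, the `threshold` value, and the `transform` callable (standing in for the known geometric transformation between the two distorted views) are hypothetical. The VICReg criterion would then be applied to the feature pairs these functions return.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def feature_matches(feats_a: List[Vector], feats_b: List[Vector],
                    threshold: float) -> List[Tuple[int, int]]:
    """Pair each local feature of view A with its nearest neighbour in view B,
    keeping the pair only if their L2 distance is below `threshold`."""
    pairs = []
    for i, fa in enumerate(feats_a):
        j = min(range(len(feats_b)), key=lambda k: l2(fa, feats_b[k]))
        if l2(fa, feats_b[j]) < threshold:
            pairs.append((i, j))
    return pairs


def location_matches(pos_a: List[Vector], pos_b: List[Vector],
                     transform: Callable[[Vector], Vector]) -> List[Tuple[int, int]]:
    """Pair each location of view A with the location of view B closest to
    the point where the known geometric transformation maps it."""
    pairs = []
    for i, p in enumerate(pos_a):
        q = transform(p)  # position of p expressed in view B's coordinates
        j = min(range(len(pos_b)), key=lambda k: l2(q, pos_b[k]))
        pairs.append((i, j))
    return pairs
```

For example, `feature_matches([[0, 0], [5, 5]], [[0.1, 0], [10, 10]], 1.0)` keeps only the first feature's pair, because the second feature's nearest neighbour is farther than the threshold.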

* Accepted at NeurIPS 2022

High Dynamic Range and Super-Resolution from Raw Image Bursts

Jul 29, 2022

Bruno Lecouat, Thomas Eboli, Jean Ponce, Julien Mairal

Abstract: Photographs captured by smartphones and mid-range cameras have limited spatial resolution and dynamic range, with noisy response in underexposed regions and color artefacts in saturated areas. This paper introduces the first approach (to the best of our knowledge) to the reconstruction of high-resolution, high-dynamic-range color images from raw photographic bursts captured by a handheld camera with exposure bracketing. This method uses a physically accurate model of image formation to combine an iterative optimization algorithm for solving the corresponding inverse problem with a learned image representation for robust alignment and a learned natural image prior. The proposed algorithm is fast, with low memory requirements compared to state-of-the-art learning-based approaches to image restoration, and features that are learned end to end from synthetic yet realistic data. Extensive experiments demonstrate its excellent performance with super-resolution factors of up to ×4 on real photographs taken in the wild with handheld cameras, and high robustness to low-light conditions, noise, camera shake, and moderate object motion.

* Accepted to the SIGGRAPH 2022 Technical Papers program

Active Learning Strategies for Weakly-supervised Object Detection

Jul 25, 2022

Huy V. Vo, Oriane Siméoni, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Jean Ponce

Abstract: Object detectors trained with weak annotations are affordable alternatives to their fully supervised counterparts. However, there is still a significant performance gap between them. We propose to narrow this gap by fine-tuning a base pre-trained weakly-supervised detector with a few fully-annotated samples automatically selected from the training set using "box-in-box" (BiB), a novel active learning strategy designed specifically to address the well-documented failure modes of weakly-supervised detectors. Experiments on the VOC07 and COCO benchmarks show that BiB outperforms other active learning techniques and significantly improves the base weakly-supervised detector's performance with only a few fully-annotated images per class. BiB reaches 97% of the performance of fully-supervised Fast R-CNN with only 10% of fully-annotated images on VOC07. On COCO, using on average 10 fully-annotated images per class, or equivalently 1% of the training set, BiB also reduces the performance gap (in AP) between the weakly-supervised detector and the fully-supervised Fast R-CNN by over 70%, showing a good trade-off between performance and data efficiency. Our code is publicly available at https://github.com/huyvvo/BiB.
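The abstract does not spell out how BiB chooses images. Assuming, as the strategy's name suggests, that it looks for pairs of confident same-class detections in which one box lies entirely inside another (a typical symptom of a weak detector firing on an object part), the geometric test could be sketched as below. The `Detection` type, the `min_score` threshold and the rule of ranking images by pair count are hypothetical illustration choices, not the released BiB code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Detection:
    box: Box
    label: str
    score: float


def contains(outer: Box, inner: Box) -> bool:
    """True if box `inner` lies entirely inside box `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def box_in_box_pairs(dets: List[Detection],
                     min_score: float = 0.5) -> List[Tuple[int, int]]:
    """Return (outer, inner) index pairs of confident same-class
    detections where one box is nested inside the other."""
    keep = [i for i, d in enumerate(dets) if d.score >= min_score]
    pairs = []
    for i in keep:
        for j in keep:
            if (i != j and dets[i].label == dets[j].label
                    and contains(dets[i].box, dets[j].box)):
                pairs.append((i, j))
    return pairs


def rank_images(predictions: Dict[str, List[Detection]], budget: int) -> List[str]:
    """Pick the `budget` images with the most box-in-box pairs
    (hypothetical selection rule) for full annotation."""
    counts = {img: len(box_in_box_pairs(d)) for img, d in predictions.items()}
    return sorted(counts, key=lambda img: counts[img], reverse=True)[:budget]
```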

* Accepted to the European Conference on Computer Vision (ECCV) 2022. Contains 27 pages, 9 tables and 6 figures

Assembly Planning from Observations under Physical Constraints

Apr 20, 2022

Thomas Chabal, Robin Strudel, Etienne Arlaud, Jean Ponce, Cordelia Schmid

Abstract: This paper addresses the problem of copying an unknown assembly of primitives with known shape and appearance using information extracted from a single photograph by an off-the-shelf procedure for object detection and pose estimation. The proposed algorithm uses a simple combination of physical stability constraints, convex optimization and Monte Carlo tree search to plan assemblies as sequences of pick-and-place operations represented by STRIPS operators. It is efficient and, most importantly, robust to the errors in object detection and pose estimation that are unavoidable in any real robotic system. The proposed approach is demonstrated with thorough experiments on a UR5 manipulator.
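A STRIPS operator describes an action by a set of preconditions plus add and delete lists of logical facts. The sketch below shows a pick-and-place operator in that style; the predicate names (`clear`, `on`) and the `pick_and_place` helper are hypothetical, and the physical stability checks, convex optimization and Monte Carlo tree search mentioned in the abstract are not modeled. A planner would search over sequences of such operators.

```python
from dataclasses import dataclass
from typing import FrozenSet

State = FrozenSet[str]  # set of ground facts that currently hold


@dataclass(frozen=True)
class Operator:
    name: str
    pre: FrozenSet[str]     # facts that must hold before the action
    add: FrozenSet[str]     # facts made true by the action
    delete: FrozenSet[str]  # facts made false by the action

    def applicable(self, state: State) -> bool:
        return self.pre <= state

    def apply(self, state: State) -> State:
        if not self.applicable(state):
            raise ValueError(f"{self.name} is not applicable")
        return (state - self.delete) | self.add


def pick_and_place(obj: str, src: str, dst: str) -> Operator:
    """Move `obj` from `src` onto `dst` (hypothetical predicates)."""
    return Operator(
        name=f"pick_and_place({obj},{src},{dst})",
        pre=frozenset({f"clear({obj})", f"clear({dst})", f"on({obj},{src})"}),
        add=frozenset({f"on({obj},{dst})", f"clear({src})"}),
        delete=frozenset({f"on({obj},{src})", f"clear({dst})"}),
    )
```

Keeping states as frozen sets makes them hashable, which is convenient when a tree search needs to recognize states it has already visited.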

* See the project webpage at https://www.di.ens.fr/willow/research/assembly-planning/

Localizing Objects with Self-Supervised Transformers and no Labels

Sep 29, 2021

Oriane Siméoni, Gilles Puy, Huy V. Vo, Simon Roburin, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Renaud Marlet, Jean Ponce

Abstract: Localizing objects in image collections without supervision can help to avoid expensive annotation campaigns. We propose a simple approach to this problem that leverages the activation features of a vision transformer pre-trained in a self-supervised manner. Our method, LOST, requires neither external object proposals nor any exploration of the image collection; it operates on a single image. Yet, we outperform state-of-the-art object discovery methods by up to 8 CorLoc points on PASCAL VOC 2012. We also show that training a class-agnostic detector on the discovered objects boosts results by another 7 points. Moreover, we show promising results on the unsupervised object discovery task. The code to reproduce our results can be found at https://github.com/valeoai/LOST.
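The abstract does not describe how LOST picks an object from a single image. Assuming the seed-selection heuristic of the full paper (an assumption, since it is not stated here): patches are compared through their transformer features, the seed is the patch with the fewest positive correlations (objects usually cover less of the image than background), and the seed is then expanded with other low-degree patches that correlate positively with it. The sketch uses plain dot products on toy feature vectors; `k` and the toy data are illustrative.

```python
from typing import List, Sequence

Feature = Sequence[float]


def dot(a: Feature, b: Feature) -> float:
    return sum(x * y for x, y in zip(a, b))


def patch_degrees(feats: List[Feature]) -> List[int]:
    """Number of patches each patch correlates positively with (itself included)."""
    return [sum(1 for g in feats if dot(f, g) > 0) for f in feats]


def select_seed(feats: List[Feature]) -> int:
    """Seed = patch with the fewest positive correlations."""
    deg = patch_degrees(feats)
    return min(range(len(feats)), key=lambda i: deg[i])


def expand_seed(feats: List[Feature], seed: int, k: int) -> List[int]:
    """Among the k lowest-degree patches, keep those positively
    correlated with the seed."""
    deg = patch_degrees(feats)
    candidates = sorted(range(len(feats)), key=lambda i: deg[i])[:k]
    return [i for i in candidates if dot(feats[i], feats[seed]) > 0]
```

With three similar "background" patches and two mutually similar "object" patches, the seed falls on an object patch and the expansion recovers both object patches.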

CCVS: Context-aware Controllable Video Synthesis

Jul 16, 2021

Guillaume Le Moing, Jean Ponce, Cordelia Schmid

Abstract: This presentation introduces a self-supervised learning approach to the synthesis of new video clips from old ones, with several new key elements for improved spatial resolution and realism: it conditions the synthesis process on contextual information for temporal continuity and on ancillary information for fine control. The prediction model is doubly autoregressive, in the latent space of an autoencoder for forecasting, and in image space for updating contextual information, which is also used to enforce spatio-temporal consistency through a learnable optical flow module. Adversarial training of the autoencoder in the appearance and temporal domains is used to further improve the realism of its output. A quantizer inserted between the encoder and the transformer in charge of forecasting future frames in latent space (and its inverse inserted between the transformer and the decoder) adds even more flexibility by affording simple mechanisms for handling multimodal ancillary information for controlling the synthesis process (e.g., a few sample frames, an audio track, a trajectory in image space) and taking into account the intrinsically uncertain nature of the future by allowing multiple predictions. Experiments with an implementation of the proposed approach give very good qualitative and quantitative results on multiple tasks and standard benchmarks.
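A common way to build such a quantizer is vector quantization: each latent vector is replaced by the index of its nearest codebook entry, so the transformer forecasts discrete tokens, and the inverse step looks the entries back up before decoding. The sketch assumes that generic scheme with a hypothetical hand-written codebook; it is not taken from the CCVS implementation, where the codebook is learned.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def quantize(latents: List[Vector], codebook: List[Vector]) -> List[int]:
    """Encoder side: map each latent vector to the index of its nearest codebook entry."""
    return [min(range(len(codebook)), key=lambda k: math.dist(z, codebook[k]))
            for z in latents]


def dequantize(tokens: List[int], codebook: List[Vector]) -> List[List[float]]:
    """Decoder side (inverse step): look up the codebook vector for each token."""
    return [list(codebook[t]) for t in tokens]
```

Because tokens are discrete, a transformer can output a probability distribution over the codebook at each step, and sampling from it is one simple way to obtain multiple plausible futures.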

Residual Reinforcement Learning from Demonstrations

Jun 15, 2021

Minttu Alakuijala, Gabriel Dulac-Arnold, Julien Mairal, Jean Ponce, Cordelia Schmid

Abstract: Residual reinforcement learning (RL) has been proposed as a way to solve challenging robotic tasks by adapting control actions from a conventional feedback controller to maximize a reward signal. We extend the residual formulation to learn from visual inputs and sparse rewards using demonstrations. Learning from images, proprioceptive inputs and a sparse task-completion reward relaxes the requirement of accessing full state features, such as object and target positions. In addition, replacing the base controller with a policy learned from demonstrations removes the dependency on a hand-engineered controller in favour of a dataset of demonstrations, which can be provided by non-experts. Our experimental evaluation on simulated manipulation tasks on a 6-DoF UR5 arm and a 28-DoF dexterous hand demonstrates that residual RL from demonstrations is able to generalize to unseen environment conditions more flexibly than either behavioral cloning or RL fine-tuning, and is capable of solving high-dimensional, sparse-reward tasks out of reach for RL from scratch.
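In residual RL the executed action is the sum of a base action and a learned correction; here the base comes from a policy learned from demonstrations rather than a hand-engineered controller. A minimal sketch of that composition follows. The names `base_policy` and `residual_policy`, feeding the base action to the residual policy, and clipping to `[low, high]` are illustrative assumptions; training the residual with RL is not shown.

```python
from typing import Callable, List, Sequence

Action = List[float]


def residual_action(obs: object,
                    base_policy: Callable[[object], Action],
                    residual_policy: Callable[[object, Action], Action],
                    low: Sequence[float], high: Sequence[float]) -> Action:
    """Executed action = base action + learned correction, clipped to the bounds."""
    base = base_policy(obs)             # e.g. a policy cloned from demonstrations
    delta = residual_policy(obs, base)  # RL policy also sees the base action
    return [min(max(b + d, lo), hi)
            for b, d, lo, hi in zip(base, delta, low, high)]
```

Since only the correction is learned, a residual policy that outputs zeros reproduces the demonstration policy, which gives RL a reasonable starting point on sparse-reward tasks.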

Large-Scale Unsupervised Object Discovery

Jun 12, 2021

Huy V. Vo, Elena Sizikova, Cordelia Schmid, Patrick Pérez, Jean Ponce

Abstract: Existing approaches to unsupervised object discovery (UOD) do not scale up to large datasets without approximations that compromise their performance. We propose a novel formulation of UOD as a ranking problem, amenable to the arsenal of distributed methods available for eigenvalue problems and link analysis. Extensive experiments with COCO and OpenImages demonstrate that, in the single-object discovery setting where a single prominent object is sought in each image, the proposed LOD (Large-scale Object Discovery) approach is on par with, or better than, the state of the art for medium-scale datasets (up to 120K images), and over 37% better than the only other algorithms capable of scaling up to 1.7M images. In the multi-object discovery setting where multiple objects are sought in each image, the proposed LOD is over 14% better in average precision (AP) than all other methods for datasets ranging from 20K to 1.7M images.
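As a generic illustration of the eigenvalue and link-analysis tools the abstract refers to (not the LOD formulation itself, whose graph construction and scoring are defined in the full paper), the sketch ranks the nodes of a hypothetical weighted graph, for instance region proposals linked by similarity, with PageRank-style power iteration. The damping factor and tolerance are conventional assumed values.

```python
from typing import List


def link_analysis_scores(weights: List[List[float]], damping: float = 0.85,
                         max_iters: int = 200, tol: float = 1e-12) -> List[float]:
    """PageRank-style power iteration on a weighted directed graph.

    weights[i][j] >= 0 is the strength of the link i -> j. Returns one
    score per node; the scores sum to 1.
    """
    n = len(weights)
    out = [sum(row) for row in weights]  # total outgoing weight per node
    scores = [1.0 / n] * n
    for _ in range(max_iters):
        # Score held by nodes with no outgoing links is spread uniformly.
        dangling = sum(scores[i] for i in range(n) if out[i] == 0)
        new = []
        for j in range(n):
            incoming = sum(scores[i] * weights[i][j] / out[i]
                           for i in range(n) if out[i] > 0)
            new.append((1 - damping) / n + damping * (incoming + dangling / n))
        converged = sum(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores
```

Each iteration only needs the incoming links of every node, which is why this family of methods distributes well over very large graphs.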

* 16 pages

NTIRE 2021 Challenge on Burst Super-Resolution: Methods and Results

Jun 07, 2021

Goutam Bhat, Martin Danelljan, Radu Timofte, Kazutoshi Akita, Wooyeong Cho, Haoqiang Fan, Lanpeng Jia, Daeshik Kim, Bruno Lecouat, Youwei Li (+18 more)

Abstract: This paper reviews the NTIRE 2021 challenge on burst super-resolution. Given a noisy RAW burst as input, the task in the challenge was to generate a clean RGB image with 4 times higher resolution. The challenge contained two tracks: Track 1 evaluating on synthetically generated data, and Track 2 using real-world bursts from a mobile camera. In the final testing phase, 6 teams submitted results using a diverse set of solutions. The top-performing methods set a new state of the art for the burst super-resolution task.

* NTIRE 2021 Burst Super-Resolution challenge report

VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning

May 11, 2021

Adrien Bardes, Jean Ponce, Yann LeCun

Abstract: Recent self-supervised methods for image representation learning are based on maximizing the agreement between embedding vectors from different views of the same image. A trivial solution is obtained when the encoder outputs constant vectors. This collapse problem is often avoided through implicit biases in the learning architecture, which often lack a clear justification or interpretation. In this paper, we introduce VICReg (Variance-Invariance-Covariance Regularization), a method that explicitly avoids the collapse problem with a simple regularization term on the variance of the embeddings along each dimension individually. VICReg combines the variance term with a decorrelation mechanism based on redundancy reduction and covariance regularization, and achieves results on par with the state of the art on several downstream tasks. In addition, we show that incorporating our new variance term into other methods helps stabilize the training and leads to performance improvements.
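The three terms named in the abstract can be written out explicitly. The pure-Python sketch below works on small batches of embeddings (lists of n rows by d columns) and assumes the formulation of the full VICReg paper: the variance term is a hinge on each dimension's standard deviation with target γ = 1 and a small ε for numerical stability, the covariance term sums the squared off-diagonal covariances divided by d, and the invariance term is a mean squared error. The weights λ = μ = 25 and ν = 1 are assumed defaults; the released implementation is in PyTorch.

```python
import math
from typing import List

Matrix = List[List[float]]  # n rows (samples) x d columns (embedding dimensions)


def _center(z: Matrix) -> Matrix:
    """Subtract the per-dimension mean over the batch."""
    n, d = len(z), len(z[0])
    means = [sum(row[j] for row in z) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in z]


def invariance_term(z_a: Matrix, z_b: Matrix) -> float:
    """Mean squared error between the two branches' embeddings."""
    n, d = len(z_a), len(z_a[0])
    return sum((x - y) ** 2 for ra, rb in zip(z_a, z_b)
               for x, y in zip(ra, rb)) / (n * d)


def variance_term(z: Matrix, gamma: float = 1.0, eps: float = 1e-4) -> float:
    """Hinge loss pushing the std of every dimension up to gamma."""
    zc = _center(z)
    n, d = len(z), len(z[0])
    total = 0.0
    for j in range(d):
        var = sum(row[j] ** 2 for row in zc) / (n - 1)  # unbiased variance
        total += max(0.0, gamma - math.sqrt(var + eps))
    return total / d


def covariance_term(z: Matrix) -> float:
    """Sum of squared off-diagonal covariances, divided by d."""
    zc = _center(z)
    n, d = len(z), len(z[0])
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                c = sum(row[i] * row[j] for row in zc) / (n - 1)
                total += c * c
    return total / d


def vicreg_loss(z_a: Matrix, z_b: Matrix, lam: float = 25.0,
                mu: float = 25.0, nu: float = 1.0) -> float:
    """Weighted sum of the invariance, variance and covariance terms."""
    return (lam * invariance_term(z_a, z_b)
            + mu * (variance_term(z_a) + variance_term(z_b))
            + nu * (covariance_term(z_a) + covariance_term(z_b)))
```

For fully collapsed embeddings (every row identical in both branches), the invariance and covariance terms vanish but the variance term does not (about 0.99 per branch with these defaults), which is what rules out the trivial constant solution described in the abstract.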
