Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jean Ponce

DI-ENS, CDS

Dense Optical Tracking: Connecting the Dots

Dec 07, 2023

Guillaume Le Moing, Jean Ponce, Cordelia Schmid

Figure 1 for Dense Optical Tracking: Connecting the Dots

Figure 2 for Dense Optical Tracking: Connecting the Dots

Figure 3 for Dense Optical Tracking: Connecting the Dots

Figure 4 for Dense Optical Tracking: Connecting the Dots

Abstract:Recent approaches to point tracking are able to recover the trajectory of any scene point through a large portion of a video despite the presence of occlusions. They are, however, too slow in practice to track every point observed in a single frame in a reasonable amount of time. This paper introduces DOT, a novel, simple and efficient method for solving this problem. It first extracts a small set of tracks from key regions at motion boundaries using an off-the-shelf point tracking algorithm. Given source and target frames, DOT then computes rough initial estimates of a dense flow field and visibility mask through nearest-neighbor interpolation, before refining them using a learnable optical flow estimator that explicitly handles occlusions and can be trained on synthetic data with ground-truth correspondences. We show that DOT is significantly more accurate than current optical flow techniques, outperforms sophisticated "universal" trackers like OmniMotion, and is on par with, or better than, the best point tracking algorithms like CoTracker while being at least two orders of magnitude faster. Quantitative and qualitative experiments with synthetic and real videos validate the promise of the proposed approach. Code, data, and videos showcasing the capabilities of our approach are available in the project webpage: https://16lemoing.github.io/dot .

Via

Access Paper or Ask Questions

Towards Real-World Focus Stacking with Deep Learning

Nov 29, 2023

Alexandre Araujo, Jean Ponce, Julien Mairal

Figure 1 for Towards Real-World Focus Stacking with Deep Learning

Figure 2 for Towards Real-World Focus Stacking with Deep Learning

Figure 3 for Towards Real-World Focus Stacking with Deep Learning

Figure 4 for Towards Real-World Focus Stacking with Deep Learning

Abstract:Focus stacking is widely used in micro, macro, and landscape photography to reconstruct all-in-focus images from multiple frames obtained with focus bracketing, that is, with shallow depth of field and different focus planes. Existing deep learning approaches to the underlying multi-focus image fusion problem have limited applicability to real-world imagery since they are designed for very short image sequences (two to four images), and are typically trained on small, low-resolution datasets either acquired by light-field cameras or generated synthetically. We introduce a new dataset consisting of 94 high-resolution bursts of raw images with focus bracketing, with pseudo ground truth computed from the data using state-of-the-art commercial software. This dataset is used to train the first deep learning algorithm for focus stacking capable of handling bursts of sufficient length for real-world applications. Qualitative experiments demonstrate that it is on par with existing commercial solutions in the long-burst, realistic regime while being significantly more tolerant to noise. The code and dataset are available at https://github.com/araujoalexandre/FocusStackingDataset.

Via

Access Paper or Ask Questions

Revisiting Deformable Convolution for Depth Completion

Aug 03, 2023

Xinglong Sun, Jean Ponce, Yu-Xiong Wang

Figure 1 for Revisiting Deformable Convolution for Depth Completion

Figure 2 for Revisiting Deformable Convolution for Depth Completion

Figure 3 for Revisiting Deformable Convolution for Depth Completion

Figure 4 for Revisiting Deformable Convolution for Depth Completion

Abstract:Depth completion, which aims to generate high-quality dense depth maps from sparse depth maps, has attracted increasing attention in recent years. Previous work usually employs RGB images as guidance, and introduces iterative spatial propagation to refine estimated coarse depth maps. However, most of the propagation refinement methods require several iterations and suffer from a fixed receptive field, which may contain irrelevant and useless information with very sparse input. In this paper, we address these two challenges simultaneously by revisiting the idea of deformable convolution. We propose an effective architecture that leverages deformable kernel convolution as a single-pass refinement module, and empirically demonstrate its superiority. To better understand the function of deformable convolution and exploit it for depth completion, we further systematically investigate a variety of representative strategies. Our study reveals that, different from prior work, deformable convolution needs to be applied on an estimated depth map with a relatively high density for better performance. We evaluate our model on the large-scale KITTI dataset and achieve state-of-the-art level performance in both accuracy and inference speed. Our code is available at https://github.com/AlexSunNik/ReDC.

* Accepted and going to appear at IROS2023

Via

Access Paper or Ask Questions

MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features

Jul 24, 2023

Adrien Bardes, Jean Ponce, Yann LeCun

Abstract:Self-supervised learning of visual representations has been focusing on learning content features, which do not capture object motion or location, and focus on identifying and differentiating objects in images and videos. On the other hand, optical flow estimation is a task that does not involve understanding the content of the images on which it is estimated. We unify the two approaches and introduce MC-JEPA, a joint-embedding predictive architecture and self-supervised learning approach to jointly learn optical flow and content features within a shared encoder, demonstrating that the two associated objectives; the optical flow estimation objective and the self-supervised learning objective; benefit from each other and thus learn content features that incorporate motion information. The proposed approach achieves performance on-par with existing unsupervised optical flow benchmarks, as well as with common self-supervised learning approaches on downstream tasks such as semantic segmentation of images and videos.

Via

Access Paper or Ask Questions

Combining multi-spectral data with statistical and deep-learning models for improved exoplanet detection in direct imaging at high contrast

Jun 21, 2023

Olivier Flasseur, Théo Bodrito, Julien Mairal, Jean Ponce, Maud Langlois, Anne-Marie Lagrange

Abstract:Exoplanet detection by direct imaging is a difficult task: the faint signals from the objects of interest are buried under a spatially structured nuisance component induced by the host star. The exoplanet signals can only be identified when combining several observations with dedicated detection algorithms. In contrast to most of existing methods, we propose to learn a model of the spatial, temporal and spectral characteristics of the nuisance, directly from the observations. In a pre-processing step, a statistical model of their correlations is built locally, and the data are centered and whitened to improve both their stationarity and signal-to-noise ratio (SNR). A convolutional neural network (CNN) is then trained in a supervised fashion to detect the residual signature of synthetic sources in the pre-processed images. Our method leads to a better trade-off between precision and recall than standard approaches in the field. It also outperforms a state-of-the-art algorithm based solely on a statistical framework. Besides, the exploitation of the spectral diversity improves the performance compared to a similar model built solely from spatio-temporal data.

* accepted to EUSIPCO 2023

Via

Access Paper or Ask Questions

An Image Quality Assessment Dataset for Portraits

Apr 12, 2023

Nicolas Chahine, Ana-Stefania Calarasanu, Davide Garcia-Civiero, Theo Cayla, Sira Ferradans, Jean Ponce

Figure 1 for An Image Quality Assessment Dataset for Portraits

Figure 2 for An Image Quality Assessment Dataset for Portraits

Figure 3 for An Image Quality Assessment Dataset for Portraits

Figure 4 for An Image Quality Assessment Dataset for Portraits

Abstract:Year after year, the demand for ever-better smartphone photos continues to grow, in particular in the domain of portrait photography. Manufacturers thus use perceptual quality criteria throughout the development of smartphone cameras. This costly procedure can be partially replaced by automated learning-based methods for image quality assessment (IQA). Due to its subjective nature, it is necessary to estimate and guarantee the consistency of the IQA process, a characteristic lacking in the mean opinion scores (MOS) widely used for crowdsourcing IQA. In addition, existing blind IQA (BIQA) datasets pay little attention to the difficulty of cross-content assessment, which may degrade the quality of annotations. This paper introduces PIQ23, a portrait-specific IQA dataset of 5116 images of 50 predefined scenarios acquired by 100 smartphones, covering a high variety of brands, models, and use cases. The dataset includes individuals of various genders and ethnicities who have given explicit and informed consent for their photographs to be used in public research. It is annotated by pairwise comparisons (PWC) collected from over 30 image quality experts for three image attributes: face detail preservation, face target exposure, and overall image quality. An in-depth statistical analysis of these annotations allows us to evaluate their consistency over PIQ23. Finally, we show through an extensive comparison with existing baselines that semantic information (image context) can be used to improve IQA predictions. The dataset along with the proposed statistical analysis and BIQA algorithms are available: https://github.com/DXOMARK-Research/PIQ2023

* Conference on Computer Vision and Pattern Recognition 2023, IEEE/CVF, Jun 2023, Vancouver, Canada

Via

Access Paper or Ask Questions

Pixel-wise Agricultural Image Time Series Classification: Comparisons and a Deformable Prototype-based Approach

Mar 22, 2023

Elliot Vincent, Jean Ponce, Mathieu Aubry

Figure 1 for Pixel-wise Agricultural Image Time Series Classification: Comparisons and a Deformable Prototype-based Approach

Figure 2 for Pixel-wise Agricultural Image Time Series Classification: Comparisons and a Deformable Prototype-based Approach

Figure 3 for Pixel-wise Agricultural Image Time Series Classification: Comparisons and a Deformable Prototype-based Approach

Figure 4 for Pixel-wise Agricultural Image Time Series Classification: Comparisons and a Deformable Prototype-based Approach

Abstract:Improvements in Earth observation by satellites allow for imagery of ever higher temporal and spatial resolution. Leveraging this data for agricultural monitoring is key for addressing environmental and economic challenges. Current methods for crop segmentation using temporal data either rely on annotated data or are heavily engineered to compensate the lack of supervision. In this paper, we present and compare datasets and methods for both supervised and unsupervised pixel-wise segmentation of satellite image time series (SITS). We also introduce an approach to add invariance to spectral deformations and temporal shifts to classical prototype-based methods such as K-means and Nearest Centroid Classifier (NCC). We show this simple and highly interpretable method leads to meaningful results in both the supervised and unsupervised settings and significantly improves the state of the art for unsupervised classification of agricultural time series on four recent SITS datasets.

Via

Access Paper or Ask Questions

WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction

Nov 25, 2022

Guillaume Le Moing, Jean Ponce, Cordelia Schmid

Figure 1 for WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction

Figure 2 for WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction

Figure 3 for WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction

Figure 4 for WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction

Abstract:This paper presents WALDO (WArping Layer-Decomposed Objects), a novel approach to the prediction of future video frames from past ones. Individual images are decomposed into multiple layers combining object masks and a small set of control points. The layer structure is shared across all frames in each video to build dense inter-frame connections. Complex scene motions are modeled by combining parametric geometric transformations associated with individual layers, and video synthesis is broken down into discovering the layers associated with past frames, predicting the corresponding transformations for upcoming ones and warping the associated object regions accordingly, and filling in the remaining image parts. Extensive experiments on the Cityscapes (resp. KITTI) dataset show that WALDO significantly outperforms prior works with, e.g., 3, 27, and 51% (resp. 5, 20 and 11%) relative improvement in SSIM, LPIPS and FVD metrics. Code, pretrained models, and video samples synthesized by our approach can be found in the project webpage https://16lemoing.github.io/waldo.

Via

Access Paper or Ask Questions

A minimum swept-volume metric structure for configuration space

Nov 21, 2022

Yann de Mont-Marin, Jean Ponce, Jean-Paul Laumond

Figure 1 for A minimum swept-volume metric structure for configuration space

Figure 2 for A minimum swept-volume metric structure for configuration space

Figure 3 for A minimum swept-volume metric structure for configuration space

Figure 4 for A minimum swept-volume metric structure for configuration space

Abstract:Borrowing elementary ideas from solid mechanics and differential geometry, this presentation shows that the volume swept by a regular solid undergoing a wide class of volume-preserving deformations induces a rather natural metric structure with well-defined and computable geodesics on its configuration space. This general result applies to concrete classes of articulated objects such as robot manipulators, and we demonstrate as a proof of concept the computation of geodesic paths for a free flying rod and planar robotic arms as well as their use in path planning with many obstacles.

Via

Access Paper or Ask Questions

Learning Reward Functions for Robotic Manipulation by Observing Humans

Nov 16, 2022

Minttu Alakuijala, Gabriel Dulac-Arnold, Julien Mairal, Jean Ponce, Cordelia Schmid

Abstract:Observing a human demonstrator manipulate objects provides a rich, scalable and inexpensive source of data for learning robotic policies. However, transferring skills from human videos to a robotic manipulator poses several challenges, not least a difference in action and observation spaces. In this work, we use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies. Thanks to the diversity of this training data, the learned reward function sufficiently generalizes to image observations from a previously unseen robot embodiment and environment to provide a meaningful prior for directed exploration in reinforcement learning. The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective. By conditioning the function on a goal image, we are able to reuse one model across a variety of tasks. Unlike prior work on leveraging human videos to teach robots, our method, Human Offline Learned Distances (HOLD) requires neither a priori data from the robot environment, nor a set of task-specific human demonstrations, nor a predefined notion of correspondence across morphologies, yet it is able to accelerate training of several manipulation tasks on a simulated robot arm compared to using only a sparse reward obtained from task completion.

Via

Access Paper or Ask Questions