Marwane Hariat

SS3D: End2End Self-Supervised 3D from Web Videos

Apr 24, 2026

Marwane Hariat, Gianni Franchi, David Filliat, Antoine Manzanera

Abstract: We present SS3D, a web-scale SfM-based self-supervision pretraining pipeline for feed-forward 3D estimation from monocular video. Our model jointly predicts depth, ego-motion, and intrinsics in a single forward pass and is trained and evaluated as a coherent end-to-end 3D estimator. To stabilize joint learning, we use an intrinsics-first two-stage schedule and a unified single-checkpoint evaluation protocol. Scaling SfM self-supervision to unconstrained web video is challenging due to weak multi-view observability and strong corpus heterogeneity; we address these with a multi-view signal proxy (MVS) used for filtering and curriculum sampling, and with expert training distilled into a single student. Pretraining on YouTube-8M (~100M frames after filtering) yields strong cross-domain zero-shot transfer and improved fine-tuning performance over prior self-supervised baselines. We release the pretrained checkpoint and code.
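
As a rough illustration of why depth, ego-motion, and intrinsics can be learned jointly from video alone, the sketch below shows the standard pinhole reprojection step that SfM-style self-supervision generally relies on: a pixel is back-projected with its depth, moved by the relative camera pose, and projected into a neighboring frame. This is a generic textbook formulation, not the SS3D implementation; the function name `reproject` and all argument names are hypothetical, and the abstract does not specify the loss or warping details actually used.

```python
import numpy as np

def reproject(pixels, depth, K, R, t):
    """Map pixels from a target frame into a source frame (pinhole model).

    pixels: (N, 2) array of (u, v) coordinates in the target frame
    depth:  (N,) depth of each pixel in the target camera frame
    K:      (3, 3) camera intrinsics (assumed shared by both frames)
    R, t:   rotation (3, 3) and translation (3,) from target to source camera
    """
    ones = np.ones((pixels.shape[0], 1))
    homog = np.hstack([pixels, ones])        # homogeneous pixel coordinates
    rays = homog @ np.linalg.inv(K).T        # back-project to rays with z = 1
    points = rays * depth[:, None]           # 3D points in the target frame
    points_src = points @ R.T + t            # same points in the source frame
    proj = points_src @ K.T                  # project with the intrinsics
    return proj[:, :2] / proj[:, 2:3]        # perspective divide

# Illustrative values only (not taken from the paper).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
pixels = np.array([[320.0, 240.0], [100.0, 50.0]])
depth = np.array([2.0, 5.0])

same = reproject(pixels, depth, K, np.eye(3), np.zeros(3))
shifted = reproject(pixels, depth, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
```

With an identity pose the pixels map to themselves; with a 0.1-unit translation along x, the principal-point pixel at depth 2 lands at (345, 240). In a self-supervised setup, the photometric difference between the target image and the source image sampled at these reprojected locations is what would typically provide the training signal for all three predicted quantities.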

InfraParis: A multi-modal and multi-task autonomous driving dataset

Sep 27, 2023

Gianni Franchi, Marwane Hariat, Xuanlong Yu, Nacim Belkhir, Antoine Manzanera, David Filliat

Abstract: Current deep neural networks (DNNs) for autonomous driving computer vision are typically trained on specific datasets that only involve a single type of data and urban scenes. Consequently, these models struggle to handle new objects, noise, nighttime conditions, and diverse scenarios, which is essential for safety-critical applications. Despite ongoing efforts to enhance the resilience of computer vision DNNs, progress has been sluggish, partly due to the absence of benchmarks featuring multiple modalities. We introduce a novel and versatile dataset named InfraParis that supports multiple tasks across three modalities: RGB, depth, and infrared. We assess various state-of-the-art baseline techniques, encompassing models for the tasks of semantic segmentation, object detection, and depth estimation.

* 15 pages, 7 figures

Learning to Generate Training Datasets for Robust Semantic Segmentation

Aug 18, 2023

Marwane Hariat, Olivier Laurent, Rémi Kazmierczak, Shihao Zhang, Andrei Bursuc, Angela Yao, Gianni Franchi

Abstract: Semantic segmentation techniques have shown significant progress in recent years, but their robustness to real-world perturbations and data samples not seen during training remains a challenge, particularly in safety-critical applications. In this paper, we propose a novel approach to improve the robustness of semantic segmentation techniques by leveraging the synergy between label-to-image generators and image-to-label segmentation models. Specifically, we design and train Robusta, a novel robust conditional generative adversarial network to generate realistic and plausible perturbed or outlier images that can be used to train reliable segmentation models. We conduct in-depth studies of the proposed generative model, assess the performance and robustness of the downstream segmentation network, and demonstrate that our approach can significantly enhance the robustness of semantic segmentation techniques in the face of real-world perturbations, distribution shifts, and out-of-distribution samples. Our results suggest that this approach could be valuable in safety-critical applications, where the reliability of semantic segmentation techniques is of utmost importance and the computational budget at inference is limited. We will release our code shortly.
