Francesco Sambo

FLaRA: Predicting Future Latent Representations for Accident Anticipation

Jun 12, 2026

Lorenzo Caselli, Tomaso Trinci, Tommaso Bianconcini, Simone Magistri, Leonardo Taccari, Francesco Sambo, Andrew D. Bagdanov

Abstract:Anticipating traffic accidents from dashcam videos is a critical challenge in intelligent transportation systems. Existing methods typically map visual context directly to a collision probability without explicitly modeling the future evolution of the driving scene. In this paper we propose FLaRA (Predicting Future Latent Representations for Accident Anticipation), a novel predictive architecture that shifts this paradigm by forecasting future latent representations for accident anticipation. Building upon the Video Joint-Embedding Predictive Architecture (V-JEPA2), our model conditions a predictor network on observed context frames to predict the forthcoming latent features of the scene. A classifier then operates on these predicted future representations rather than only on past observations. To ensure these forecasts remain grounded in realistic future dynamics, we introduce a joint training objective that simultaneously optimizes an auxiliary feature-level reconstruction loss and a cross-entropy classification loss. Extensive evaluations on the Nexar dataset, alongside cross-domain validations on the DAD, DADA-2000, and DoTA benchmarks, demonstrate that our approach achieves state-of-the-art performance while maintaining realistic early warning capabilities.

* Accepted at the 2026 IEEE International Conference on Intelligent Transportation Systems (ITSC 2026)
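The joint training objective described in the abstract combines a feature-level reconstruction loss with a cross-entropy classification loss. The sketch below is only an illustration of that idea, not the authors' implementation: the use of mean squared error as the reconstruction term, the `recon_weight` parameter, and all function names are assumptions made for this example.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length feature vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits, label):
    """Cross-entropy of a softmax over `logits` against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def joint_loss(pred_feats, future_feats, logits, label, recon_weight=1.0):
    """Classification loss plus a weighted reconstruction term (weight is assumed)."""
    return cross_entropy(logits, label) + recon_weight * mse(pred_feats, future_feats)
```

Here `pred_feats` stands for the predicted future latent features, `future_feats` for the features actually extracted from the future frames, and `logits` for the classifier output computed on the predicted features.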

RendBEV: Semantic Novel View Synthesis for Self-Supervised Bird's Eye View Segmentation

Feb 20, 2025

Henrique Piñeiro Monteagudo, Leonardo Taccari, Aurel Pjetri, Francesco Sambo, Samuele Salti

Abstract:Bird's Eye View (BEV) semantic maps have recently garnered a lot of attention as a useful representation of the environment to tackle assisted and autonomous driving tasks. However, most of the existing work focuses on the fully supervised setting, training networks on large annotated datasets. In this work, we present RendBEV, a new method for the self-supervised training of BEV semantic segmentation networks, leveraging differentiable volumetric rendering to receive supervision from semantic perspective views computed by a 2D semantic segmentation model. Our method enables zero-shot BEV semantic segmentation, and already delivers competitive results in this challenging setting. When used as pretraining to then fine-tune on labeled BEV ground-truth, our method significantly boosts performance in low-annotation regimes, and sets a new state of the art when fine-tuning on all available labels.

* Accepted at WACV 2025

An object detection approach for lane change and overtake detection from motion profiles

Feb 06, 2025

Andrea Benericetti, Niccolò Bellaccini, Henrique Piñeiro Monteagudo, Matteo Simoncini, Francesco Sambo

Abstract:In the application domain of fleet management and driver monitoring, it is very challenging to obtain relevant driving events and activities from dashcam footage while minimizing the amount of information stored and analyzed. In this paper, we address the identification of overtake and lane change maneuvers with a novel object detection approach applied to motion profiles, a compact representation of driving video footage as a single image. To train and test our model we created an internal dataset of motion profile images obtained from a heterogeneous set of dashcam videos, manually labeled with overtake and lane change maneuvers by the ego-vehicle. In addition to a standard object-detection approach, we show how the inclusion of CoordConvolution layers further improves the model performance, in terms of mAP and F1 score, yielding state-of-the-art performance when compared to other baselines from the literature. The extremely low computational requirements of the proposed solution make it especially suitable for running on-device.

* 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Bilbao, Spain, 2023, pp. 1389-1394
* 6 pages, 3 figures
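The CoordConvolution layers mentioned in the abstract give a convolution access to pixel positions by appending coordinate channels to its input. The sketch below illustrates only that channel-appending step under assumed conventions: the channels-first nested-list layout, the scaling of coordinates to [-1, 1], and the function name `add_coord_channels` are choices made for this example and are not taken from the paper.

```python
def add_coord_channels(image):
    """Append normalized row and column coordinate channels to a C x H x W image.

    `image` is a list of channels, each a list of H rows of W floats.
    """
    height = len(image[0])
    width = len(image[0][0])

    def scale(index, size):
        # Map index 0..size-1 linearly onto [-1, 1]; a single-pixel axis maps to 0.
        return 0.0 if size == 1 else 2.0 * index / (size - 1) - 1.0

    rows = [[scale(i, height) for _ in range(width)] for i in range(height)]
    cols = [[scale(j, width) for j in range(width)] for _ in range(height)]
    return image + [rows, cols]
```

A standard convolution applied to the returned tensor then sees two extra input channels, which lets its filters depend on where in the motion profile image a pattern occurs.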

A New Dataset for Monocular Depth Estimation Under Viewpoint Shifts

Sep 27, 2024

Aurel Pjetri, Stefano Caprasecca, Leonardo Taccari, Matteo Simoncini, Henrique Piñeiro Monteagudo, Walter Wallace, Douglas Coimbra de Andrade, Francesco Sambo, Andrew David Bagdanov

Abstract:Monocular depth estimation is a critical task for autonomous driving and many other computer vision applications. While significant progress has been made in this field, the effects of viewpoint shifts on depth estimation models remain largely underexplored. This paper introduces a novel dataset and evaluation methodology to quantify the impact of different camera positions and orientations on monocular depth estimation performance. We propose a ground truth strategy based on homography estimation and object detection, eliminating the need for expensive lidar sensors. We collect a diverse dataset of road scenes from multiple viewpoints and use it to assess the robustness of a modern depth estimation model to geometric shifts. After assessing the validity of our strategy on a public dataset, we provide valuable insights into the limitations of current models and highlight the importance of considering viewpoint variations in real-world applications.

* 17 pages, 5 figures. Accepted at ECCV 2024 2nd Workshop on Vision-Centric Autonomous Driving (VCAD)
