Simone Calderara

Future Urban Scenes Generation Through Vehicles Synthesis

Jul 01, 2020

Alessandro Simoni, Luca Bergamini, Andrea Palazzi, Simone Calderara, Rita Cucchiara

Abstract: In this work, we propose a deep learning pipeline to predict the future visual appearance of an urban scene. Despite recent advances, generating the entire scene in an end-to-end fashion is still far from being achieved. Instead, we follow a two-stage approach, where interpretable information is included in the loop and each actor is modelled independently. We leverage a per-object novel view synthesis paradigm, i.e. generating a synthetic representation of an object undergoing a geometrical roto-translation in 3D space. Our model can be easily conditioned with constraints (e.g. input trajectories) provided by state-of-the-art tracking methods or by the user. This allows us to generate a set of diverse, realistic futures starting from the same input in a multi-modal fashion. We visually and quantitatively show the superiority of this approach over traditional end-to-end scene-generation methods on CityFlow, a challenging real-world dataset.
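
The roto-translation mentioned in the abstract can be illustrated with a minimal sketch: a rigid transform p' = R p + t applied to a few 3D points. This is not the authors' pipeline; the function name `roto_translate`, the choice of rotating about the vertical (z) axis only, and the sample points are assumptions made purely for illustration.

```python
import math


def roto_translate(points, yaw, translation):
    """Apply a rigid transform to 3D points: rotate by `yaw` radians
    about the z axis, then add `translation`.

    Hypothetical helper for illustration only; restricting the rotation
    to the z axis is an assumption of this sketch.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    moved = []
    for x, y, z in points:
        # Rotation about z leaves the z coordinate unchanged.
        moved.append((c * x - s * y + tx, s * x + c * y + ty, z + tz))
    return moved


# Two made-up points on an object, turned by 90 degrees and shifted along z.
corners = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
future_corners = roto_translate(corners, math.pi / 2, (0.0, 0.0, 1.0))
```

A trajectory constraint, as described in the abstract, would amount to a sequence of such (rotation, translation) pairs, one per future time step.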

The color out of space: learning self-supervised representations for Earth Observation imagery

Jun 22, 2020

Stefano Vincenzi, Angelo Porrello, Pietro Buzzega, Marco Cipriano, Pietro Fronte, Roberto Cuccu, Carla Ippoliti, Annamaria Conte, Simone Calderara

Abstract: The recent growth in the number of satellite images fosters the development of effective deep-learning techniques for Remote Sensing (RS). However, their full potential is untapped due to the lack of large annotated datasets. Such a problem is usually countered by fine-tuning a feature extractor that was previously trained on the ImageNet dataset. Unfortunately, the domain of natural images differs from the RS one, which hinders the final performance. In this work, we propose to learn meaningful representations from satellite imagery, leveraging its high-dimensional spectral bands to reconstruct the visible colors. We conduct experiments on land cover classification (BigEarthNet) and West Nile Virus detection, showing that colorization is a solid pretext task for training a feature extractor. Furthermore, we qualitatively observe that guesses based on natural images and colorization rely on different parts of the input. This paves the way to an ensemble model that eventually outperforms both of the above-mentioned techniques.

* 8 pages, 2 figures. Accepted at the 25th International Conference on Pattern Recognition (ICPR 2020), Milan, Italy

DAG-Net: Double Attentive Graph Neural Network for Trajectory Forecasting

May 26, 2020

Alessio Monti, Alessia Bertugli, Simone Calderara, Rita Cucchiara

Abstract: Understanding human motion behaviour is a critical task for several possible applications like self-driving cars or social robots, and in general for all those settings where an autonomous agent has to navigate inside a human-centric environment. This is non-trivial because human motion is inherently multi-modal: given a history of human motion paths, there are many plausible ways in which people could move in the future. Additionally, people's activities are often driven by goals, e.g. reaching particular locations or interacting with the environment. We address both of these aspects by proposing a new recurrent generative model that considers both single agents' future goals and interactions between different agents. The model exploits a double attention-based graph neural network to collect information about the mutual influences among different agents and integrates it with data about agents' possible future objectives. Our proposal is general enough to be applied in different scenarios: the model achieves state-of-the-art results in both urban environments and sports applications.

AC-VRNN: Attentive Conditional-VRNN for Multi-Future Trajectory Prediction

May 17, 2020

Alessia Bertugli, Simone Calderara, Pasquale Coscia, Lamberto Ballan, Rita Cucchiara

Abstract: Anticipating human motion in crowded scenarios is essential for developing intelligent transportation systems, social-aware robots and advanced video-surveillance applications. An important aspect of this task is the inherently multi-modal nature of human paths, which admits multiple socially acceptable futures when human interactions are involved. To this end, we propose a new generative model for multi-future trajectory prediction based on Conditional Variational Recurrent Neural Networks (C-VRNNs). Conditioning relies on prior belief maps, representing the most likely moving directions and forcing the model to consider the collective agents' motion. Human interactions are modeled in a structured way with a graph attention mechanism, providing an online attentive hidden state refinement of the recurrent estimation. Compared to sequence-to-sequence methods, our model operates step-by-step, generating more refined and accurate predictions. To corroborate our model, we perform extensive experiments on publicly available datasets (ETH, UCY and Stanford Drone Dataset) and demonstrate its effectiveness compared to state-of-the-art methods.

Dark Experience for General Continual Learning: a Strong, Simple Baseline

Apr 15, 2020

Pietro Buzzega, Matteo Boschini, Angelo Porrello, Davide Abati, Simone Calderara

Abstract: Neural networks struggle to learn continuously, as they catastrophically forget old knowledge whenever the data distribution changes over time. Recently, Continual Learning has inspired a plethora of approaches and evaluation settings; however, the majority of them overlook the properties of a practical scenario, where the data stream cannot be shaped as a sequence of tasks and offline training is not viable. We work towards General Continual Learning (GCL), where task boundaries blur and the domain and class distributions shift either gradually or suddenly. We address it through Dark Experience Replay, namely matching the network's logits sampled throughout the optimization trajectory, thus promoting consistency with its past. By conducting an extensive analysis on top of standard benchmarks, we show that such a seemingly simple baseline outperforms consolidated approaches and leverages limited resources. To provide a better understanding, we further introduce MNIST-360, a novel GCL evaluation setting.

* 18 pages, 6 figures
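
The logit-matching idea in the abstract above can be sketched roughly as follows. This is not the authors' implementation: the names `ReservoirBuffer` and `logit_matching_loss` are hypothetical, and both filling the memory by reservoir sampling and using a mean squared error as the matching term are assumptions of this sketch (the abstract only states that logits sampled along the optimization trajectory are matched). A "model" here is any callable mapping an example to a list of logits.

```python
import random


class ReservoirBuffer:
    """Fixed-size memory of (example, logits) pairs.

    Filling it by reservoir sampling is an assumption of this sketch.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # number of examples offered so far
        self.rng = random.Random(seed)

    def add(self, example, logits):
        if len(self.items) < self.capacity:
            self.items.append((example, list(logits)))
        else:
            # randint is inclusive on both ends, so the new item is kept
            # with probability capacity / (seen + 1).
            slot = self.rng.randint(0, self.seen)
            if slot < self.capacity:
                self.items[slot] = (example, list(logits))
        self.seen += 1


def logit_matching_loss(model, buffer_items):
    """Mean squared error between the model's current logits and the
    logits stored when each example was first seen (assumed loss form)."""
    if not buffer_items:
        return 0.0
    total = 0.0
    for example, old_logits in buffer_items:
        new_logits = model(example)
        sq = sum((n - o) ** 2 for n, o in zip(new_logits, old_logits))
        total += sq / len(old_logits)
    return total / len(buffer_items)


# Toy usage: record logits from an earlier model, then measure drift.
model_then = lambda x: [float(x), -float(x)]
memory = ReservoirBuffer(capacity=3)
for x in range(10):
    memory.add(x, model_then(x))

model_now = lambda x: [float(x) + 1.0, -float(x)]
drift = logit_matching_loss(model_now, memory.items)
```

In training, such a term would typically be added, with some weight, to the ordinary loss on the incoming data, so that the network is penalised for drifting away from its own earlier responses.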

Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation

Apr 01, 2020

Matteo Fabbri, Fabio Lanzi, Simone Calderara, Stefano Alletto, Rita Cucchiara

Abstract: In this paper, we present a novel approach for bottom-up multi-person 3D human pose estimation from monocular RGB images. We propose to use high-resolution volumetric heatmaps to model joint locations, devising a simple and effective compression method to drastically reduce the size of this representation. At the core of the proposed method lies our Volumetric Heatmap Autoencoder, a fully-convolutional network tasked with the compression of ground-truth heatmaps into a dense intermediate representation. A second model, the Code Predictor, is then trained to predict these codes, which can be decompressed at test time to re-obtain the original representation. Our experimental evaluation shows that our method performs favorably when compared to the state of the art on both multi-person and single-person 3D human pose estimation datasets and, thanks to our novel compression strategy, can process full-HD images at a constant runtime of 8 fps regardless of the number of subjects in the scene. Code and models are available at https://github.com/fabbrimatteo/LoCO .

* CVPR 2020

Conditional Channel Gated Networks for Task-Aware Continual Learning

Mar 31, 2020

Davide Abati, Jakub Tomczak, Tijmen Blankevoort, Simone Calderara, Rita Cucchiara, Babak Ehteshami Bejnordi

Abstract: Convolutional Neural Networks experience catastrophic forgetting when optimized on a sequence of learning problems: as they meet the objective of the current training examples, their performance on previous tasks drops drastically. In this work, we introduce a novel framework to tackle this problem with conditional computation. We equip each convolutional layer with task-specific gating modules, selecting which filters to apply on the given input. This way, we achieve two appealing properties. Firstly, the execution patterns of the gates allow us to identify and protect important filters, ensuring no loss in the performance of the model for previously learned tasks. Secondly, by using a sparsity objective, we can promote the selection of a limited set of kernels, allowing us to retain sufficient model capacity to digest new tasks. Existing solutions require, at test time, awareness of the task to which each example belongs. This knowledge, however, may not be available in many practical scenarios. Therefore, we additionally introduce a task classifier that predicts the task label of each example, to deal with settings in which a task oracle is not available. We validate our proposal on four continual learning datasets. Results show that our model consistently outperforms existing methods both in the presence and the absence of a task oracle. Notably, on Split SVHN and Imagenet-50 datasets, our model yields up to 23.98% and 17.42% improvement in accuracy w.r.t. competing methods.

* CVPR 2020 (oral)

STAGE: Spatio-Temporal Attention on Graph Entities for Video Action Detection

Dec 09, 2019

Matteo Tomei, Lorenzo Baraldi, Simone Calderara, Simone Bronzin, Rita Cucchiara

Abstract: Spatio-temporal action localization is a challenging yet fascinating task that aims to detect and classify human actions in video clips. In this paper, we develop a high-level video understanding module which can encode interactions between actors and objects both in space and time. In our formulation, spatio-temporal relationships are learned by performing self-attention operations on a graph structure connecting entities from consecutive clips. Notably, the use of graph learning is unprecedented for this task. From a computational point of view, the proposed module is backbone independent by design and does not need end-to-end training. When tested on the AVA dataset, it demonstrates a 10-16% relative mAP improvement over the baseline. Further, it can outperform or achieve performance comparable to state-of-the-art models which require heavy end-to-end and synchronized training on multiple GPUs. Code is publicly available at: https://github.com/aimagelab/STAGE_action_detection.

Spotting insects from satellites: modeling the presence of Culicoides imicola through Deep CNNs

Nov 22, 2019

Stefano Vincenzi, Angelo Porrello, Pietro Buzzega, Annamaria Conte, Carla Ippoliti, Luca Candeloro, Alessio Di Lorenzo, Andrea Capobianco Dondona, Simone Calderara

Abstract: Nowadays, Vector-Borne Diseases (VBDs) pose a severe threat to public health, accounting for a considerable number of human illnesses. Recently, several surveillance plans have been put in place for limiting the spread of such diseases, typically involving on-field measurements. However, a systematic and effective plan is still missing, due to the high costs and efforts required for implementing it. Ideally, any attempt in this field should consider the vector-host-pathogen triangle, which is strictly linked to environmental and climatic conditions. In this paper, we exploit satellite imagery from the Sentinel-2 mission, as we believe it encodes the environmental factors responsible for the vector's spread. Our analysis, conducted in a data-driven fashion, couples spectral images with ground-truth information on the abundance of Culicoides imicola. In this respect, we frame our task as a binary classification problem, leveraging the ability of Convolutional Neural Networks (CNNs) to learn useful representations from multi-band images. Additionally, we provide a multi-instance variant, aimed at extracting temporal patterns from a short sequence of spectral images. Experiments show promising results, providing the foundations for novel supportive tools, which could indicate where surveillance and prevention measures should be prioritized.

* 8 pages, 2 figures. Accepted at the 15th International Conference on Signal Image Technology & Internet Based Systems (SITIS-2019)

Semi-parametric Object Synthesis

Jul 24, 2019

Andrea Palazzi, Luca Bergamini, Simone Calderara, Rita Cucchiara

Abstract: We present a new semi-parametric approach to synthesize novel views of an object from a single monocular image. First, we exploit man-made object symmetry and piece-wise planarity to integrate rich a-priori visual information into the novel viewpoint synthesis process. An Image Completion Network (ICN) then leverages 2.5D sketches rendered from a 3D CAD as guidance to generate a realistic image. In contrast to concurrent works, we do not rely solely on synthetic data but instead leverage existing datasets for 3D object detection to operate in a real-world scenario. Unlike competing methods, our semi-parametric framework allows the handling of a wide range of 3D transformations. Thorough experimental analysis against state-of-the-art baselines shows the efficacy of our method from both a quantitative and a perceptual point of view. Code and supplementary material are available at: https://github.com/ndrplz/semiparametric
