Matteo Matteucci

Department of Electronics, Information and Bioengineering

Stable Diffusion Dataset Generation for Downstream Classification Tasks

May 04, 2024

Eugenio Lomurno, Matteo D'Oria, Matteo Matteucci

Abstract: Recent advances in generative artificial intelligence have enabled the creation of high-quality synthetic data that closely mimics real-world data. This paper explores the adaptation of the Stable Diffusion 2.0 model for generating synthetic datasets, using Transfer Learning, Fine-Tuning and generation parameter optimisation techniques to improve the utility of the dataset for downstream classification tasks. We present a class-conditional version of the model that exploits a Class-Encoder and optimisation of key generation parameters. Our methodology led to synthetic datasets that, in a third of cases, produced models that outperformed those trained on real datasets.

Latent Neural Cellular Automata for Resource-Efficient Image Restoration

Mar 22, 2024

Andrea Menta, Alberto Archetti, Matteo Matteucci

Abstract: Neural cellular automata represent an evolution of the traditional cellular automata model, enhanced by the integration of a deep learning-based transition function. This shift from a manual to a data-driven approach significantly increases the adaptability of these models, enabling their application in diverse domains, including content generation and artificial life. However, their widespread application has been hampered by significant computational requirements. In this work, we introduce the Latent Neural Cellular Automata (LNCA) model, a novel architecture designed to address the resource limitations of neural cellular automata. Our approach shifts the computation from the conventional input space to a specially designed latent space, relying on a pre-trained autoencoder. We apply our model in the context of image restoration, which aims to reconstruct high-quality images from their degraded versions. This modification not only reduces the model's resource consumption but also maintains a flexible framework suitable for various applications. Our model achieves a significant reduction in computational requirements while maintaining high reconstruction fidelity. This increase in efficiency allows for inputs up to 16 times larger than current state-of-the-art neural cellular automata models, using the same resources.

BTGenBot: Behavior Tree Generation for Robotic Tasks with Lightweight LLMs

Mar 19, 2024

Riccardo Andrea Izzo, Gianluca Bardaro, Matteo Matteucci

Abstract: This paper presents a novel approach to generating behavior trees for robots using lightweight large language models (LLMs) with a maximum of 7 billion parameters. The study demonstrates that it is possible to achieve satisfying results with compact LLMs when fine-tuned on a specific dataset. The key contributions of this research include the creation of a fine-tuning dataset based on existing behavior trees using GPT-3.5 and a comprehensive comparison of multiple LLMs (namely llama2, llama-chat, and code-llama) across nine distinct tasks. To be thorough, we evaluated the generated behavior trees using static syntactical analysis, a validation system, a simulated environment, and a real robot. Furthermore, this work opens the possibility of deploying such solutions directly on the robot, enhancing its practical applicability. Findings from this study demonstrate the potential of LLMs with a limited number of parameters in generating effective and efficient robot behaviors.

A Deep-Learning Technique to Locate Cryptographic Operations in Side-Channel Traces

Feb 29, 2024

Giuseppe Chiari, Davide Galli, Francesco Lattari, Matteo Matteucci, Davide Zoni

Abstract: Side-channel attacks allow extracting secret information from the execution of cryptographic primitives by correlating the partially known computed data and the measured side-channel signal. However, to set up a successful side-channel attack, the attacker has to perform (i) the challenging task of locating the time instant in which the target cryptographic primitive is executed inside a side-channel trace and then (ii) the time alignment of the measured data on that time instant. This paper presents a novel deep-learning technique to locate the time instant in which the target cryptographic operations are executed in the side-channel trace. In contrast to state-of-the-art solutions, the proposed methodology works even in the presence of trace deformations obtained through random delay insertion techniques. We validated our proposal through a successful attack against a variety of unprotected and protected cryptographic primitives that have been executed on an FPGA-implemented system-on-chip featuring a RISC-V CPU.

* Accepted for presentation by DATE'24

More than the Sum of Its Parts: Ensembling Backbone Networks for Few-Shot Segmentation

Feb 09, 2024

Nico Catalano, Alessandro Maranelli, Agnese Chiatti, Matteo Matteucci

Abstract: Semantic segmentation is a key prerequisite to robust image understanding for applications in Artificial Intelligence and Robotics. Few-Shot Segmentation, in particular, concerns the extension and optimization of traditional segmentation methods in challenging conditions where limited training examples are available. A predominant approach in Few-Shot Segmentation is to rely on a single backbone for visual feature extraction. Choosing which backbone to leverage is a deciding factor contributing to the overall performance. In this work, we investigate whether fusing features from different backbones can improve the ability of Few-Shot Segmentation models to capture richer visual features. To tackle this question, we propose and compare two ensembling techniques: Independent Voting and Feature Fusion. Among the available Few-Shot Segmentation methods, we implement the proposed ensembling techniques on PANet. The module dedicated to predicting segmentation masks from the backbone embeddings in PANet avoids trainable parameters, creating a controlled 'in vitro' setting for isolating the impact of different ensembling strategies. Leveraging the complementary strengths of different backbones, our approach outperforms the original single-backbone PANet across standard benchmarks even in challenging one-shot learning scenarios. Specifically, it achieved a performance improvement of +7.37% on PASCAL-5^i and of +10.68% on COCO-20^i in the top-performing scenario where three backbones are combined. These results, together with the qualitative inspection of the predicted subject masks, suggest that relying on multiple backbones in PANet leads to a more comprehensive feature representation, thus expediting the successful application of Few-Shot Segmentation methods in challenging, data-scarce environments.

Can Shape-Infused Joint Embeddings Improve Image-Conditioned 3D Diffusion?

Feb 02, 2024

Cristian Sbrolli, Paolo Cudrano, Matteo Matteucci

Abstract: Recent advancements in deep generative models, particularly with the application of CLIP (Contrastive Language-Image Pretraining) to Denoising Diffusion Probabilistic Models (DDPMs), have demonstrated remarkable effectiveness in text-to-image generation. The well-structured embedding space of CLIP has also been extended to image-to-shape generation with DDPMs, yielding notable results. Despite these successes, some fundamental questions arise: Does CLIP ensure the best results in shape generation from images? Can we leverage conditioning to bring explicit 3D knowledge into the generative process and obtain better quality? This study introduces CISP (Contrastive Image-Shape Pre-training), designed to enhance 3D shape synthesis guided by 2D images. CISP aims to enrich the CLIP framework by aligning 2D images with 3D shapes in a shared embedding space, specifically capturing 3D characteristics potentially overlooked by CLIP's text-image focus. Our comprehensive analysis assesses CISP's guidance performance against CLIP-guided models, focusing on generation quality, diversity, and coherence of the produced shapes with the conditioning image. We find that, while matching CLIP in generation quality and diversity, CISP substantially improves coherence with input images, underscoring the value of incorporating 3D knowledge into generative models. These findings suggest a promising direction for advancing the synthesis of 3D visual content by integrating multimodal systems with 3D representations.

* 10 pages, 6 figures

Deep Learning-based Target-To-User Association in Integrated Sensing and Communication Systems

Jan 11, 2024

Lorenzo Cazzella, Marouan Mizmizi, Dario Tagliaferri, Damiano Badini, Matteo Matteucci, Umberto Spagnolini

Abstract: In Integrated Sensing and Communication (ISAC) systems, matching the radar targets with communication user equipments (UEs) is useful for several communication tasks, such as proactive handover and beam prediction. In this paper, we consider a radar-assisted communication system where a base station (BS) is equipped with a multiple-input-multiple-output (MIMO) radar that has a double aim: (i) associate vehicular radar targets to vehicular equipments (VEs) in the communication beamspace and (ii) predict the beamforming vector for each VE from radar data. The proposed target-to-user (T2U) association consists of two stages. First, vehicular radar targets are detected from range-angle images, and, for each, a beamforming vector is estimated. Then, the inferred per-target beamforming vectors are matched with the ones utilized at the BS for communication to perform T2U association. Joint multi-target detection and beam inference is obtained by modifying the You Only Look Once (YOLO) model, which is trained over simulated range-angle radar images. Simulation results over different urban vehicular mobility scenarios show that the proposed T2U method provides a probability of correct association that increases with the size of the BS antenna array, highlighting the corresponding increase in the separability of the VEs in the beamspace. Moreover, we show that the modified YOLO architecture can effectively perform both beam prediction and radar target detection, with similar performance in mean average precision on the latter over different antenna array sizes.

Advancements in Radar Odometry

Oct 19, 2023

Matteo Frosi, Mirko Usuelli, Matteo Matteucci

Abstract: Radar odometry estimation has emerged as a critical technique in the field of autonomous navigation, providing robust and reliable motion estimation under various environmental conditions. Despite its potential, the complex nature of radar signals and the inherent challenges associated with processing these signals have limited the widespread adoption of this technology. This paper aims to address these challenges by proposing novel improvements to an existing method for radar odometry estimation, designed to enhance accuracy and reliability in diverse scenarios. Our pipeline consists of filtering, motion compensation, oriented surface points computation, smoothing, one-to-many radar scan registration, and pose refinement. The developed method enforces local understanding of the scene by adding information through smoothing techniques, and by aligning consecutive scans as a refinement after the one-to-many registration. We present an in-depth investigation of the contribution of each improvement to the localization accuracy, and we benchmark our system on the sequences of the main datasets for radar understanding, i.e., the Oxford Radar RobotCar, MulRan, and Boreas datasets. The proposed pipeline is able to achieve superior results on all scenarios considered and under harsh environmental constraints.

Age Group Discrimination via Free Handwriting Indicators

Sep 29, 2023

Eugenio Lomurno, Simone Toffoli, Davide Di Febbo, Matteo Matteucci, Francesca Lunardini, Simona Ferrante

Abstract: The growing global elderly population is expected to increase the prevalence of frailty, posing significant challenges to healthcare systems. Frailty, a syndrome associated with ageing, is characterised by progressive health decline, increased vulnerability to stressors and increased risk of mortality. It represents a significant burden on public health and reduces the quality of life of those affected. The lack of a universally accepted method to assess frailty and a standardised definition highlights a critical research gap. Given this lack and the importance of early prevention, this study presents an innovative approach using an instrumented ink pen to ecologically assess handwriting for age group classification. Content-free handwriting data from 80 healthy participants in different age groups (20-40, 41-60, 61-70 and 70+) were analysed. Fourteen gesture- and tremor-related indicators were computed from the raw data and used in five classification tasks. These tasks included discriminating between adjacent and non-adjacent age groups using CatBoost and Logistic Regression classifiers. Results indicate exceptional classifier performance, with accuracy ranging from 82.5% to 97.5%, precision from 81.8% to 100%, recall from 75% to 100% and ROC-AUC from 92.2% to 100%. Model interpretability, facilitated by SHAP analysis, revealed age-dependent sensitivity of temporal and tremor-related handwriting features. Importantly, this classification method offers potential for early detection of abnormal signs of ageing in uncontrolled settings such as remote home monitoring, thereby addressing the critical issue of frailty detection and contributing to improved care for older adults.

RadarLCD: Learnable Radar-based Loop Closure Detection Pipeline

Sep 13, 2023

Mirko Usuelli, Matteo Frosi, Paolo Cudrano, Simone Mentasti, Matteo Matteucci

Abstract: Loop Closure Detection (LCD) is an essential task in robotics and computer vision, serving as a fundamental component for various applications across diverse domains. These applications encompass object recognition, image retrieval, and video analysis. LCD consists of identifying whether a robot has returned to a previously visited location, referred to as a loop, and then estimating the related roto-translation with respect to the analyzed location. Despite the numerous advantages of radar sensors, such as their ability to operate under diverse weather conditions and provide a wider range of view compared to other commonly used sensors (e.g., cameras or LiDARs), integrating radar data remains an arduous task due to intrinsic noise and distortion. To address this challenge, this research introduces RadarLCD, a novel supervised deep learning pipeline specifically designed for Loop Closure Detection using the FMCW (Frequency Modulated Continuous Wave) radar sensor. RadarLCD, a learning-based LCD methodology explicitly designed for radar systems, makes a significant contribution by leveraging the pre-trained HERO (Hybrid Estimation Radar Odometry) model. Although HERO was originally developed for radar odometry, its features are used to select key points crucial for LCD tasks. The methodology is evaluated across a variety of FMCW radar dataset scenes, and it is compared to state-of-the-art systems such as Scan Context for Place Recognition and ICP for Loop Closure. The results demonstrate that RadarLCD surpasses the alternatives in multiple aspects of Loop Closure Detection.

* 7 pages, 2 figures
