Matteo Matteucci

Department of Electronics, Information and Bioengineering

Neuro-Symbolic Scene Graph Conditioning for Synthetic Image Dataset Generation

Mar 21, 2025

Giacomo Savazzi, Eugenio Lomurno, Cristian Sbrolli, Agnese Chiatti, Matteo Matteucci

Abstract:As machine learning models increase in scale and complexity, obtaining sufficient training data has become a critical bottleneck due to acquisition costs, privacy constraints, and data scarcity in specialised domains. While synthetic data generation has emerged as a promising alternative, a notable performance gap remains compared to models trained on real data, particularly as task complexity grows. Concurrently, Neuro-Symbolic methods, which combine neural networks' learning strengths with symbolic reasoning's structured representations, have demonstrated significant potential across various cognitive tasks. This paper explores the utility of Neuro-Symbolic conditioning for synthetic image dataset generation, focusing specifically on improving the performance of Scene Graph Generation models. The research investigates whether structured symbolic representations in the form of scene graphs can enhance synthetic data quality through explicit encoding of relational constraints. The results demonstrate that Neuro-Symbolic conditioning yields significant improvements of up to +2.59% in standard Recall metrics and +2.83% in No Graph Constraint Recall metrics when used for dataset augmentation. These findings establish that merging Neuro-Symbolic and generative approaches produces synthetic data with complementary structural information that enhances model performance when combined with real data, providing a novel approach to overcome data scarcity limitations even for complex visual reasoning tasks.

ZO-DARTS++: An Efficient and Size-Variable Zeroth-Order Neural Architecture Search Algorithm

Mar 08, 2025

Lunchen Xie, Eugenio Lomurno, Matteo Gambella, Danilo Ardagna, Manuel Roveri, Matteo Matteucci, Qingjiang Shi

Abstract:Differentiable Neural Architecture Search (NAS) provides a promising avenue for automating the complex design of deep learning (DL) models. However, current differentiable NAS methods often face constraints in efficiency, operation selection, and adaptability under varying resource limitations. We introduce ZO-DARTS++, a novel NAS method that effectively balances performance and resource constraints. By integrating a zeroth-order approximation for efficient gradient handling, employing a sparsemax function with temperature annealing for clearer and more interpretable architecture distributions, and adopting a size-variable search scheme for generating compact yet accurate architectures, ZO-DARTS++ establishes a new balance between model complexity and performance. In extensive tests on medical imaging datasets, ZO-DARTS++ improves the average accuracy by up to 1.8% over standard DARTS-based methods and shortens search time by approximately 38.6%. Additionally, its resource-constrained variants can reduce the number of parameters by more than 35% while maintaining competitive accuracy levels. Thus, ZO-DARTS++ offers a versatile and efficient framework for generating high-quality, resource-aware DL models suitable for real-world medical applications.

* 14 pages, 8 figures
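The abstract above mentions a sparsemax function with temperature annealing. As a rough illustration of that idea only, the sketch below implements the standard sparsemax projection and applies it to logits divided by a temperature. The function name `sparsemax`, the example logits, and the geometric decay schedule are assumptions made for illustration; the abstract does not specify the schedule or implementation used in ZO-DARTS++.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of a 1-D score vector onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]           # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum   # True for entries that stay non-zero
    k_max = k[support][-1]                # size of the support
    tau = (cumsum[k_max - 1] - 1) / k_max # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)

# Hypothetical annealing loop: the temperature shrinks geometrically,
# which pushes the distribution towards a one-hot choice.
logits = np.array([1.0, 0.8, 0.1])        # made-up architecture logits
for step in range(3):
    temperature = 1.0 * (0.1 ** step)     # assumed schedule, not from the paper
    print(step, sparsemax(logits / temperature))
```

With these made-up logits, a temperature of 1 gives roughly `[0.6, 0.4, 0.0]`, and lower temperatures put all the mass on the largest logit. Unlike softmax, sparsemax can assign exactly zero probability to weak candidates.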

High-frequency near-eye ground truth for event-based eye tracking

Feb 05, 2025

Andrea Simpsi, Andrea Aspesi, Simone Mentasti, Luca Merigo, Tommaso Ongarello, Matteo Matteucci

Abstract:Event-based eye tracking is a promising solution for efficient and low-power eye tracking in smart eyewear technologies. However, the novelty of event-based sensors has resulted in a limited number of available datasets, particularly those with eye-level annotations, crucial for algorithm validation and deep-learning training. This paper addresses this gap by presenting an improved version of a popular event-based eye-tracking dataset. We introduce a semi-automatic annotation pipeline specifically designed for event-based data annotation. Additionally, we provide the scientific community with the computed annotations for pupil detection at 200 Hz.

POMONAG: Pareto-Optimal Many-Objective Neural Architecture Generator

Sep 30, 2024

Eugenio Lomurno, Samuele Mariani, Matteo Monti, Matteo Matteucci

Abstract:Neural Architecture Search (NAS) automates neural network design, reducing dependence on human expertise. While NAS methods are computationally intensive and dataset-specific, auxiliary predictors reduce the models needing training, decreasing search time. This strategy is used to generate architectures satisfying multiple computational constraints. Recently, Transferable NAS has emerged, generalizing the search process from dataset-dependent to task-dependent. In this field, DiffusionNAG is a state-of-the-art method. This diffusion-based approach streamlines computation, generating architectures optimized for accuracy on unseen datasets without further adaptation. However, by focusing solely on accuracy, DiffusionNAG overlooks other crucial objectives like model complexity, computational efficiency, and inference latency -- factors essential for deploying models in resource-constrained environments. This paper introduces the Pareto-Optimal Many-Objective Neural Architecture Generator (POMONAG), extending DiffusionNAG via a many-objective diffusion process. POMONAG simultaneously considers accuracy, number of parameters, multiply-accumulate operations (MACs), and inference latency. It integrates Performance Predictor models to estimate these metrics and guide diffusion gradients. POMONAG's optimization is enhanced by expanding its training Meta-Dataset, applying Pareto Front Filtering, and refining embeddings for conditional generation. These enhancements enable POMONAG to generate Pareto-optimal architectures that outperform the previous state-of-the-art in performance and efficiency. Results were validated on two search spaces -- NASBench201 and MobileNetV3 -- and evaluated across 15 image classification datasets.
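The abstract above lists Pareto Front Filtering over accuracy, parameters, MACs, and latency. A minimal sketch of generic Pareto filtering follows, assuming every objective is expressed as a cost (accuracy is negated so that lower is better). The helper name `pareto_front_mask` and the candidate values are hypothetical and are not taken from POMONAG.

```python
import numpy as np

def pareto_front_mask(costs):
    """Return a boolean mask of non-dominated rows; lower is better in every column."""
    costs = np.asarray(costs, dtype=float)
    keep = np.ones(costs.shape[0], dtype=bool)
    for i in range(costs.shape[0]):
        # Row j dominates row i if it is no worse everywhere and strictly better somewhere.
        no_worse = np.all(costs <= costs[i], axis=1)
        better = np.any(costs < costs[i], axis=1)
        if np.any(no_worse & better):
            keep[i] = False
    return keep

# Made-up candidates: [accuracy, parameters (M), MACs (M), latency (ms)]
candidates = np.array([
    [0.94, 1.2, 150.0, 10.0],
    [0.92, 0.8, 100.0, 8.0],
    [0.91, 1.0, 120.0, 9.0],
    [0.95, 2.0, 300.0, 20.0],
])
costs = candidates.copy()
costs[:, 0] = -costs[:, 0]   # turn accuracy into a cost
mask = pareto_front_mask(costs)
print(candidates[mask])
```

In this toy set the third candidate is dropped because the second one is more accurate and cheaper on every other axis. The quadratic loop is fine for small candidate pools; larger pools would call for a sorted or incremental variant.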

Federated Knowledge Recycling: Privacy-Preserving Synthetic Data Sharing

Jul 30, 2024

Eugenio Lomurno, Matteo Matteucci

Abstract:Federated learning has emerged as a paradigm for collaborative learning, enabling the development of robust models without the need to centralise sensitive data. However, conventional federated learning techniques have privacy and security vulnerabilities due to the exposure of models, parameters or updates, which can be exploited as an attack surface. This paper presents Federated Knowledge Recycling (FedKR), a cross-silo federated learning approach that uses locally generated synthetic data to facilitate collaboration between institutions. FedKR combines advanced data generation techniques with a dynamic aggregation process to provide greater security against privacy attacks than existing methods, significantly reducing the attack surface. Experimental results on generic and medical datasets show that FedKR achieves competitive performance, with an average improvement in accuracy of 4.24% compared to training models from local data, demonstrating particular effectiveness in data scarcity scenarios.

Synthetic Image Learning: Preserving Performance and Preventing Membership Inference Attacks

Jul 22, 2024

Eugenio Lomurno, Matteo Matteucci

Abstract:Generative artificial intelligence has transformed the generation of synthetic data, providing innovative solutions to challenges like data scarcity and privacy, which are particularly critical in fields such as medicine. However, the effective use of this synthetic data to train high-performance models remains a significant challenge. This paper addresses this issue by introducing Knowledge Recycling (KR), a pipeline designed to optimise the generation and use of synthetic data for training downstream classifiers. At the heart of this pipeline is Generative Knowledge Distillation (GKD), the proposed technique that significantly improves the quality and usefulness of the information provided to classifiers through a synthetic dataset regeneration and soft labelling mechanism. The KR pipeline has been tested on a variety of datasets, with a focus on six highly heterogeneous medical image datasets, ranging from retinal images to organ scans. The results show a significant reduction in the performance gap between models trained on real and synthetic data, with models based on synthetic data outperforming those trained on real data in some cases. Furthermore, the resulting models show almost complete immunity to Membership Inference Attacks, manifesting privacy properties missing in models trained with conventional techniques.

Digital Twins of the EM Environment: Benchmark for Ray Launching Models

Jun 07, 2024

Michele Zhu, Lorenzo Cazzella, Francesco Linsalata, Maurizio Magarini, Matteo Matteucci, Umberto Spagnolini

Abstract:Digital Twin has emerged as a promising paradigm for accurately representing electromagnetic (EM) wireless environments. The resulting virtual representation of reality facilitates comprehensive insights into the propagation environment, empowering multi-layer decision-making processes at the physical communication level. This paper investigates the digitization of wireless communication propagation, with particular emphasis on the indispensable aspect of ray-based propagation simulation for real-time Digital Twins. A benchmark for ray-based propagation simulations is presented to evaluate computational time, with two urban scenarios characterized by different mesh complexity, single and multiple wireless link configurations, and simulations with/without diffuse scattering. Exhaustive empirical analyses are performed showing and comparing the behavior of different ray-based solutions. By offering standardized simulations and scenarios, this work provides a technical benchmark for practitioners involved in the implementation of real-time Digital Twins and optimization of ray-based propagation models.

Can CLIP help CLIP in learning 3D?

Jun 04, 2024

Cristian Sbrolli, Matteo Matteucci

Abstract:In this study, we explore an alternative approach to enhance contrastive text-image-3D alignment in the absence of textual descriptions for 3D objects. We introduce two unsupervised methods, $I2I$ and $(I2L)^2$, which leverage CLIP knowledge about textual and 2D data to compute the neural perceived similarity between two 3D samples. We employ the proposed methods to mine 3D hard negatives, establishing a multimodal contrastive pipeline with hard negative weighting via a custom loss function. We train on different configurations of the proposed hard negative mining approach, and we evaluate the accuracy of our models in 3D classification and on the cross-modal retrieval benchmark, testing image-to-shape and shape-to-image retrieval. Results demonstrate that our approach, even without explicit text alignment, achieves comparable or superior performance on zero-shot and standard 3D classification, while significantly improving both image-to-shape and shape-to-image retrieval compared to previous methods.

The Empirical Impact of Forgetting and Transfer in Continual Visual Odometry

Jun 03, 2024

Paolo Cudrano, Xiaoyu Luo, Matteo Matteucci

Abstract:As robotics continues to advance, the need for adaptive and continuously-learning embodied agents increases, particularly in the realm of assistance robotics. Quick adaptability and long-term information retention are essential to operate in dynamic environments typical of humans' everyday lives. A lifelong learning paradigm is thus required, but it is scarcely addressed by current robotics literature. This study empirically investigates the impact of catastrophic forgetting and the effectiveness of knowledge transfer in neural networks trained continuously in an embodied setting. We focus on the task of visual odometry, which holds primary importance for embodied agents in enabling their self-localization. We experiment on the simple continual scenario of discrete transitions between indoor locations, akin to a robot navigating different apartments. In this regime, we observe initial satisfactory performance with high transferability between environments, followed by a specialization phase where the model prioritizes current environment-specific knowledge at the expense of generalization. Conventional regularization strategies and increased model capacity prove ineffective in mitigating this phenomenon. Rehearsal is instead mildly beneficial but with the addition of a substantial memory cost. Incorporating action information, as commonly done in embodied settings, facilitates quicker convergence but exacerbates specialization, making the model overly reliant on its motion expectations and less adept at correctly interpreting visual cues. These findings emphasize the open challenges of balancing adaptation and memory retention in lifelong robotics and contribute valuable insights into the application of a lifelong paradigm on embodied agents.

* Accepted to CoLLAs 2024

A Lightweight Neural Architecture Search Model for Medical Image Classification

May 06, 2024

Lunchen Xie, Eugenio Lomurno, Matteo Gambella, Danilo Ardagna, Manuel Roveri, Matteo Matteucci, Qingjiang Shi

Abstract:Accurate classification of medical images is essential for modern diagnostics. Deep learning advancements have led clinicians to increasingly use sophisticated models to make faster and more accurate decisions, sometimes replacing human judgment. However, model development is costly and repetitive. Neural Architecture Search (NAS) provides solutions by automating the design of deep learning architectures. This paper presents ZO-DARTS+, a differentiable NAS algorithm that improves search efficiency through a novel method of generating sparse probabilities by bi-level optimization. Experiments on five public medical datasets show that ZO-DARTS+ matches the accuracy of state-of-the-art solutions while reducing search times by up to three times.
