Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Álvaro López García

Privacy Preserving Machine Learning Workflow: from Anonymization to Personalized Differential Privacy Budgets in Federated Learning

May 04, 2026

Judith Sáinz-Pardo Díaz, Álvaro López García

Abstract:The growing development of artificial intelligence based solutions, together with privacy legislation, has driven the rise of the so-called privacy preserving machine learning architectures, such as federated learning. While federated learning enables model training on decentralized data preventing their sharing and centralization, it still faces several challenges related to data integrity and privacy. This paper presents a comprehensive privacy preserving federated learning workflow for sensitive tabular data, including anonymization and differential privacy techniques. We also introduce a formal definition for the concept of client drift, together with ways of detecting it to mitigate poisoning attacks. Then, we detail a complete methodology for assigning personalized privacy budgets for global differential privacy to the different clients participating in the network, based on a re-identification risk metric. The proposed methodology is presented and tested on an openly available dataset of medical records. Within the experimental setup we show that the approach based on personalized budgets, compared to the architecture including global differential privacy with fixed privacy budget, achieves a better model performance in terms of two error metrics.

* Accepted at the 2nd International Conference on Federated Learning and Intelligent Computing Systems (FLICS2026)

Via

Access Paper or Ask Questions

AI4EOSC: a Federated Cloud Platform for Artificial Intelligence in Scientific Research

Dec 18, 2025

Ignacio Heredia, Álvaro López García, Germán Moltó, Amanda Calatrava, Valentin Kozlov, Alessandro Costantini, Viet Tran, Mario David, Daniel San Martín, Marcin Płóciennik(+22 more)

Figure 1 for AI4EOSC: a Federated Cloud Platform for Artificial Intelligence in Scientific Research

Figure 2 for AI4EOSC: a Federated Cloud Platform for Artificial Intelligence in Scientific Research

Abstract:In this paper, we describe a federated compute platform dedicated to support Artificial Intelligence in scientific workloads. Putting the effort into reproducible deployments, it delivers consistent, transparent access to a federation of physically distributed e-Infrastructures. Through a comprehensive service catalogue, the platform is able to offer an integrated user experience covering the full Machine Learning lifecycle, including model development (with dedicated interactive development environments), training (with GPU resources, annotation tools, experiment tracking, and federated learning support) and deployment (covering a wide range of deployment options all along the Cloud Continuum). The platform also provides tools for traceability and reproducibility of AI models, integrates with different Artificial Intelligence model providers, datasets and storage resources, allowing users to interact with the broader Machine Learning ecosystem. Finally, it is easily customizable to lower the adoption barrier by external communities.

Via

Access Paper or Ask Questions

Metric Privacy in Federated Learning for Medical Imaging: Improving Convergence and Preventing Client Inference Attacks

Feb 03, 2025

Judith Sáinz-Pardo Díaz, Andreas Athanasiou, Kangsoo Jung, Catuscia Palamidessi, Álvaro López García

Figure 1 for Metric Privacy in Federated Learning for Medical Imaging: Improving Convergence and Preventing Client Inference Attacks

Figure 2 for Metric Privacy in Federated Learning for Medical Imaging: Improving Convergence and Preventing Client Inference Attacks

Figure 3 for Metric Privacy in Federated Learning for Medical Imaging: Improving Convergence and Preventing Client Inference Attacks

Figure 4 for Metric Privacy in Federated Learning for Medical Imaging: Improving Convergence and Preventing Client Inference Attacks

Abstract:Federated learning is a distributed learning technique that allows training a global model with the participation of different data owners without the need to share raw data. This architecture is orchestrated by a central server that aggregates the local models from the clients. This server may be trusted, but not all nodes in the network. Then, differential privacy (DP) can be used to privatize the global model by adding noise. However, this may affect convergence across the rounds of the federated architecture, depending also on the aggregation strategy employed. In this work, we aim to introduce the notion of metric-privacy to mitigate the impact of classical server side global-DP on the convergence of the aggregated model. Metric-privacy is a relaxation of DP, suitable for domains provided with a notion of distance. We apply it from the server side by computing a distance for the difference between the local models. We compare our approach with standard DP by analyzing the impact on six classical aggregation strategies. The proposed methodology is applied to an example of medical imaging and different scenarios are simulated across homogeneous and non-i.i.d clients. Finally, we introduce a novel client inference attack, where a semi-honest client tries to find whether another client participated in the training and study how it can be mitigated using DP and metric-privacy. Our evaluation shows that metric-privacy can increase the performance of the model compared to standard DP, while offering similar protection against client inference attacks.

Via

Access Paper or Ask Questions

Enhancing the Convergence of Federated Learning Aggregation Strategies with Limited Data

Jan 27, 2025

Judith Sáinz-Pardo Díaz, Álvaro López García

Figure 1 for Enhancing the Convergence of Federated Learning Aggregation Strategies with Limited Data

Figure 2 for Enhancing the Convergence of Federated Learning Aggregation Strategies with Limited Data

Figure 3 for Enhancing the Convergence of Federated Learning Aggregation Strategies with Limited Data

Figure 4 for Enhancing the Convergence of Federated Learning Aggregation Strategies with Limited Data

Abstract:The development of deep learning techniques is a leading field applied to cases in which medical data is used, particularly in cases of image diagnosis. This type of data has privacy and legal restrictions that in many cases prevent it from being processed from central servers. However, in this area collaboration between different research centers, in order to create models as robust as possible, trained with the largest quantity and diversity of data available, is a critical point to be taken into account. In this sense, the application of privacy aware distributed architectures, such as federated learning arises. When applying this type of architecture, the server aggregates the different local models trained with the data of each data owner to build a global model. This point is critical and therefore it is fundamental to analyze different ways of aggregation according to the use case, taking into account the distribution of the clients, the characteristics of the model, etc. In this paper we propose a novel aggregation strategy and we apply it to a use case of cerebral magnetic resonance image classification. In this use case the aggregation function proposed manages to improve the convergence obtained over the rounds of the federated learning process in relation to different aggregation strategies classically implemented and applied.

Via

Access Paper or Ask Questions

Personalized Federated Learning for improving radar based precipitation nowcasting on heterogeneous areas

Aug 11, 2024

Judith Sáinz-Pardo Díaz, María Castrillo, Juraj Bartok, Ignacio Heredia Cachá, Irina Malkin Ondík, Ivan Martynovskyi, Khadijeh Alibabaei, Lisana Berberi, Valentin Kozlov, Álvaro López García

Abstract:The increasing generation of data in different areas of life, such as the environment, highlights the need to explore new techniques for processing and exploiting data for useful purposes. In this context, artificial intelligence techniques, especially through deep learning models, are key tools to be used on the large amount of data that can be obtained, for example, from weather radars. In many cases, the information collected by these radars is not open, or belongs to different institutions, thus needing to deal with the distributed nature of this data. In this work, the applicability of a personalized federated learning architecture, which has been called adapFL, on distributed weather radar images is addressed. To this end, given a single available radar covering 400 km in diameter, the captured images are divided in such a way that they are disjointly distributed into four different federated clients. The results obtained with adapFL are analyzed in each zone, as well as in a central area covering part of the surface of each of the previously distributed areas. The ultimate goal of this work is to study the generalization capability of this type of learning technique for its extrapolation to use cases in which a representative number of radars is available, whose data can not be centralized due to technical, legal or administrative concerns. The results of this preliminary study indicate that the performance obtained in each zone with the adapFL approach allows improving the results of the federated learning approach, the individual deep learning models and the classical Continuity Tracking Radar Echoes by Correlation approach.

* Accepted for publication in Earth Science Informatics

Via

Access Paper or Ask Questions

Comparison of machine learning models applied on anonymized data with different techniques

May 12, 2023

Judith Sáinz-Pardo Díaz, Álvaro López García

Figure 1 for Comparison of machine learning models applied on anonymized data with different techniques

Figure 2 for Comparison of machine learning models applied on anonymized data with different techniques

Figure 3 for Comparison of machine learning models applied on anonymized data with different techniques

Figure 4 for Comparison of machine learning models applied on anonymized data with different techniques

Abstract:Anonymization techniques based on obfuscating the quasi-identifiers by means of value generalization hierarchies are widely used to achieve preset levels of privacy. To prevent different types of attacks against database privacy it is necessary to apply several anonymization techniques beyond the classical k-anonymity or $\ell$-diversity. However, the application of these methods is directly connected to a reduction of their utility in prediction and decision making tasks. In this work we study four classical machine learning methods currently used for classification purposes in order to analyze the results as a function of the anonymization techniques applied and the parameters selected for each of them. The performance of these models is studied when varying the value of k for k-anonymity and additional tools such as $\ell$-diversity, t-closeness and $\delta$-disclosure privacy are also deployed on the well-known adult dataset.

* Accepted for publication: IEEE International Conference in Cyber Security and Resilience 2023 (IEEE CSR)

Via

Access Paper or Ask Questions

Frouros: A Python library for drift detection in Machine Learning problems

Aug 14, 2022

Jaime Céspedes Sisniega, Álvaro López García

Figure 1 for Frouros: A Python library for drift detection in Machine Learning problems

Abstract:Frouros is a Python library capable of detecting drift in machine learning problems. It provides a combination of classical and more recent algorithms for drift detection: both supervised and unsupervised, as well as some capable of acting in a semi-supervised manner. We have designed it with the objective of being easily integrated with the scikit-learn library, implementing the same application programming interface. The library is developed following a set of best development and continuous integration practices to ensure ease of maintenance and extensibility. The source code is available at https://github.com/IFCA/frouros.

Via

Access Paper or Ask Questions

Study of the performance and scalability of federated learning for medical imaging with intermittent clients

Jul 19, 2022

Judith Sáinz-Pardo Díaz, Álvaro López García

Figure 1 for Study of the performance and scalability of federated learning for medical imaging with intermittent clients

Figure 2 for Study of the performance and scalability of federated learning for medical imaging with intermittent clients

Figure 3 for Study of the performance and scalability of federated learning for medical imaging with intermittent clients

Figure 4 for Study of the performance and scalability of federated learning for medical imaging with intermittent clients

Abstract:Federated learning is a data decentralization privacy-preserving technique used to perform machine or deep learning in a secure way. In this paper we present theoretical aspects about federated learning, such as the presentation of an aggregation operator, different types of federated learning, and issues to be taken into account in relation to the distribution of data from the clients, together with the exhaustive analysis of a use case where the number of clients varies. Specifically, a use case of medical image analysis is proposed, using chest X-ray images obtained from an open data repository. In addition to the advantages related to privacy, improvements in predictions (in terms of accuracy and area under the curve) and reduction of execution times will be studied with respect to the classical case (the centralized approach). Different clients will be simulated from the training data, selected in an unbalanced manner, i.e., they do not all have the same number of data. The results of considering three or ten clients are exposed and compared between them and against the centralized case. Two approaches to follow will be analyzed in the case of intermittent clients, as in a real scenario some clients may leave the training, and some new ones may enter the training. The evolution of the results for the test set in terms of accuracy, area under the curve and execution time is shown as the number of clients into which the original data is divided increases. Finally, improvements and future work in the field are proposed.

Via

Access Paper or Ask Questions

Forecasting COVID-19 spreading trough an ensemble of classical and machine learning models: Spain's case study

Jul 12, 2022

Ignacio Heredia Cacha, Judith Sainz-Pardo Díaz, María Castrillo Melguizo, Álvaro López García

Figure 1 for Forecasting COVID-19 spreading trough an ensemble of classical and machine learning models: Spain's case study

Figure 2 for Forecasting COVID-19 spreading trough an ensemble of classical and machine learning models: Spain's case study

Figure 3 for Forecasting COVID-19 spreading trough an ensemble of classical and machine learning models: Spain's case study

Figure 4 for Forecasting COVID-19 spreading trough an ensemble of classical and machine learning models: Spain's case study

Abstract:In this work we evaluate the applicability of an ensemble of population models and machine learning models to predict the near future evolution of the COVID-19 pandemic, with a particular use case in Spain. We rely solely in open and public datasets, fusing incidence, vaccination, human mobility and weather data to feed our machine learning models (Random Forest, Gradient Boosting, k-Nearest Neighbours and Kernel Ridge Regression). We use the incidence data to adjust classic population models (Gompertz, Logistic, Richards, Bertalanffy) in order to be able to better capture the trend of the data. We then ensemble these two families of models in order to obtain a more robust and accurate prediction. Furthermore, we have observed an improvement in the predictions obtained with machine learning models as we add new features (vaccines, mobility, climatic conditions), analyzing the importance of each of them using Shapley Additive Explanation values. As in any other modelling work, data and predictions quality have several limitations and therefore they must be seen from a critical standpoint, as we discuss in the text. Our work concludes that the ensemble use of these models improves the individual predictions (using only machine learning models or only population models) and can be applied, with caution, in cases when compartmental models cannot be utilized due to the lack of relevant data.

Via

Access Paper or Ask Questions

Estimation of high frequency nutrient concentrations from water quality surrogates using machine learning methods

Jan 27, 2020

María Castrillo, Álvaro López García

Figure 1 for Estimation of high frequency nutrient concentrations from water quality surrogates using machine learning methods

Figure 2 for Estimation of high frequency nutrient concentrations from water quality surrogates using machine learning methods

Figure 3 for Estimation of high frequency nutrient concentrations from water quality surrogates using machine learning methods

Figure 4 for Estimation of high frequency nutrient concentrations from water quality surrogates using machine learning methods

Abstract:Continuous high frequency water quality monitoring is becoming a critical task to support water management. Despite the advancements in sensor technologies, certain variables cannot be easily and/or economically monitored in-situ and in real time. In these cases, surrogate measures can be used to make estimations by means of data-driven models. In this work, variables that are commonly measured in-situ are used as surrogates to estimate the concentrations of nutrients in a rural catchment and in an urban one, making use of machine learning models, specifically Random Forests. The results are compared with those of linear modelling using the same number of surrogates, obtaining a reduction in the Root Mean Squared Error (RMSE) of up to 60.1%. The profit from including up to seven surrogate sensors was computed, concluding that adding more than 4 and 5 sensors in each of the catchments respectively was not worthy in terms of error improvement.

* Water Research. Volume 172, 1 April 2020, 115490

Via

Access Paper or Ask Questions