Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sandro Fiore

yProv4ML: Effortless Provenance Tracking for Machine Learning Systems

Jul 01, 2025

Gabriele Padovani, Valentine Anantharaj, Sandro Fiore

Abstract:The rapid growth of interest in large language models (LLMs) reflects their potential for flexibility and generalization, and attracted the attention of a diverse range of researchers. However, the advent of these techniques has also brought to light the lack of transparency and rigor with which development is pursued. In particular, the inability to determine the number of epochs and other hyperparameters in advance presents challenges in identifying the best model. To address this challenge, machine learning frameworks such as MLFlow can automate the collection of this type of information. However, these tools capture data using proprietary formats and pose little attention to lineage. This paper proposes yProv4ML, a framework to capture provenance information generated during machine learning processes in PROV-JSON format, with minimal code modifications.

Via

Access Paper or Ask Questions

Improving Oil Slick Trajectory Simulations with Bayesian Optimization

Mar 04, 2025

Gabriele Accarino, Marco M. De Carlo, Igor Atake, Donatello Elia, Anusha L. Dissanayake, Antonio Augusto Sepp Neves, Juan Peña Ibañez, Italo Epicoco, Paola Nassisi, Sandro Fiore(+1 more)

Abstract:Accurate simulations of oil spill trajectories are essential for supporting practitioners' response and mitigating environmental and socioeconomic impacts. Numerical models, such as MEDSLIK-II, simulate advection, dispersion, and transformation processes of oil particles. However, simulations heavily rely on accurate parameter tuning, still based on expert knowledge and manual calibration. To overcome these limitations, we integrate the MEDSLIK-II numerical oil spill model with a Bayesian optimization framework to iteratively estimate the best physical parameter configuration that yields simulation closer to satellite observations of the slick. We focus on key parameters, such as horizontal diffusivity and drift factor, maximizing the Fraction Skill Score (FSS) as a measure of spatio-temporal overlap between simulated and observed oil distributions. We validate the framework for the Baniyas oil incident that occurred in Syria between August 23 and September 4, 2021, which released over 12,000 $m^3$ of oil. We show that, on average, the proposed approach systematically improves the FSS from 5.82% to 11.07% compared to control simulations initialized with default parameters. The optimization results in consistent improvement across multiple time steps, particularly during periods of increased drift variability, demonstrating the robustness of our method in dynamic environmental conditions.

* 29 pages, 10 figures, 3 tables, research paper

Via

Access Paper or Ask Questions