Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Gabriele Padovani

yProv4DV: Reproducible Data Visualization Scripts Out of the Box

Mar 20, 2026

Gabriele Padovani, Sandro Fiore

Abstract:While results visualization is a critical phase to the communication of new academic results, plots are frequently shared without the complete combination of code, input data, execution context and outputs required to independently reproduce the resulting figures. Existing reproducibility solutions tend to focus on computational pipelines or workflow management systems, not covering script-based visualization practices commonly used by researchers and practitioners. Additionally, the minimalist nature of current Python data visualization libraries tend to speed up the creation of images, disincentivizing users from spending time integrating additional tools into these short scripts. This paper proposes yProv4DV, a library lightweight designed to enable reproducible data visualization scripts through the use of provenance information, minimizing the necessity for code modifications. Through a single call, users can track inputs, outputs and source code files, enabling saving and full reproducibility of their data visualization software. As a result, this library fills a gap in reproducible research workflows by addressing the reproducibility of plots in scientific publications.

* SoftwareX, 17 pages, 4 figures

Via

Access Paper or Ask Questions

yProv4ML: Effortless Provenance Tracking for Machine Learning Systems

Jul 01, 2025

Gabriele Padovani, Valentine Anantharaj, Sandro Fiore

Abstract:The rapid growth of interest in large language models (LLMs) reflects their potential for flexibility and generalization, and attracted the attention of a diverse range of researchers. However, the advent of these techniques has also brought to light the lack of transparency and rigor with which development is pursued. In particular, the inability to determine the number of epochs and other hyperparameters in advance presents challenges in identifying the best model. To address this challenge, machine learning frameworks such as MLFlow can automate the collection of this type of information. However, these tools capture data using proprietary formats and pose little attention to lineage. This paper proposes yProv4ML, a framework to capture provenance information generated during machine learning processes in PROV-JSON format, with minimal code modifications.

Via

Access Paper or Ask Questions