Abstract:Privacy and regulatory barriers often hinder centralized machine learning solutions, particularly in sectors like healthcare where data cannot be freely shared. Federated learning has emerged as a powerful paradigm to address these concerns; however, existing frameworks primarily support gradient-based models, leaving a gap for more interpretable, tree-based approaches. This paper introduces a federated learning framework for Random Forest classifiers that preserves data privacy and provides robust performance in distributed settings. By leveraging PySyft for secure, privacy-aware computation, our method enables multiple institutions to collaboratively train Random Forest models on locally stored data without exposing sensitive information. The framework supports weighted model averaging to account for varying data distributions, incremental learning to progressively refine models, and local evaluation to assess performance across heterogeneous datasets. Experiments on two real-world healthcare benchmarks demonstrate that the federated approach maintains competitive predictive accuracy - within a maximum 9\% margin of centralized methods - while satisfying stringent privacy requirements. These findings underscore the viability of tree-based federated learning for scenarios where data cannot be centralized due to regulatory, competitive, or technical constraints. The proposed solution addresses a notable gap in existing federated learning libraries, offering an adaptable tool for secure distributed machine learning tasks that demand both transparency and reliable performance. The tool is available at https://github.com/ieeta-pt/fed_rf.
Abstract:The artistic community is increasingly relying on automatic computational analysis for authentication and classification of artistic paintings. In this paper, we identify hidden patterns and relationships present in artistic paintings by analysing their complexity, a measure that quantifies the sum of characteristics of an object. Specifically, we apply Normalized Compression (NC) and the Block Decomposition Method (BDM) to a dataset of 4,266 paintings from 91 authors and examine the potential of these information-based measures as descriptors of artistic paintings. Both measures consistently described the equivalent types of paintings, authors, and artistic movements. Moreover, combining the NC with a measure of the roughness of the paintings creates an efficient stylistic descriptor. Furthermore, by quantifying the local information of each painting, we define a fingerprint that describes critical information regarding the artists' style, their artistic influences, and shared techniques. More fundamentally, this information describes how each author typically composes and distributes the elements across the canvas and, therefore, how their work is perceived. Finally, we demonstrate that regional complexity and two-point height difference correlation function are useful auxiliary features that improve current methodologies in style and author classification of artistic paintings. The whole study is supported by an extensive website (http://panther.web.ua.pt) for fast author characterization and authentication.