Abstract: Data within a specific context gains deeper significance beyond its isolated interpretation. In distributed systems, interdependent data sources reveal hidden relationships and latent structures, representing valuable information for many applications. This paper introduces Osmotic Learning (OSM-L), a self-supervised distributed learning paradigm designed to uncover higher-level latent knowledge from distributed data. The core of OSM-L is osmosis, a process that synthesizes dense and compact representations by extracting contextual information, eliminating the need for raw data exchange between distributed entities. OSM-L iteratively aligns local data representations, enabling information diffusion and convergence toward a dynamic equilibrium that captures contextual patterns. During training, it also identifies correlated data groups, functioning as a decentralized clustering mechanism. Experimental results confirm OSM-L's convergence and representation capabilities on structured datasets, achieving over 0.99 accuracy in local information alignment while preserving contextual integrity.
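The abstract does not spell out OSM-L's update rule, so the following toy sketch only illustrates the diffusion-to-equilibrium intuition: each node shares nothing but its current local representation and repeatedly moves it toward the mean of its peers. The topology, diffusion rate, and dimensions are hypothetical.

```python
import numpy as np

# Toy illustration of diffusion toward a dynamic equilibrium (not OSM-L's actual rule):
# every node keeps a local representation, exchanges only that representation with its
# peers, and nudges it toward the peers' mean; the raw local data never leaves the node.
rng = np.random.default_rng(0)
n_nodes, dim = 6, 8
reps = rng.normal(size=(n_nodes, dim))               # one local representation per node
adj = np.ones((n_nodes, n_nodes)) - np.eye(n_nodes)  # assumed fully connected topology
step = 0.5                                           # hypothetical diffusion rate

for _ in range(50):
    peer_mean = adj @ reps / adj.sum(axis=1, keepdims=True)
    reps += step * (peer_mean - reps)                # move toward the peers' mean

spread = np.linalg.norm(reps - reps.mean(axis=0), axis=1).max()
print(f"max distance from the shared equilibrium: {spread:.2e}")
```
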
Abstract: Traditional Machine Learning (ML) methods require large amounts of data to perform well, which limits their applicability in sparse or incomplete scenarios and forces the use of additional synthetic data to improve model training. To overcome this challenge, the research community is increasingly turning to Graph Machine Learning (GML), which offers a powerful alternative by exploiting the relationships within data. However, this approach also faces limitations, particularly when dealing with Knowledge Graphs (KGs), whose semantic nature can leave substantial information implicit. This study introduces Bi-View, a novel hybrid approach that increases the informative content of node features in KGs to generate enhanced Graph Embeddings (GEs), which are used to improve GML models without relying on additional synthetic data. The proposed work combines two complementary GE techniques: Node2Vec, which captures structural patterns through unsupervised random walks, and GraphSAGE, which aggregates neighbourhood information in a supervised way. Node2Vec embeddings are first computed to represent the graph topology, and node features are then enriched with centrality-based metrics, which are used as input for the GraphSAGE model. Moreover, a fusion layer combines the original Node2Vec embeddings with the GraphSAGE-influenced representations, resulting in a dual-perspective embedding space. Such a fusion captures both the topological and the semantic properties of the graph, enabling the model to exploit informative features that may exist in the dataset but are not explicitly represented. Our approach improves downstream task performance, especially in scenarios with poor initial features, laying the basis for more accurate and precise KG-enhanced GML models.
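As a rough illustration of the dual-view fusion described above (not Bi-View's actual implementation), the sketch below combines precomputed Node2Vec embeddings with a GraphSAGE branch fed by centrality-enriched features, using PyTorch Geometric; the concatenation-plus-linear fusion, all dimensions, and the random stand-in tensors are assumptions.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import SAGEConv

class DualViewFusion(nn.Module):
    """Sketch of a dual-perspective embedding: Node2Vec view + GraphSAGE view."""
    def __init__(self, n2v_dim, feat_dim, hidden_dim, out_dim):
        super().__init__()
        # GraphSAGE branch over node features enriched with centrality metrics
        self.sage1 = SAGEConv(feat_dim, hidden_dim)
        self.sage2 = SAGEConv(hidden_dim, hidden_dim)
        # Assumed fusion: concatenate the two views and project them
        self.fuse = nn.Linear(n2v_dim + hidden_dim, out_dim)

    def forward(self, x_enriched, z_node2vec, edge_index):
        h = torch.relu(self.sage1(x_enriched, edge_index))
        h = self.sage2(h, edge_index)
        return self.fuse(torch.cat([z_node2vec, h], dim=-1))

# Usage with random stand-ins for real data
num_nodes = 100
x_enriched = torch.randn(num_nodes, 16)             # original features + centrality metrics
z_node2vec = torch.randn(num_nodes, 64)             # precomputed Node2Vec embeddings
edge_index = torch.randint(0, num_nodes, (2, 400))  # toy edge list
model = DualViewFusion(n2v_dim=64, feat_dim=16, hidden_dim=32, out_dim=32)
z_dual = model(x_enriched, z_node2vec, edge_index)  # dual-perspective embeddings
```
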
Abstract: The rise of graph-structured data has driven major advances in Graph Machine Learning (GML), where graph embeddings (GEs) map features from Knowledge Graphs (KGs) into vector spaces, enabling tasks such as node classification and link prediction. However, since GEs are derived from explicit topology and features, they may miss crucial implicit knowledge hidden in seemingly sparse datasets, affecting both the graph structure and its representation. We propose a GML pipeline that integrates a Knowledge Completion (KC) phase to uncover latent dataset semantics before embedding generation. Focusing on transitive relations, we model hidden connections with decay-based inference functions, reshaping the graph topology and, in turn, the embedding dynamics and aggregation processes of GraphSAGE and Node2Vec. Experiments show that our GML pipeline significantly alters the geometry of the embedding space, demonstrating that the KC phase is not a simple enrichment but a transformative step that redefines graph representation quality.
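A minimal sketch of what such a KC step could look like for a single transitive relation, assuming an exponential decay in path length and a fixed inference threshold (the abstract does not give the paper's exact decay-based inference functions):

```python
import networkx as nx

def complete_transitive(g, decay=0.5, threshold=0.1):
    """Add inferred edges along multi-hop paths of a transitive relation,
    weighting them with an assumed exponential decay in path length."""
    completed = g.copy()
    for src, dists in dict(nx.all_pairs_shortest_path_length(g)).items():
        for dst, hops in dists.items():
            if src == dst or g.has_edge(src, dst):
                continue
            w = decay ** (hops - 1)          # strength fades with every extra hop
            if w >= threshold:
                completed.add_edge(src, dst, weight=w, inferred=True)
    return completed

# Toy chain a -> b -> c -> d: KC adds decayed a->c, a->d and b->d before embedding
g = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "d")])
print(sorted(complete_transitive(g).edges(data=True)))
```
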
Abstract: Graph Machine Learning (GML) with Graph Databases (GDBs) has gained significant relevance in recent years, due to its ability to handle complex interconnected data and to apply ML techniques through Graph Data Science (GDS). However, a critical gap exists in the way current GDB-GML applications analyze data, especially in terms of Knowledge Completion (KC) in Knowledge Graphs (KGs). In particular, current architectures ignore KC and work on datasets that appear incomplete or fragmented, even though they actually contain valuable hidden knowledge. This limitation can lead to incorrect interpretations when these data are used as input for GML models. This paper proposes an innovative architecture that integrates a KC phase into GDB-GML applications, demonstrating how revealing hidden knowledge can heavily impact a dataset's behavior and metrics. For this purpose, we introduce scalable transitive relationships: links that propagate information over the network and are modelled by a decay function, allowing deterministic knowledge flows across multiple nodes. Experimental results demonstrate that our approach radically reshapes both the topology and the overall dataset dynamics, underscoring the need for this new GDB-GML architecture to produce better models and unlock the full potential of graph-based data analysis.
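To make the idea of a decayed, deterministic knowledge flow concrete, the sketch below propagates a value from a source node across multiple hops, attenuating it at every hop; the decay factor and the toy graph are illustrative assumptions, not the paper's actual model.

```python
import networkx as nx

def propagate(g, source, value, decay=0.6):
    """Deterministically propagate a value from `source`, attenuating it at each hop."""
    flow = {source: value}
    for node, hops in dict(nx.single_source_shortest_path_length(g, source)).items():
        if node != source:
            flow[node] = value * decay ** hops   # decayed knowledge flow
    return flow

# Hypothetical three-hop network: the hub receives 0.6, each leaf receives 0.36
toy_graph = nx.DiGraph([("origin", "hub"), ("hub", "node_1"), ("hub", "node_2")])
print(propagate(toy_graph, "origin", value=1.0))
```
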
Abstract: AI-generated synthetic media, also called Deepfakes, have significantly influenced numerous domains, from entertainment to cybersecurity. Generative Adversarial Networks (GANs) and Diffusion Models (DMs) are the main frameworks used to create Deepfakes, producing highly realistic yet fabricated content. While these technologies open up new creative possibilities, they also bring substantial ethical and security risks due to their potential misuse. The rise of such advanced media has led to the development of a cognitive bias known as Impostor Bias, where individuals doubt the authenticity of multimedia because of their awareness of AI's capabilities. As a result, Deepfake detection has become a vital area of research, focusing on identifying subtle inconsistencies and artifacts with machine learning techniques, especially Convolutional Neural Networks (CNNs). Research in forensic Deepfake technology encompasses five main areas: detection, attribution and recognition, passive authentication, detection in realistic scenarios, and active authentication. Each area tackles specific challenges, from tracing the origins of synthetic media to examining its inherent characteristics for authenticity. This paper reviews the primary algorithms that address these challenges, examining their advantages, limitations, and future prospects.