Abstract: While graph-derived signals are widely used in tabular learning, existing studies typically rely on limited experimental setups and average performance comparisons, leaving the statistical reliability and robustness of observed gains largely unexplored. Consequently, it remains unclear which signals provide consistent and robust improvements. This paper presents a taxonomy-driven empirical analysis of graph-derived signals for tabular machine learning. We propose a unified and reproducible evaluation protocol to systematically assess which categories of graph-derived signals yield statistically significant and robust performance improvements. The protocol provides an extensible setup for the controlled integration of diverse graph-derived signals into tabular learning pipelines. To ensure a fair and rigorous comparison, it incorporates automated hyperparameter optimization, multi-seed statistical evaluation, formal significance testing, and robustness analysis under graph perturbations. We demonstrate the protocol through an extensive case study on a large-scale, imbalanced cryptocurrency fraud detection dataset. The analysis identifies signal categories providing consistently reliable performance gains and offers interpretable insights into which graph-derived signals indicate fraud-discriminative structural patterns. Furthermore, robustness analyses reveal pronounced differences in how various signals handle missing or corrupted relational data. These findings demonstrate practical utility for fraud detection and illustrate how the proposed taxonomy-driven evaluation protocol can be applied in other application domains.
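To illustrate the multi-seed statistical evaluation and formal significance testing named in the abstract above, here is a minimal Python sketch that compares per-seed scores of a baseline pipeline against the same pipeline with one graph-derived signal added. The function name `paired_sign_flip_test`, the placeholder scores, and the choice of an exact sign-flip permutation test are assumptions made for illustration; the abstract does not state which test the protocol actually uses.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_test(baseline, augmented):
    """Exact two-sided sign-flip permutation test on paired per-seed scores.

    Hypothetical helper: `baseline[i]` and `augmented[i]` are assumed to be
    scores from runs that share seed i. Returns (mean difference, p-value).
    """
    if len(baseline) != len(augmented):
        raise ValueError("score lists must be paired, one entry per seed")
    if not baseline:
        raise ValueError("at least one seed is required")

    diffs = [a - b for a, b in zip(augmented, baseline)]
    observed = abs(mean(diffs))

    # Under the null hypothesis the sign of each paired difference is
    # arbitrary, so enumerate all 2^n sign assignments (fine for small n).
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(mean([s * d for s, d in zip(signs, diffs)]))
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            extreme += 1
    return mean(diffs), extreme / total


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not results from the paper.
    baseline_scores = [0.80, 0.81, 0.79, 0.82, 0.80]
    augmented_scores = [0.83, 0.84, 0.82, 0.86, 0.83]
    gain, p_value = paired_sign_flip_test(baseline_scores, augmented_scores)
    print(f"mean gain = {gain:.3f}, p = {p_value:.4f}")
```

With only five seeds, the smallest two-sided p-value this test can produce is 2/32, which is one reason a protocol of this kind would need enough seeds before any significance threshold becomes reachable.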
Abstract: Large language models (LLMs) show promise for supporting clinical decision-making in complex fields such as rheumatology. Our evaluation shows that smaller language models (SLMs), combined with retrieval-augmented generation (RAG), achieve higher diagnostic and therapeutic performance than larger models, while requiring substantially less energy and enabling cost-efficient, local deployment. These features are attractive for resource-limited healthcare. However, expert oversight remains essential, as no model consistently reached specialist-level accuracy in rheumatology.
Abstract: Biased news reporting poses a significant threat to informed decision-making and the functioning of democracies. This study introduces a novel methodology for scalable, minimally biased analysis of media bias in political news. The proposed approach examines event selection, labeling, word choice, and commission and omission biases across news sources by leveraging natural language processing techniques, including hierarchical topic modeling, sentiment analysis, and ontology learning with large language models. Through three case studies related to current political events, we demonstrate the methodology's effectiveness in identifying biases across news sources at various levels of granularity. This work represents a significant step towards scalable, minimally biased media bias analysis, laying the groundwork for tools to help news consumers navigate an increasingly complex media landscape.
Abstract: Node embedding refers to techniques that generate low-dimensional vector representations of nodes in a graph while preserving specific properties of the nodes. A key challenge in the field is developing scalable methods that preserve the types of structural patterns required by a given downstream task. While most existing methods focus on preserving node proximity, those that do preserve structural properties often lack the flexibility to capture the various types of structural patterns that different downstream tasks require. This paper introduces ffstruc2vec, a scalable deep-learning framework for learning node embedding vectors that preserve structural identities. Its flat, efficient architecture allows high flexibility in capturing diverse types of structural patterns, enabling broad adaptability to various downstream application tasks. The proposed framework significantly outperforms existing approaches across diverse unsupervised and supervised tasks in practical applications. Moreover, ffstruc2vec enables explainability by quantifying how individual structural patterns influence task outcomes, providing actionable interpretation. To our knowledge, no existing framework combines this level of flexibility, scalability, and structural interpretability, underscoring its unique capabilities.
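The distinction the abstract above draws between node proximity and structural identity can be made concrete with a small Python sketch. It is not the ffstruc2vec method: the helper `structural_signature`, the degree-based signature, and the toy graph are all assumptions chosen for illustration. The sketch only shows that two nodes far apart in a graph can play the same structural role, which is the kind of property a structure-preserving embedding is meant to retain.

```python
def structural_signature(adj, node, hops=2):
    """Hypothetical structural descriptor of `node`.

    For each hop distance 0..hops, record the sorted degrees of the nodes
    first reached at that distance. Node labels and positions are ignored,
    so distant nodes with the same local structure get equal signatures.
    """
    seen = {node}
    frontier = [node]
    signature = [(len(adj[node]),)]
    for _ in range(hops):
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        signature.append(tuple(sorted(len(adj[v]) for v in next_frontier)))
        frontier = next_frontier
    return tuple(signature)


# Toy graph (an assumption for illustration): two star hubs "a" and "b",
# each with three leaves, joined through a bridge node "m".
ADJ = {
    "a": ["a1", "a2", "a3", "m"],
    "b": ["b1", "b2", "b3", "m"],
    "m": ["a", "b"],
    "a1": ["a"], "a2": ["a"], "a3": ["a"],
    "b1": ["b"], "b2": ["b"], "b3": ["b"],
}

if __name__ == "__main__":
    # The hubs are two hops apart yet structurally identical, while the
    # bridge node is adjacent to both hubs but plays a different role.
    print(structural_signature(ADJ, "a") == structural_signature(ADJ, "b"))
    print(structural_signature(ADJ, "a") == structural_signature(ADJ, "m"))
```

A proximity-preserving embedding would place `m` close to both hubs and the hubs less close to each other. A structure-preserving one would instead group `a` with `b` and the leaves with each other.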