Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Athanasios Efthymiou

VL-KGE: Vision-Language Models Meet Knowledge Graph Embeddings

Mar 02, 2026

Athanasios Efthymiou, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring

Abstract:Real-world multimodal knowledge graphs (MKGs) are inherently heterogeneous, modeling entities that are associated with diverse modalities. Traditional knowledge graph embedding (KGE) methods excel at learning continuous representations of entities and relations, yet they are typically designed for unimodal settings. Recent approaches extend KGE to multimodal settings but remain constrained, often processing modalities in isolation, resulting in weak cross-modal alignment, and relying on simplistic assumptions such as uniform modality availability across entities. Vision-Language Models (VLMs) offer a powerful way to align diverse modalities within a shared embedding space. We propose Vision-Language Knowledge Graph Embeddings (VL-KGE), a framework that integrates cross-modal alignment from VLMs with structured relational modeling to learn unified multimodal representations of knowledge graphs. Experiments on WN9-IMG and two novel fine art MKGs, WikiArt-MKG-v1 and WikiArt-MKG-v2, demonstrate that VL-KGE consistently improves over traditional unimodal and multimodal KGE methods in link prediction tasks. Our results highlight the value of VLMs for multimodal KGE, enabling more robust and structured reasoning over large-scale heterogeneous knowledge graphs.

* In Proceedings of the ACM Web Conference 2026 (WWW '26)
* Published in Proceedings of the ACM Web Conference 2026 (WWW '26). This arXiv version includes extended supplementary material

Via

Access Paper or Ask Questions

Set2Seq Transformer: Learning Permutation Aware Set Representations of Artistic Sequences

Aug 06, 2024

Athanasios Efthymiou, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring

Figure 1 for Set2Seq Transformer: Learning Permutation Aware Set Representations of Artistic Sequences

Figure 2 for Set2Seq Transformer: Learning Permutation Aware Set Representations of Artistic Sequences

Figure 3 for Set2Seq Transformer: Learning Permutation Aware Set Representations of Artistic Sequences

Figure 4 for Set2Seq Transformer: Learning Permutation Aware Set Representations of Artistic Sequences

Abstract:We propose Set2Seq Transformer, a novel sequential multiple instance architecture, that learns to rank permutation aware set representations of sequences. First, we illustrate that learning temporal position-aware representations of discrete timesteps can greatly improve static visual multiple instance learning methods that do not regard temporality and concentrate almost exclusively on visual content analysis. We further demonstrate the significant advantages of end-to-end sequential multiple instance learning, integrating visual content and temporal information in a multimodal manner. As application we focus on fine art analysis related tasks. To that end, we show that our Set2Seq Transformer can leverage visual set and temporal position-aware representations for modelling visual artists' oeuvres for predicting artistic success. Finally, through extensive quantitative and qualitative evaluation using a novel dataset, WikiArt-Seq2Rank, and a visual learning-to-rank downstream task, we show that our Set2Seq Transformer captures essential temporal information improving the performance of strong static and sequential multiple instance learning methods for predicting artistic success.

Via

Access Paper or Ask Questions

Prototype-Enhanced Hypergraph Learning for Heterogeneous Information Networks

Sep 22, 2023

Shuai Wang, Jiayi Shen, Athanasios Efthymiou, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring

Abstract:The variety and complexity of relations in multimedia data lead to Heterogeneous Information Networks (HINs). Capturing the semantics from such networks requires approaches capable of utilizing the full richness of the HINs. Existing methods for modeling HINs employ techniques originally designed for graph neural networks, and HINs decomposition analysis, like using manually predefined metapaths. In this paper, we introduce a novel prototype-enhanced hypergraph learning approach for node classification in HINs. Using hypergraphs instead of graphs, our method captures higher-order relationships among nodes and extracts semantic information without relying on metapaths. Our method leverages the power of prototypes to improve the robustness of the hypergraph learning process and creates the potential to provide human-interpretable insights into the underlying network structure. Extensive experiments on three real-world HINs demonstrate the effectiveness of our method.

Via

Access Paper or Ask Questions

Graph Neural Networks for Knowledge Enhanced Visual Representation of Paintings

May 17, 2021

Athanasios Efthymiou, Stevan Rudinac, Monika Kackovic, Marcel Worring, Nachoem Wijnberg

Figure 1 for Graph Neural Networks for Knowledge Enhanced Visual Representation of Paintings

Figure 2 for Graph Neural Networks for Knowledge Enhanced Visual Representation of Paintings

Figure 3 for Graph Neural Networks for Knowledge Enhanced Visual Representation of Paintings

Figure 4 for Graph Neural Networks for Knowledge Enhanced Visual Representation of Paintings

Abstract:We propose ArtSAGENet, a novel multimodal architecture that integrates Graph Neural Networks (GNNs) and Convolutional Neural Networks (CNNs), to jointly learn visual and semantic-based artistic representations. First, we illustrate the significant advantages of multi-task learning for fine art analysis and argue that it is conceptually a much more appropriate setting in the fine art domain than the single-task alternatives. We further demonstrate that several GNN architectures can outperform strong CNN baselines in a range of fine art analysis tasks, such as style classification, artist attribution, creation period estimation, and tag prediction, while training them requires an order of magnitude less computational time and only a small amount of labeled data. Finally, through extensive experimentation we show that our proposed ArtSAGENet captures and encodes valuable relational dependencies between the artists and the artworks, surpassing the performance of traditional methods that rely solely on the analysis of visual content. Our findings underline a great potential of integrating visual content and semantics for fine art analysis and curation.

Via

Access Paper or Ask Questions