Nachoem Wijnberg

A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding

Apr 21, 2026

Shuai Wang, Hongyi Zhu, Jia-Hong Huang, Yixian Shen, Chengxi Zeng, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring

Abstract:Understanding artworks requires multi-step reasoning over visual content and cultural, historical, and stylistic context. While recent multimodal large language models show promise in artwork explanation, they rely on implicit reasoning and internalized knowledge, limiting interpretability and explicit evidence grounding. We propose A-MAR, an Agent-based Multimodal Art Retrieval framework that explicitly conditions retrieval on structured reasoning plans. Given an artwork and a user query, A-MAR first decomposes the task into a structured reasoning plan that specifies the goals and evidence requirements for each step. Retrieval is then conditioned on this plan, enabling targeted evidence selection and supporting step-wise, grounded explanations. To evaluate agent-based multimodal reasoning within the art domain, we introduce ArtCoT-QA. This diagnostic benchmark features multi-step reasoning chains for diverse art-related queries, enabling a granular analysis that extends beyond simple final answer accuracy. Experiments on SemArt and Artpedia show that A-MAR consistently outperforms static, non-planned retrieval and strong MLLM baselines in final explanation quality, while evaluations on ArtCoT-QA further demonstrate its advantages in evidence grounding and multi-step reasoning ability. These results highlight the importance of reasoning-conditioned retrieval for knowledge-intensive multimodal understanding and position A-MAR as a step toward interpretable, goal-driven AI systems, with particular relevance to cultural industries. The code and data are available at: https://github.com/ShuaiWang97/A-MAR.

* ICMR 2026, ACM International Conference on Multimedia Retrieval

VL-KGE: Vision-Language Models Meet Knowledge Graph Embeddings

Mar 02, 2026

Athanasios Efthymiou, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring

Abstract:Real-world multimodal knowledge graphs (MKGs) are inherently heterogeneous, modeling entities that are associated with diverse modalities. Traditional knowledge graph embedding (KGE) methods excel at learning continuous representations of entities and relations, yet they are typically designed for unimodal settings. Recent approaches extend KGE to multimodal settings but remain constrained, often processing modalities in isolation, resulting in weak cross-modal alignment, and relying on simplistic assumptions such as uniform modality availability across entities. Vision-Language Models (VLMs) offer a powerful way to align diverse modalities within a shared embedding space. We propose Vision-Language Knowledge Graph Embeddings (VL-KGE), a framework that integrates cross-modal alignment from VLMs with structured relational modeling to learn unified multimodal representations of knowledge graphs. Experiments on WN9-IMG and two novel fine art MKGs, WikiArt-MKG-v1 and WikiArt-MKG-v2, demonstrate that VL-KGE consistently improves over traditional unimodal and multimodal KGE methods in link prediction tasks. Our results highlight the value of VLMs for multimodal KGE, enabling more robust and structured reasoning over large-scale heterogeneous knowledge graphs.

* In Proceedings of the ACM Web Conference 2026 (WWW '26)
* This arXiv version includes extended supplementary material

ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding

May 09, 2025

Shuai Wang, Ivona Najdenkoska, Hongyi Zhu, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring

Abstract:Understanding visual art requires reasoning across multiple perspectives -- cultural, historical, and stylistic -- beyond mere object recognition. While recent multimodal large language models (MLLMs) perform well on general image captioning, they often fail to capture the nuanced interpretations that fine art demands. We propose ArtRAG, a novel, training-free framework that combines structured knowledge with retrieval-augmented generation (RAG) for multi-perspective artwork explanation. ArtRAG automatically constructs an Art Context Knowledge Graph (ACKG) from domain-specific textual sources, organizing entities such as artists, movements, themes, and historical events into a rich, interpretable graph. At inference time, a multi-granular structured retriever selects semantically and topologically relevant subgraphs to guide generation. This enables MLLMs to produce contextually grounded, culturally informed art descriptions. Experiments on the SemArt and Artpedia datasets show that ArtRAG outperforms several heavily trained baselines. Human evaluations further confirm that ArtRAG generates coherent, insightful, and culturally enriched interpretations.

Flexible categorization using formal concept analysis and Dempster-Shafer theory

Aug 23, 2024

Marcel Boersma, Krishna Manoorkar, Alessandra Palmigiano, Mattia Panettiere, Apostolos Tzimoulis, Nachoem Wijnberg

Abstract:Categorization of business processes is an important part of auditing. Large amounts of transactional data in auditing can be represented as transactions between financial accounts using weighted bipartite graphs. We view such bipartite graphs as many-valued formal contexts, which we use to obtain explainable categorization of these business processes in terms of financial accounts involved in a business process by using methods in formal concept analysis. We use Dempster-Shafer mass functions to represent agendas showing different interest in different sets of financial accounts. We also model some possible deliberation scenarios between agents with different interrogative agendas to reach an aggregated agenda and categorization. The framework developed in this paper provides a formal ground to obtain and study explainable categorizations from the data represented as bipartite graphs according to the agendas of different agents in an organization (e.g. an audit firm), and interaction between these through deliberation. We use this framework to describe a machine-learning meta-algorithm for outlier detection and classification which can provide local and global explanations of its result and demonstrate it through an outlier detection algorithm.

* arXiv admin note: substantial text overlap with arXiv:2210.17330
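
The abstract above represents agendas as Dempster-Shafer mass functions over sets of financial accounts and aggregates them through deliberation. As a rough illustration only, the sketch below combines two such mass functions with the standard Dempster rule of combination. The account names, the masses, and the choice of this particular rule are assumptions made for the example; the paper's own deliberation models may differ.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of accounts to a mass; masses sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence supports the intersection of the two sets.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint sets contribute to the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("mass functions are in total conflict")
    # Renormalise so the remaining masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# Hypothetical agendas of two agents over two accounts.
cash = frozenset({"cash"})
revenue = frozenset({"revenue"})
both = cash | revenue
agent_a = {cash: 0.6, both: 0.4}
agent_b = {revenue: 0.5, both: 0.5}

aggregated = combine(agent_a, agent_b)
for accounts, mass in aggregated.items():
    print(sorted(accounts), round(mass, 3))
```

In this toy case the two agents disagree on 0.3 of the joint mass, which is discarded before renormalisation.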

Set2Seq Transformer: Learning Permutation Aware Set Representations of Artistic Sequences

Aug 06, 2024

Athanasios Efthymiou, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring

Abstract:We propose Set2Seq Transformer, a novel sequential multiple instance architecture that learns to rank permutation-aware set representations of sequences. First, we illustrate that learning temporal position-aware representations of discrete timesteps can greatly improve static visual multiple instance learning methods that do not regard temporality and concentrate almost exclusively on visual content analysis. We further demonstrate the significant advantages of end-to-end sequential multiple instance learning, integrating visual content and temporal information in a multimodal manner. As an application, we focus on tasks related to fine art analysis. To that end, we show that our Set2Seq Transformer can leverage visual set and temporal position-aware representations for modelling visual artists' oeuvres for predicting artistic success. Finally, through extensive quantitative and qualitative evaluation using a novel dataset, WikiArt-Seq2Rank, and a visual learning-to-rank downstream task, we show that our Set2Seq Transformer captures essential temporal information, improving the performance of strong static and sequential multiple instance learning methods for predicting artistic success.

Ada-HGNN: Adaptive Sampling for Scalable Hypergraph Neural Networks

May 22, 2024

Shuai Wang, David W. Zhang, Jia-Hong Huang, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring

Abstract:Hypergraphs serve as an effective model for depicting complex connections in various real-world scenarios, from social to biological networks. The development of Hypergraph Neural Networks (HGNNs) has emerged as a valuable method to manage the intricate associations in data, though scalability is a notable challenge due to memory limitations. In this study, we introduce a new adaptive sampling strategy specifically designed for hypergraphs, which tackles their unique complexities in an efficient manner. We also present a Random Hyperedge Augmentation (RHA) technique and an additional Multilayer Perceptron (MLP) module to improve the robustness and generalization capabilities of our approach. Thorough experiments with real-world datasets have proven the effectiveness of our method, markedly reducing computational and memory demands while maintaining performance levels akin to conventional HGNNs and other baseline models. This research paves the way for improving both the scalability and efficacy of HGNNs in extensive applications. We will also make our codebase publicly accessible.

Outlier detection using flexible categorisation and interrogative agendas

Dec 20, 2023

Marcel Boersma, Krishna Manoorkar, Alessandra Palmigiano, Mattia Panettiere, Apostolos Tzimoulis, Nachoem Wijnberg

Abstract:Categorization is one of the basic tasks in machine learning and data analysis. Building on formal concept analysis (FCA), the starting point of the present work is that different ways to categorize a given set of objects exist, which depend on the choice of the sets of features used to classify them, and different such sets of features may yield better or worse categorizations, relative to the task at hand. In their turn, the (a priori) choice of a particular set of features over another might be subjective and express a certain epistemic stance (e.g. interests, relevance, preferences) of an agent or a group of agents, namely, their interrogative agenda. In the present paper, we represent interrogative agendas as sets of features, and explore and compare different ways to categorize objects w.r.t. different sets of features (agendas). We first develop a simple unsupervised FCA-based algorithm for outlier detection which uses categorizations arising from different agendas. We then present a supervised meta-learning algorithm to learn suitable (fuzzy) agendas for categorization as sets of features with different weights or masses. We combine this meta-learning algorithm with the unsupervised outlier detection algorithm to obtain a supervised outlier detection algorithm. We show that these algorithms perform on par with commonly used outlier detection algorithms on commonly used outlier detection datasets. These algorithms provide both local and global explanations of their results.
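
To make the idea of categorizing objects with respect to an agenda (a set of features) more concrete, here is a minimal sketch using the standard FCA derivation operators on a tiny binary context. The data, the function names, and the heuristic of treating objects in very small categories as outlier candidates are assumptions for illustration; this is not the algorithm from the paper.

```python
def extent(context, features):
    """Objects that have every feature in `features`."""
    return {obj for obj, attrs in context.items() if features <= attrs}

def intent(context, objects, agenda):
    """Features from `agenda` shared by all `objects`."""
    shared = set(agenda)
    for obj in objects:
        shared &= context[obj]
    return shared

def object_concept(context, obj, agenda):
    """Smallest formal concept containing `obj` when only `agenda` features count."""
    b = intent(context, {obj}, agenda)
    a = extent(context, b)
    return a, b

# Hypothetical context: transactions (objects) and the accounts they touch (features).
context = {
    "t1": {"cash", "revenue"},
    "t2": {"cash", "revenue"},
    "t3": {"cash", "revenue", "tax"},
    "t4": {"inventory", "tax"},
}
agenda = {"cash", "revenue", "inventory"}

# Size of each object's category under this agenda; small sizes flag candidates.
sizes = {obj: len(object_concept(context, obj, agenda)[0]) for obj in context}
print(sizes)
```

Under this agenda, `t4` sits alone in its category while the other three transactions share one. Changing the agenda changes the categories, which is the dependence the abstract describes.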

Prototype-Enhanced Hypergraph Learning for Heterogeneous Information Networks

Sep 22, 2023

Shuai Wang, Jiayi Shen, Athanasios Efthymiou, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring

Abstract:The variety and complexity of relations in multimedia data lead to Heterogeneous Information Networks (HINs). Capturing the semantics from such networks requires approaches capable of utilizing the full richness of the HINs. Existing methods for modeling HINs employ techniques originally designed for graph neural networks, together with HIN decomposition analysis such as manually predefined metapaths. In this paper, we introduce a novel prototype-enhanced hypergraph learning approach for node classification in HINs. Using hypergraphs instead of graphs, our method captures higher-order relationships among nodes and extracts semantic information without relying on metapaths. Our method leverages the power of prototypes to improve the robustness of the hypergraph learning process and creates the potential to provide human-interpretable insights into the underlying network structure. Extensive experiments on three real-world HINs demonstrate the effectiveness of our method.

Flexible categorization for auditing using formal concept analysis and Dempster-Shafer theory

Oct 31, 2022

Marcel Boersma, Krishna Manoorkar, Alessandra Palmigiano, Mattia Panettiere, Apostolos Tzimoulis, Nachoem Wijnberg

Abstract:Categorization of business processes is an important part of auditing. Large amounts of transactional data in auditing can be represented as transactions between financial accounts using weighted bipartite graphs. We view such bipartite graphs as many-valued formal contexts, which we use to obtain explainable categorization of these business processes in terms of financial accounts involved in a business process by using methods in formal concept analysis. The specific explainability feature of the methodology introduced in the present paper provides several advantages over e.g. non-explainable machine learning techniques, and in fact, it can be taken as a basis for the development of algorithms which perform the task of clustering on transparent and accountable principles. Here, we focus on obtaining and studying different ways to categorize according to different extents of interest in different financial accounts, or interrogative agendas, of various agents or sub-tasks in audit. We use Dempster-Shafer mass functions to represent agendas showing different interest in different sets of financial accounts. We propose two new methods to obtain categorizations from these agendas. We also model some possible deliberation scenarios between agents with different interrogative agendas to reach an aggregated agenda and categorization. The framework developed in this paper provides a formal ground to obtain and study explainable categorizations from the data represented as bipartite graphs according to the agendas of different agents in an organization (e.g. an audit firm), and interaction between these through deliberation.

Graph Neural Networks for Knowledge Enhanced Visual Representation of Paintings

May 17, 2021

Athanasios Efthymiou, Stevan Rudinac, Monika Kackovic, Marcel Worring, Nachoem Wijnberg

Abstract:We propose ArtSAGENet, a novel multimodal architecture that integrates Graph Neural Networks (GNNs) and Convolutional Neural Networks (CNNs), to jointly learn visual and semantic-based artistic representations. First, we illustrate the significant advantages of multi-task learning for fine art analysis and argue that it is conceptually a much more appropriate setting in the fine art domain than the single-task alternatives. We further demonstrate that several GNN architectures can outperform strong CNN baselines in a range of fine art analysis tasks, such as style classification, artist attribution, creation period estimation, and tag prediction, while training them requires an order of magnitude less computational time and only a small amount of labeled data. Finally, through extensive experimentation we show that our proposed ArtSAGENet captures and encodes valuable relational dependencies between the artists and the artworks, surpassing the performance of traditional methods that rely solely on the analysis of visual content. Our findings underline a great potential of integrating visual content and semantics for fine art analysis and curation.
