Katsuhiko Hayashi

Artwork Explanation in Large-scale Vision Language Models

Feb 29, 2024
Kazuki Hayashi, Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe

Large-scale vision-language models (LVLMs) output text from images and instructions, demonstrating advanced capabilities in text generation and comprehension. However, it has not been clarified to what extent LVLMs understand the knowledge necessary for explaining images, the complex relationships between various pieces of knowledge, and how they integrate these understandings into their explanations. To address this issue, we propose a new task: the artwork explanation generation task, along with its evaluation dataset and metric for quantitatively assessing the understanding and utilization of knowledge about artworks. This task is apt for image description based on the premise that LVLMs are expected to have pre-existing knowledge of artworks, which are often subjects of wide recognition and documented information. It consists of two parts: generating explanations from both images and titles of artworks, and generating explanations using only images, thus evaluating the LVLMs' language-based and vision-based knowledge. Alongside, we release a training dataset for LVLMs to learn explanations that incorporate knowledge about artworks. Our findings indicate that LVLMs not only struggle with integrating language and visual information but also exhibit a more pronounced limitation in acquiring knowledge from images alone. The datasets (ExpArt=Explain Artworks) are available at https://huggingface.co/datasets/naist-nlp/ExpArt.

Evaluating Image Review Ability of Vision Language Models

Feb 19, 2024
Shigeki Saito, Kazuki Hayashi, Yusuke Ide, Yusuke Sakai, Kazuma Onishi, Toma Suzuki, Seiji Gobara, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe

Large-scale vision language models (LVLMs) are language models that can process image and text inputs with a single model. This paper explores the use of LVLMs to generate review texts for images. The ability of LVLMs to review images is not fully understood, highlighting the need for a methodical evaluation of their review abilities. Unlike image captions, review texts can be written from various perspectives such as image composition and exposure. This diversity of review perspectives makes it difficult to determine a single correct review for an image. To address this challenge, we introduce an evaluation method based on rank correlation analysis, in which review texts are ranked by both humans and LVLMs and the correlation between these rankings is then measured. We further validate this approach by creating a benchmark dataset aimed at assessing the image review ability of recent LVLMs. Our experiments with the dataset reveal that LVLMs, particularly those with proven superiority in other evaluative contexts, excel at distinguishing between high-quality and substandard image reviews.
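
The rank correlation idea can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient is used, so Spearman's rho (without ties) is an assumption here, and the names `spearman_rho`, `human_ranks`, and `model_ranks` are hypothetical.

```python
from typing import Sequence


def spearman_rho(rank_a: Sequence[int], rank_b: Sequence[int]) -> float:
    """Spearman's rho for two rankings of the same items, assuming no ties."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rankings must have the same length (at least 2)")
    # Sum of squared rank differences over all items.
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Hypothetical rankings of five review texts for one image (1 = best).
human_ranks = [1, 2, 3, 4, 5]
model_ranks = [2, 1, 3, 4, 5]
rho = spearman_rho(human_ranks, model_ranks)
```

A value of rho close to 1 means the model orders the reviews much as the human annotators do; a value close to -1 means the orderings are reversed.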

* 9 pages, under review

Does Pre-trained Language Model Actually Infer Unseen Links in Knowledge Graph Completion?

Nov 15, 2023
Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe

Knowledge graphs (KGs) consist of links that describe relationships between entities. Due to the difficulty of manually enumerating all relationships between entities, automatically completing them is essential for KGs. Knowledge Graph Completion (KGC) is a task that infers unseen relationships between entities in a KG. Traditional embedding-based KGC methods, such as RESCAL, TransE, DistMult, ComplEx, RotatE, HAKE, and HousE, infer missing links using only the knowledge from training data. In contrast, the recent Pre-trained Language Model (PLM)-based KGC utilizes knowledge obtained during pre-training. Therefore, PLM-based KGC can estimate missing links between entities by reusing memorized knowledge from pre-training without inference. This approach is problematic because building KGC models aims to infer unseen links between entities. However, conventional evaluations in KGC do not consider inference and memorization abilities separately. Thus, a PLM-based KGC method that achieves high performance in current KGC evaluations may be ineffective in practical applications. To address this issue, we analyze whether PLM-based KGC methods make inferences or merely access memorized knowledge. For this purpose, we propose a method for constructing synthetic datasets designed specifically for this analysis and conclude that PLMs acquire the inference abilities required for KGC through pre-training, even though the performance improvements mostly come from textual information of entities and relations.
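
As a rough illustration of how embedding-based KGC ranks candidate links, the sketch below scores (head, relation, tail) triples with the standard TransE and DistMult scoring functions. The two-dimensional embeddings and the names `t_good` and `t_bad` are toy values invented for this example, not data from the paper.

```python
import math
from typing import Sequence


def transe_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """TransE: negative L2 distance between h + r and t (higher is better)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def distmult_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """DistMult: trilinear product sum_i h_i * r_i * t_i (higher is better)."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


# Toy embeddings (hypothetical): complete the query (head, relation, ?).
head = [1.0, 0.0]
relation = [0.0, 1.0]
candidate_tails = {"t_good": [1.0, 1.0], "t_bad": [-1.0, 0.0]}

# Rank candidate tails by TransE score, best first.
ranking = sorted(
    candidate_tails,
    key=lambda name: transe_score(head, relation, candidate_tails[name]),
    reverse=True,
)
```

Here `t_good` equals `head + relation`, so it receives the best possible TransE score of zero and is ranked first.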

* 15 pages, 10 figures

Model-based Subsampling for Knowledge Graph Completion

Sep 17, 2023
Xincan Feng, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe

Subsampling is effective in Knowledge Graph Embedding (KGE) for reducing overfitting caused by the sparsity in Knowledge Graph (KG) datasets. However, current subsampling approaches consider only frequencies of queries that consist of entities and their relations. Thus, the existing subsampling potentially underestimates the appearance probabilities of infrequent queries even if the frequencies of their entities or relations are high. To address this problem, we propose Model-based Subsampling (MBS) and Mixed Subsampling (MIX) to estimate their appearance probabilities through predictions of KGE models. Evaluation results on datasets FB15k-237, WN18RR, and YAGO3-10 showed that our proposed subsampling methods actually improved the KG completion performances for popular KGE models, RotatE, TransE, HAKE, ComplEx, and DistMult.
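
For context, the count-based baseline that the abstract contrasts with can be sketched as follows. This is not MBS or MIX; it assumes one commonly used scheme in which each training triple is weighted by the inverse square root of the frequencies of its (head, relation) and (relation, tail) queries. The function name and the toy triples are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

Triple = Tuple[str, str, str]


def count_based_weights(triples: List[Triple]) -> List[float]:
    """Weight each triple by 1 / sqrt(#(h, r) + #(r, t)), normalized to sum to 1."""
    hr_counts = Counter((h, r) for h, r, _ in triples)
    rt_counts = Counter((r, t) for _, r, t in triples)
    raw = [
        1.0 / math.sqrt(hr_counts[(h, r)] + rt_counts[(r, t)])
        for h, r, t in triples
    ]
    total = sum(raw)
    return [w / total for w in raw]


# Toy training set: the query ("a", "r1", ?) appears twice, ("d", "r2", ?) once.
toy_triples = [("a", "r1", "b"), ("a", "r1", "c"), ("d", "r2", "b")]
weights = count_based_weights(toy_triples)
```

Triples whose queries are frequent are down-weighted. Because the weights depend only on observed counts, a rare query is treated as rare even when its entities and relation are individually common, which is the limitation the abstract describes.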

* Accepted by AACL 2023; 9 pages, 3 figures, 5 tables

Implicit ZCA Whitening Effects of Linear Autoencoders for Recommendation

Aug 15, 2023
Katsuhiko Hayashi, Kazuma Onishi

Recently, in the field of recommendation systems, linear regression (autoencoder) models have been investigated as a way to learn item similarity. In this paper, we show a connection between a linear autoencoder model and ZCA whitening for recommendation data. In particular, we show that the dual form solution of a linear autoencoder model actually has ZCA whitening effects on feature vectors of items, while items are considered as input features in the primal problem of the autoencoder/regression model. We also show the correctness of applying a linear autoencoder to low-dimensional item vectors obtained using embedding methods such as Item2vec to estimate item-item similarities. Our experiments provide preliminary results indicating the effectiveness of whitening low-dimensional item embeddings.
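
A generic ZCA whitening transform can be sketched as below. This is the textbook procedure applied to a matrix of item vectors, not the paper's dual-form derivation; the function name `zca_whiten`, the `eps` stabilizer, and the synthetic data are assumptions for illustration.

```python
import numpy as np


def zca_whiten(X: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """ZCA-whiten the rows of X (one row per item, one column per dimension)."""
    Xc = X - X.mean(axis=0)                 # center each dimension
    cov = Xc.T @ Xc / Xc.shape[0]           # empirical covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov is symmetric, so eigh applies
    # W = U diag(1 / sqrt(lambda + eps)) U^T is the symmetric inverse square root.
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W


# Synthetic, correlated "item embeddings" (hypothetical data).
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 0.5]])
X = rng.normal(size=(500, 3)) @ mixing
Z = zca_whiten(X)
cov_Z = Z.T @ Z / Z.shape[0]  # close to the identity matrix after whitening
```

Unlike PCA whitening, the ZCA transform rotates back into the original coordinate system, so the whitened vectors stay as close as possible to the centered inputs while having decorrelated, unit-variance dimensions.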

Using Wikipedia Editor Information to Build High-performance Recommender Systems

Jun 14, 2023
Katsuhiko Hayashi

Wikipedia has high-quality articles on a variety of topics and has been used in diverse research areas. In this study, a method is presented for using Wikipedia's editor information to build recommender systems in various domains that outperform content-based systems.

* Accepted at Wiki Workshop 2023 (withdrawn by the author)

Table and Image Generation for Investigating Knowledge of Entities in Pre-trained Vision and Language Models

Jun 03, 2023
Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe

In this paper, we propose a table and image generation task to verify how the knowledge about entities acquired from natural language is retained in Vision & Language (V & L) models. This task consists of two parts: the first is to generate a table containing knowledge about an entity and its related image, and the second is to generate an image from an entity with a caption and a table containing related knowledge of the entity. In both tasks, the model must know the entities used to perform the generation properly. We created the Wikipedia Table and Image Generation (WikiTIG) dataset from about 200,000 infoboxes in English Wikipedia articles to perform the proposed tasks. We evaluated the performance on the tasks with respect to the above research question using the V & L model OFA, which has achieved state-of-the-art results in multiple tasks. Experimental results show that OFA forgets part of its entity knowledge through the additional pre-training performed to improve performance on image-related tasks.

* Accepted at ACL 2023

Subsampling for Knowledge Graph Embedding Explained

Sep 13, 2022
Hidetaka Kamigaito, Katsuhiko Hayashi

In this article, we explain recent advances in subsampling methods for knowledge graph embedding (KGE), starting from the original method used in word2vec.
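
The word2vec starting point can be sketched as follows. In the original word2vec formulation, a word with relative frequency f(w) is discarded with probability 1 - sqrt(t / f(w)) for a small threshold t; the function name and the default threshold below are illustrative choices, and the KGE variants covered by the article are not reproduced here.

```python
import math


def keep_probability(relative_freq: float, threshold: float = 1e-5) -> float:
    """Probability of keeping a word under word2vec-style subsampling.

    Words at or below the threshold frequency are always kept; more
    frequent words are kept with probability sqrt(threshold / frequency).
    """
    if relative_freq <= 0.0:
        raise ValueError("relative_freq must be positive")
    return min(1.0, math.sqrt(threshold / relative_freq))


# A word that makes up 0.1% of the corpus is kept about 10% of the time.
p_frequent = keep_probability(1e-3)
# A word rarer than the threshold is always kept.
p_rare = keep_probability(1e-6)
```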

* Notes for subsampling methods in Knowledge Graph Embedding

Comprehensive Analysis of Negative Sampling in Knowledge Graph Representation Learning

Jul 07, 2022
Hidetaka Kamigaito, Katsuhiko Hayashi

Negative sampling (NS) loss plays an important role in learning knowledge graph embedding (KGE) to handle a huge number of entities. However, the performance of KGE degrades without hyperparameters such as the margin term and number of negative samples in NS loss being appropriately selected. Currently, empirical hyperparameter tuning addresses this problem at the cost of computational time. To solve this problem, we theoretically analyzed NS loss to assist hyperparameter tuning and understand the better use of the NS loss in KGE learning. Our theoretical analysis showed that scoring methods with restricted value ranges, such as TransE and RotatE, require appropriate adjustment of the margin term or the number of negative samples different from those without restricted value ranges, such as RESCAL, ComplEx, and DistMult. We also propose subsampling methods specialized for the NS loss in KGE studied from a theoretical aspect. Our empirical analysis on the FB15k-237, WN18RR, and YAGO3-10 datasets showed that the results of actually trained models agree with our theoretical findings.
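
To make the roles of the margin term and the number of negative samples concrete, here is a minimal sketch of an NS loss for a single training triple. It assumes the widely used form -log σ(s_pos + γ) - (1/k) Σ log σ(-(s_neg_i + γ)), where scores are negative distances as in TransE or RotatE; the exact formulation analyzed in the paper may differ, and all numeric values are hypothetical.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def ns_loss(pos_score: float, neg_scores: Sequence[float], margin: float) -> float:
    """NS loss for one positive triple and k negatives (margin is gamma)."""
    k = len(neg_scores)
    if k == 0:
        raise ValueError("at least one negative sample is required")
    pos_term = -log_sigmoid(pos_score + margin)
    neg_term = -sum(log_sigmoid(-(s + margin)) for s in neg_scores) / k
    return pos_term + neg_term


# Distance-based scores are non-positive, so the margin shifts them
# into a range where the sigmoid can separate positives from negatives.
gamma = 6.0
good_fit = ns_loss(pos_score=-1.0, neg_scores=[-9.0, -12.0], margin=gamma)
bad_fit = ns_loss(pos_score=-12.0, neg_scores=[-1.0, -1.0], margin=gamma)
```

With scores bounded above by zero, a margin of zero would cap σ(s_pos + γ) at 0.5, which gives one intuition for why scoring methods with restricted value ranges are sensitive to this hyperparameter.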

* Accepted at ICML 2022
