Kiyoharu Aizawa

MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding

May 26, 2025

Harnessing PDF Data for Improving Japanese Large Multimodal Models

Feb 20, 2025

A Benchmark and Evaluation for Real-World Out-of-Distribution Detection Using Vision-Language Models

Jan 30, 2025

JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation

Oct 22, 2024

FoodMLLM-JP: Leveraging Multimodal Large Language Models for Japanese Recipe Generation

Sep 27, 2024

Training-Free Sketch-Guided Diffusion with Latent Optimization

Aug 31, 2024

Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

Jul 31, 2024

MangaUB: A Manga Understanding Benchmark for Large Multimodal Models

Jul 26, 2024

Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion

Apr 24, 2024

Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models

Mar 29, 2024