Kiyoharu Aizawa

FoodMLLM-JP: Leveraging Multimodal Large Language Models for Japanese Recipe Generation

Sep 27, 2024

Training-Free Sketch-Guided Diffusion with Latent Optimization

Aug 31, 2024

Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

Jul 31, 2024

MangaUB: A Manga Understanding Benchmark for Large Multimodal Models

Jul 26, 2024

Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion

Apr 24, 2024

Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models

Mar 29, 2024

Entity-NeRF: Detecting and Removing Moving Entities in Urban Scenes

Mar 24, 2024

Cross-Lingual Learning in Multilingual Scene Text Recognition

Dec 17, 2023

Semantic-Driven Initial Image Construction for Guided Image Synthesis in Diffusion Model

Dec 13, 2023

Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation

Nov 22, 2023