Vietnamese Visual Question Answering


AutoViVQA: A Large-Scale Automatically Constructed Dataset for Vietnamese Visual Question Answering

Mar 11, 2026

ViInfographicVQA: A Benchmark for Single and Multi-image Visual Question Answering on Vietnamese Infographics

Dec 13, 2025

Towards Signboard-Oriented Visual Question Answering: ViSignVQA Dataset, Method and Benchmark

Dec 22, 2025

VietMEAgent: Culturally-Aware Few-Shot Multimodal Explanation for Vietnamese Visual Question Answering

Nov 12, 2025

LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts

Feb 26, 2025

Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations

Mar 05, 2025

ViConsFormer: Constituting Meaningful Phrases of Scene Texts using Transformer-based Method in Vietnamese Text-based Visual Question Answering

Oct 24, 2024

Advancing Vietnamese Visual Question Answering with Transformer and Convolutional Integration

Jul 30, 2024

ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images

Apr 29, 2024

Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese

Aug 22, 2024