Multimodal Machine Translation


Multimodal machine translation is the task of translating text with the help of additional input modalities alongside the source sentence. For example, the sentence 'a bird is flying over water' is translated into German using an accompanying image of a bird over water as extra context.

Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval

Sep 30, 2024

Exploring the Necessity of Visual Modality in Multimodal Machine Translation using Authentic Datasets

Apr 09, 2024

3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset

Apr 29, 2024

LowCLIP: Adapting the CLIP Model Architecture for Low-Resource Languages in Multimodal Image Retrieval Task

Aug 25, 2024

Detecting Concrete Visual Tokens for Multimodal Machine Translation

Mar 05, 2024

Task Arithmetic for Language Expansion in Speech Translation

Sep 17, 2024

FLEURS-ASL: Including American Sign Language in Massively Multilingual Multitask Evaluation

Aug 24, 2024

Plug, Play, and Fuse: Zero-Shot Joint Decoding via Word-Level Re-ranking Across Diverse Vocabularies

Aug 21, 2024

MultiMax: Sparse and Multi-Modal Attention Learning

Jun 04, 2024

Adding Multimodal Capabilities to a Text-only Translation Model

Mar 05, 2024