Multimodal Machine Translation


Multimodal machine translation is the task of translating text with the help of additional data sources, such as images, alongside the source sentence. For example, the sentence 'a bird is flying over water' is translated into German together with an image of a bird over water.

Dual-branch Prompting for Multimodal Machine Translation

Jul 23, 2025
Via arXiv

Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation

Jul 10, 2025
Via arXiv

MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications

Jun 24, 2025
Via arXiv

CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation

May 30, 2025
Via arXiv

Multimodal Machine Translation with Visual Scene Graph Pruning

May 26, 2025
Via arXiv

Evaluating and Steering Modality Preferences in Multimodal Large Language Model

May 27, 2025
Via arXiv

CAtCh: Cognitive Assessment through Cookie Thief

Jun 07, 2025
Via arXiv

TopicVD: A Topic-Based Dataset of Video-Guided Multimodal Machine Translation for Documentaries

May 09, 2025
Via arXiv

A Culturally-diverse Multilingual Multimodal Video Benchmark & Model

Jun 08, 2025
Via arXiv

Memory Reviving, Continuing Learning and Beyond: Evaluation of Pre-trained Encoders and Decoders for Multimodal Machine Translation

Apr 25, 2025
Via arXiv