Multimodal Machine Translation


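Multimodal machine translation is the task of translating text with the help of one or more additional input modalities, such as images or video. For example, the English sentence 'a bird is flying over water' is translated into German together with an image of a bird over water, which can help resolve ambiguity in the source text.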

A Culturally-diverse Multilingual Multimodal Video Benchmark & Model

Jun 08, 2025

CAtCh: Cognitive Assessment through Cookie Thief

Jun 07, 2025

CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation

May 30, 2025

Multimodal Machine Translation with Visual Scene Graph Pruning

May 26, 2025

Evaluating and Steering Modality Preferences in Multimodal Large Language Model

May 27, 2025

MT$^{3}$: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning

May 26, 2025

TopicVD: A Topic-Based Dataset of Video-Guided Multimodal Machine Translation for Documentaries

May 09, 2025

Aya Vision: Advancing the Frontier of Multilingual Multimodality

May 13, 2025

Memory Reviving, Continuing Learning and Beyond: Evaluation of Pre-trained Encoders and Decoders for Multimodal Machine Translation

Apr 25, 2025

AutoP2C: An LLM-Based Agent Framework for Code Repository Generation from Multimodal Content in Academic Papers

Apr 28, 2025