Multimodal Machine Translation


Multimodal machine translation is the task of translating text while also conditioning on an additional input modality, such as an image. For example, a system might translate the English sentence "a bird is flying over water" into German while also receiving an image of a bird flying over water.

A Multimodal Recaptioning Framework to Account for Perceptual Diversity in Multilingual Vision-Language Modeling

Apr 19, 2025

Florenz: Scaling Laws for Systematic Generalization in Vision-Language Models

Mar 12, 2025

Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions

Mar 20, 2025

New Trends for Modern Machine Translation with Large Reasoning Models

Mar 13, 2025

Learning to Match Unpaired Data with Minimum Entropy Coupling

Mar 11, 2025

Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation

Dec 17, 2024

Property Enhanced Instruction Tuning for Multi-task Molecule Generation with Large Language Models

Dec 24, 2024

Ensuring Consistency for In-Image Translation

Dec 24, 2024

Context-Informed Machine Translation of Manga using Multimodal Large Language Models

Nov 04, 2024

Multimodal Whole Slide Foundation Model for Pathology

Nov 29, 2024