Multimodal Machine Translation


Multimodal machine translation is the task of translating text with the help of one or more additional modalities. For example, the sentence 'a bird is flying over water' can be translated into German using an accompanying image of a bird over water as context.

CyIN: Cyclic Informative Latent Space for Bridging Complete and Incomplete Multimodal Learning

Feb 04, 2026
Via arXiv

MUSE: A Multi-agent Framework for Unconstrained Story Envisioning via Closed-Loop Cognitive Orchestration

Feb 03, 2026
Via arXiv

Hermes the Polyglot: A Unified Framework to Enhance Expressiveness for Multimodal Interlingual Subtitling

Jan 31, 2026
Via arXiv

Radiomics in Medical Imaging: Methods, Applications, and Challenges

Jan 24, 2026
Via arXiv

Multimodal Machine Learning for Soft High-k Elastomers under Data Scarcity

Jan 25, 2026
Via arXiv

Large-Scale Multidimensional Knowledge Profiling of Scientific Literature

Jan 21, 2026
Via arXiv

Multilingual-To-Multimodal (M2M): Unlocking New Languages with Monolingual Text

Jan 15, 2026
Via arXiv

TranslateGemma Technical Report

Jan 15, 2026
Via arXiv

MultiCaption: Detecting disinformation using multilingual visual claims

Jan 16, 2026
Via arXiv

MMFormalizer: Multimodal Autoformalization in the Wild

Jan 06, 2026
Via arXiv