Multimodal Machine Translation


Multimodal machine translation is the task of translating text with the help of additional modalities such as images. For example, the sentence 'a bird is flying over water' is translated into German together with an image of a bird over water.

MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications

Jun 24, 2025

Florenz: Scaling Laws for Systematic Generalization in Vision-Language Models

Mar 12, 2025

AutoP2C: An LLM-Based Agent Framework for Code Repository Generation from Multimodal Content in Academic Papers

Apr 28, 2025

A Culturally-diverse Multilingual Multimodal Video Benchmark & Model

Jun 08, 2025

MT$^{3}$: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning

May 26, 2025

A Multimodal Recaptioning Framework to Account for Perceptual Diversity in Multilingual Vision-Language Modeling

Apr 19, 2025

Generative Machine Learning in Adaptive Control of Dynamic Manufacturing Processes: A Review

Apr 30, 2025

Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation

Dec 17, 2024

Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions

Mar 20, 2025

Learning to Match Unpaired Data with Minimum Entropy Coupling

Mar 11, 2025