Multimodal Models


Revisiting Multimodal Positional Encoding in Vision-Language Models

Oct 27, 2025
Via arXiv

Positional Preservation Embedding for Multimodal Large Language Models

Oct 27, 2025
Via arXiv

LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation

Oct 27, 2025
Via arXiv

Survey of Multimodal Geospatial Foundation Models: Techniques, Applications, and Challenges

Oct 27, 2025
Via arXiv

Hazard-Responsive Digital Twin for Climate-Driven Urban Resilience and Equity

Oct 27, 2025
Via arXiv

ColorEcosystem: Powering Personalized, Standardized, and Trustworthy Agentic Service in massive-agent Ecosystem

Oct 27, 2025
Via arXiv

Adapting Speech Foundation Models with Large Language Models for Unified Speech Recognition

Oct 27, 2025
Via arXiv

OFFSIDE: Benchmarking Unlearning Misinformation in Multimodal Large Language Models

Oct 26, 2025
Via arXiv

Open Multimodal Retrieval-Augmented Factual Image Generation

Oct 26, 2025
Via arXiv

MAP4TS: A Multi-Aspect Prompting Framework for Time-Series Forecasting with Large Language Models

Oct 27, 2025
Via arXiv