Multimodal Models

Seeing Through the Chain: Mitigate Hallucination in Multimodal Reasoning Models via CoT Compression and Contrastive Preference Optimization

Feb 03, 2026
Via arXiv

R1-SyntheticVL: Is Synthetic Data from Generative Models Ready for Multimodal Large Language Model?

Feb 03, 2026
Via arXiv

Fast-Slow Efficient Training for Multimodal Large Language Models via Visual Token Pruning

Feb 03, 2026
Via arXiv

Generating a Paracosm for Training-Free Zero-Shot Composed Image Retrieval

Feb 03, 2026
Via arXiv

ELIQ: A Label-Free Framework for Quality Assessment of Evolving AI-Generated Images

Feb 03, 2026
Via arXiv

PnP-U3D: Plug-and-Play 3D Framework Bridging Autoregression and Diffusion for Unified Understanding and Generation

Feb 03, 2026
Via arXiv

Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy Distillation

Feb 03, 2026
Via arXiv

Instruction Anchors: Dissecting the Causal Dynamics of Modality Arbitration

Feb 03, 2026
Via arXiv

ObjEmbed: Towards Universal Multimodal Object Embeddings

Feb 03, 2026
Via arXiv

AesRec: A Dataset for Aesthetics-Aligned Clothing Outfit Recommendation

Feb 03, 2026
Via arXiv