Image Text Matching


Aligning Forest and Trees in Images and Long Captions for Visually Grounded Understanding

Feb 03, 2026
Via arXiv

Know Your Step: Faster and Better Alignment for Flow Matching Models via Step-aware Advantages

Feb 02, 2026
Via arXiv

Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers

Feb 03, 2026
Via arXiv

Hierarchical Concept-to-Appearance Guidance for Multi-Subject Image Generation

Feb 03, 2026
Via arXiv

Test-Time Conditioning with Representation-Aligned Visual Features

Feb 03, 2026
Via arXiv

ObjEmbed: Towards Universal Multimodal Object Embeddings

Feb 03, 2026
Via arXiv

Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models

Feb 02, 2026
Via arXiv

Generating a Paracosm for Training-Free Zero-Shot Composed Image Retrieval

Feb 03, 2026
Via arXiv

Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis

Feb 03, 2026
Via arXiv

Differential Vector Erasure: Unified Training-Free Concept Erasure for Flow Matching Models

Feb 01, 2026
Via arXiv