Text


PRIMA: Pre-training with Risk-integrated Image-Metadata Alignment for Medical Diagnosis via LLM

Feb 26, 2026
Via arXiv

PATRA: Pattern-Aware Alignment and Balanced Reasoning for Time Series Question Answering

Feb 26, 2026
Via arXiv

Cytoarchitecture in Words: Weakly Supervised Vision-Language Modeling for Human Brain Microscopy

Feb 26, 2026
Via arXiv

WISER: Wider Search, Deeper Thinking, and Adaptive Fusion for Training-Free Zero-Shot Composed Image Retrieval

Feb 26, 2026
Via arXiv

Where Vision Becomes Text: Locating the OCR Routing Bottleneck in Vision-Language Models

Feb 26, 2026
Via arXiv

Same Words, Different Judgments: Modality Effects on Preference Alignment

Feb 26, 2026
Via arXiv

Where Relevance Emerges: A Layer-Wise Study of Internal Attention for Zero-Shot Re-Ranking

Feb 26, 2026
Via arXiv

SPARTA: Scalable and Principled Benchmark of Tree-Structured Multi-hop QA over Text and Tables

Feb 26, 2026
Via arXiv

SPATIALALIGN: Aligning Dynamic Spatial Relationships in Video Generation

Feb 26, 2026
Via arXiv

ViCLIP-OT: The First Foundation Vision-Language Model for Vietnamese Image-Text Retrieval with Optimal Transport

Feb 26, 2026
Via arXiv