Serge Belongie

Cornell Tech

Stitch: Training-Free Position Control in Multimodal Diffusion Transformers

Sep 30, 2025

RespoDiff: Dual-Module Bottleneck Transformation for Responsible & Faithful T2I Generation

Sep 18, 2025

Is Meta-Learning Out? Rethinking Unsupervised Few-Shot Classification with Limited Entropy

Sep 16, 2025

Cultural Evaluations of Vision-Language Models Have a Lot to Learn from Cultural Theory

May 28, 2025

RAVENEA: A Benchmark for Multimodal Retrieval-Augmented Visual Culture Understanding

May 20, 2025

Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation

Apr 21, 2025

POEM: Precise Object-level Editing via MLLM control

Apr 10, 2025

Taxonomy-Aware Evaluation of Vision-Language Models

Apr 07, 2025

Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models

Apr 03, 2025

Multi-Modal Framing Analysis of News

Mar 26, 2025