Serge Belongie

Cornell Tech

Cultural Evaluations of Vision-Language Models Have a Lot to Learn from Cultural Theory

May 28, 2025
Via arXiv

RAVENEA: A Benchmark for Multimodal Retrieval-Augmented Visual Culture Understanding

May 20, 2025
Via arXiv

Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation

Apr 21, 2025
Via arXiv

POEM: Precise Object-level Editing via MLLM control

Apr 10, 2025
Via arXiv

Taxonomy-Aware Evaluation of Vision-Language Models

Apr 07, 2025
Via arXiv

Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models

Apr 03, 2025
Via arXiv

Multi-Modal Framing Analysis of News

Mar 26, 2025
Via arXiv

Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model

Mar 20, 2025
Via arXiv

Gradient Imbalance in Direct Preference Optimization

Feb 28, 2025
Via arXiv

Bayesian Optimization for Controlled Image Editing via LLMs

Feb 26, 2025
Via arXiv