Serge Belongie

Cornell Tech

Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning

Jan 28, 2026

SuperF: Neural Implicit Fields for Multi-Image Super-Resolution

Dec 09, 2025

OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

Dec 08, 2025

Stitch: Training-Free Position Control in Multimodal Diffusion Transformers

Sep 30, 2025

RespoDiff: Dual-Module Bottleneck Transformation for Responsible & Faithful T2I Generation

Sep 18, 2025

Is Meta-Learning Out? Rethinking Unsupervised Few-Shot Classification with Limited Entropy

Sep 16, 2025

Cultural Evaluations of Vision-Language Models Have a Lot to Learn from Cultural Theory

May 28, 2025

RAVENEA: A Benchmark for Multimodal Retrieval-Augmented Visual Culture Understanding

May 20, 2025

Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation

Apr 21, 2025

POEM: Precise Object-level Editing via MLLM control

Apr 10, 2025