Nicu Sebe

When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding

Jun 05, 2025
Via arXiv

VidText: Towards Comprehensive Evaluation for Video Text Understanding

May 28, 2025
Via arXiv

Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals

May 27, 2025
Via arXiv

What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models

May 26, 2025
Via arXiv

Self-Supervised and Generalizable Tokenization for CLIP-Based 3D Understanding

May 24, 2025
Via arXiv

Manifold-aware Representation Learning for Degradation-agnostic Image Restoration

May 24, 2025
Via arXiv

MLLMs are Deeply Affected by Modality Bias

May 24, 2025
Via arXiv

FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors

May 02, 2025
Via arXiv

Cues3D: Unleashing the Power of Sole NeRF for Consistent and Unique Instances in Open-Vocabulary 3D Panoptic Segmentation

May 01, 2025
Via arXiv

Visual Text Processing: A Comprehensive Review and Unified Evaluation

Apr 30, 2025
Via arXiv