Leonid Sigal

MM-R$^3$: On (In-)Consistency of Multi-modal Large Language Models (MLLMs)

Oct 07, 2024
Via arXiv

Response Wide Shut: Surprising Observations in Basic Vision Language Model Capabilities

Aug 13, 2024
Via arXiv

On Pre-training of Multimodal Language Models Customized for Chart Understanding

Jul 19, 2024
Via arXiv

Representing Animatable Avatar via Factorized Neural Fields

Jun 02, 2024
Via arXiv

Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach

Apr 17, 2024
Via arXiv

Preventing Catastrophic Forgetting through Memory Networks in Continuous Detection

Mar 21, 2024
Via arXiv

Visual Concept-driven Image Generation with Text-to-Image Diffusion Model

Feb 18, 2024
Via arXiv

Multi-modal News Understanding with Professionally Labelled Videos (ReutersViLNews)

Jan 23, 2024
Via arXiv

Joint Generative Modeling of Scene Graphs and Images via Diffusion Models

Jan 02, 2024
Via arXiv

Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models

Dec 19, 2023
Via arXiv