Lorenzo Baraldi

RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors

Jun 09, 2025
Via arXiv

What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models

May 26, 2025
Via arXiv

Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack

May 21, 2025
Via arXiv

LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning

Mar 19, 2025
Via arXiv

Hyperbolic Safety-Aware Vision-Language Models

Mar 15, 2025
Via arXiv

Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval

Mar 03, 2025
Via arXiv

Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries

Dec 26, 2024
Via arXiv

Causal Graphical Models for Vision-Language Compositional Understanding

Dec 12, 2024
Via arXiv

Personalizing Multimodal Large Language Models for Image Captioning: An Experimental Analysis

Dec 04, 2024
Via arXiv

Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation

Nov 28, 2024
Via arXiv