Marcella Cornia

CounterVid: Counterfactual Video Generation for Mitigating Action and Temporal Hallucinations in Video-Language Models

Jan 08, 2026

Seeing Beyond Words: Self-Supervised Visual Learning for Multimodal Large Language Models

Dec 17, 2025

Recurrence Meets Transformers for Universal Multimodal Retrieval

Sep 10, 2025

RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors

Jun 09, 2025

Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals

May 27, 2025

What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models

May 26, 2025

Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack

May 21, 2025

Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation

Apr 18, 2025

LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning

Mar 19, 2025

Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives

Mar 18, 2025