Alessio Tonioni

R-CoV: Region-Aware Chain-of-Verification for Alleviating Object Hallucinations in LVLMs

Apr 22, 2026 (via arXiv)

SSL-R1: Self-Supervised Visual Reinforcement Post-Training for Multimodal Large Language Models

Apr 22, 2026 (via arXiv)

Shifting the Breaking Point of Flow Matching for Multi-Instance Editing

Feb 09, 2026 (via arXiv)

RefAM: Attention Magnets for Zero-Shot Referral Segmentation

Sep 26, 2025 (via arXiv)

Test-Time Visual In-Context Tuning

Mar 27, 2025 (via arXiv)

Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos

Mar 17, 2025 (via arXiv)

UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency

Dec 19, 2024 (via arXiv)

Active Data Curation Effectively Distills Large-Scale Multimodal Models

Nov 27, 2024 (via arXiv)

BRAVE: Broadening the visual encoding of vision-language models

Apr 10, 2024 (via arXiv)

Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces

Mar 29, 2024 (via arXiv)