Andong Hua

Banana100: Breaking NR-IQA Metrics by 100 Iterative Image Replications with Nano Banana Pro

Apr 03, 2026

Kenan Tang, Praveen Arunshankar, Andong Hua, Anthony Yang, Yao Qin

Abstract: The multi-step, iterative image editing capabilities of multi-modal agentic systems have transformed digital content creation. Although the latest image editing models faithfully follow instructions and generate high-quality images in single-turn edits, we identify a critical weakness in multi-turn editing: the iterative degradation of image quality. As images are repeatedly edited, minor artifacts accumulate, rapidly producing severe visible noise and a failure to follow simple editing instructions. To systematically study these failures, we introduce Banana100, a comprehensive dataset of 28,000 degraded images generated through 100 iterative editing steps and covering diverse textures and image content. Alarmingly, image quality evaluators fail to detect the degradation: among 21 popular no-reference image quality assessment (NR-IQA) metrics, none consistently assigns lower scores to heavily degraded images than to clean ones. These dual failures of generators and evaluators may threaten the stability of future model training and the safety of deployed agentic systems if low-quality synthetic data generated by multi-turn edits escapes quality filters. We release the full code and data to facilitate the development of more robust models and help mitigate the fragility of multi-modal agentic systems.

* Accepted to CVPR 2026 Workshop on Agentic AI for Visual Media
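The headline finding is that no NR-IQA metric consistently scores heavily degraded images below clean ones. One way to make "consistently" concrete is a paired detection rate: for each source image, check whether the metric rates its degraded version strictly worse than the clean version. The sketch below assumes one clean score and one degraded score per source image; the helper name `degradation_detection_rate` and the `higher_is_better` flag are hypothetical, and the paper's exact consistency criterion may differ. The flag matters because some NR-IQA metrics (e.g. NIQE, BRISQUE) treat lower scores as better quality.

```python
from typing import Sequence


def degradation_detection_rate(
    clean_scores: Sequence[float],
    degraded_scores: Sequence[float],
    higher_is_better: bool = True,
) -> float:
    """Fraction of pairs where the metric rates the degraded image as worse.

    Hypothetical helper: clean_scores[i] and degraded_scores[i] are assumed to be
    scores for the same source image before and after many iterative edits.
    """
    if len(clean_scores) != len(degraded_scores):
        raise ValueError("score lists must be paired one-to-one")
    if not clean_scores:
        raise ValueError("need at least one pair")
    hits = 0
    for clean, degraded in zip(clean_scores, degraded_scores):
        # The metric "detects" degradation only if the degraded image scores
        # strictly worse; ties count as misses.
        worse = degraded < clean if higher_is_better else degraded > clean
        if worse:
            hits += 1
    return hits / len(clean_scores)
```

Under this criterion, a metric that consistently detected degradation would reach a rate of 1.0; for example, scores of `[0.8, 0.7, 0.6]` (clean) against `[0.5, 0.75, 0.4]` (degraded) give 2/3, since the second pair is ranked the wrong way.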

Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack

Feb 27, 2025

Chenhe Gu, Jindong Gu, Andong Hua, Yao Qin

Abstract: Multimodal Large Language Models (MLLMs), built upon LLMs, have recently gained attention for their capabilities in image recognition and understanding. Although MLLMs are vulnerable to adversarial attacks, the transferability of these attacks across different models remains limited, especially in the targeted attack setting. Existing methods primarily focus on vision-specific perturbations and struggle with the complex nature of vision-language modality alignment. In this work, we introduce the Dynamic Vision-Language Alignment (DynVLA) Attack, a novel approach that injects dynamic perturbations into the vision-language connector to improve generalization across the diverse vision-language alignments of different models. Our experimental results show that DynVLA significantly improves the transferability of adversarial examples across various MLLMs, including BLIP2, InstructBLIP, MiniGPT4, LLaVA, and closed-source models such as Gemini.

* arXiv admin note: text overlap with arXiv:2403.09766

Initialization Matters for Adversarial Transfer Learning

Dec 10, 2023

Andong Hua, Jindong Gu, Zhiyu Xue, Nicholas Carlini, Eric Wong, Yao Qin

Abstract: With the prevalence of the Pretraining-Finetuning paradigm in transfer learning, the robustness of downstream tasks has become a critical concern. In this work, we study adversarial robustness in transfer learning and reveal the critical role of initialization, including both the pretrained model and the linear head. First, we show that an adversarially robust pretrained model is necessary. Specifically, with a standard pretrained model, Parameter-Efficient Finetuning (PEFT) methods either fail to become adversarially robust or continue to exhibit significantly degraded adversarial robustness on downstream tasks, even with adversarial training during finetuning. Surprisingly, with a robust pretrained model, simple linear probing can outperform full finetuning and other PEFT methods with random initialization on certain datasets. We further find that linear probing excels at preserving the robustness inherited from robust pretraining. Based on this, we propose Robust Linear Initialization (RoLI) for adversarial finetuning, which initializes the linear head with the weights obtained by adversarial linear probing to maximally inherit the robustness from pretraining. Across five different image classification datasets, we demonstrate the effectiveness of RoLI and achieve new state-of-the-art results.
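To illustrate what "adversarial linear probing" means in the RoLI recipe, here is a minimal toy sketch, not the authors' implementation. It assumes a binary task with labels in {-1, +1}, an L-infinity threat model on the inputs, and a fixed linear map standing in for the frozen robust pretrained backbone; all function names (`adversarial_linear_probe`, `worst_case_input`, `robust_accuracy`) are hypothetical. Only the head `(w, b)` is trained, on worst-case inputs; in a RoLI-style pipeline those weights would then initialize the linear head before adversarial full finetuning, which is not shown.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def sigmoid(t: float) -> float:
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def features(backbone: Matrix, x: Sequence[float]) -> List[float]:
    # Frozen stand-in "pretrained" feature extractor: a fixed linear map z = W_b x.
    return [sum(a * xj for a, xj in zip(row, x)) for row in backbone]


def worst_case_input(backbone: Matrix, w: Sequence[float], x: Sequence[float],
                     y: int, eps: float) -> List[float]:
    # Gradient of the logit w.r.t. the input is W_b^T w. Because backbone and head
    # are both linear, stepping eps * sign(.) against the label is the exact worst
    # case in the L-infinity ball (a nonlinear backbone would need e.g. PGD instead).
    grad = [sum(backbone[k][j] * w[k] for k in range(len(w))) for j in range(len(x))]
    return [xj - eps * y * ((gj > 0) - (gj < 0)) for xj, gj in zip(x, grad)]


def adversarial_linear_probe(backbone: Matrix, xs: Sequence[Sequence[float]],
                             ys: Sequence[int], eps: float = 0.1, lr: float = 0.5,
                             epochs: int = 50) -> Tuple[List[float], float]:
    """Train only a logistic head on adversarial inputs; the backbone stays frozen."""
    w = [0.0] * len(backbone)
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            z = features(backbone, worst_case_input(backbone, w, x, y, eps))
            margin = y * (sum(wi * zi for wi, zi in zip(w, z)) + b)
            # d/d(margin) of log(1 + exp(-margin)) equals -sigmoid(-margin).
            g = -sigmoid(-margin)
            w = [wi - lr * g * y * zi for wi, zi in zip(w, z)]
            b -= lr * g * y
    return w, b


def robust_accuracy(backbone: Matrix, w: Sequence[float], b: float,
                    xs: Sequence[Sequence[float]], ys: Sequence[int], eps: float) -> float:
    # Fraction of points still classified correctly under the worst-case perturbation.
    correct = 0
    for x, y in zip(xs, ys):
        z = features(backbone, worst_case_input(backbone, w, x, y, eps))
        if y * (sum(wi * zi for wi, zi in zip(w, z)) + b) > 0:
            correct += 1
    return correct / len(xs)
```

A usage example on separable toy data: with `backbone = [[1.0, 0.0], [0.0, 1.0]]`, `xs = [[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0], [-0.5, -2.0]]`, and `ys = [1, 1, -1, -1]`, calling `adversarial_linear_probe(backbone, xs, ys, eps=0.1)` returns a head that can be checked with `robust_accuracy`. The linear stand-in is chosen only so that the worst-case perturbation has a closed form; the paper works with real image classifiers and multi-class heads.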
