Abstract: Traditional decision-based black-box adversarial attacks on image classifiers aim to generate adversarial examples by slightly modifying input images while keeping the number of queries low, where each query involves sending an input to the model and observing its output. Most existing methods assume that all queries have equal cost. However, in practice, queries may incur asymmetric costs; for example, in content moderation systems, certain output classes may trigger additional review, enforcement, or penalties, making them more costly than others. While prior work has considered such asymmetric cost settings, effective algorithms for this scenario remain underdeveloped. In this paper, we propose a general framework for decision-based attacks under asymmetric query costs, which we refer to as asymmetric black-box attacks. We modify two core components of existing attacks: the search strategy and the gradient estimation process. Specifically, we propose Asymmetric Search (AS), a more conservative variant of binary search that reduces reliance on high-cost queries, and Asymmetric Gradient Estimation (AGREST), which shifts the sampling distribution to favor low-cost queries. We design efficient algorithms that minimize total attack cost by balancing different query types, in contrast to earlier methods such as stealthy attacks that focus only on limiting expensive (high-cost) queries. Our method can be integrated into a range of existing black-box attacks with minimal changes. We perform both theoretical analysis and empirical evaluation on standard image classification benchmarks. Across various cost regimes, our method consistently achieves lower total query cost and smaller perturbations than existing approaches, with improvements of up to 40% in some settings.
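To make the asymmetric setting concrete, the following is a minimal Python sketch of a conservative, off-center boundary search in the spirit of Asymmetric Search; it is an illustration under assumptions, not the paper's exact algorithm. The oracle interface \texttt{oracle(x)}, the skew parameter \texttt{gamma}, and the convention that queries answered on the adversarial side are the low-cost ones are all hypothetical choices made for this sketch.
\begin{verbatim}
def asymmetric_search(x_adv, x_benign, oracle, gamma=0.3, tol=1e-3):
    """Illustrative sketch of a conservative (asymmetric) boundary search.

    Standard binary search probes the midpoint between an adversarial and
    a benign endpoint. Here each probe is placed only a fraction
    gamma < 0.5 of the way from the (assumed low-cost) adversarial end
    toward the (assumed high-cost) benign end, so fewer probes land on
    the expensive side of the decision boundary.

    oracle(x) queries the victim model once and returns
    (is_adversarial, cost_of_this_query); both the interface and the
    cost convention are assumptions for this illustration.
    """
    lo, hi = 0.0, 1.0          # interpolation weight toward x_benign
    total_cost = 0.0
    while hi - lo > tol:
        alpha = lo + gamma * (hi - lo)              # off-center probe
        x_mid = (1 - alpha) * x_adv + alpha * x_benign
        is_adv, cost = oracle(x_mid)
        total_cost += cost
        if is_adv:
            lo = alpha         # probe stayed on the low-cost side
        else:
            hi = alpha         # probe crossed into the high-cost side
    # return the last known adversarial point and the accumulated cost
    return (1 - lo) * x_adv + lo * x_benign, total_cost
\end{verbatim}
With a fixed \texttt{gamma} the bracketing interval still shrinks geometrically (by a factor of at most $1-\texttt{gamma}$ per step), so the search converges while biasing probes away from the expensive side.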
Abstract: \textit{Zero-shot} models like CLIP are often fine-tuned on a target dataset to further improve their accuracy, but this can compromise out-of-distribution (OOD) robustness. Robust Fine-Tuning (\texttt{RFT})~\citep{wortsman2021robust}, which interpolates between the \textit{zero-shot} and \textit{fine-tuned} models, has been proposed to address this issue. However, our understanding of when \texttt{RFT} actually reduces OOD error remains limited. In this work, we empirically investigate the robustness of \texttt{RFT} in CLIP models, with a focus on the \textit{sharpness} of the CLIP model during interpolation. First, we demonstrate that overall sharpness does not serve as a reliable indicator for predicting the generalization of modern architectures like CLIP on OOD data, which challenges the conventional belief in the generalization benefits of flat minima in foundation models. However, by examining the role of the \textit{straggler layer} phenomenon, we show that, unlike overall sharpness, the \textit{layer-wise} sharpness of \textit{straggler} layers can reliably capture the generalization performance of interpolated CLIP models on OOD data. Our extensive experiments reveal that \textit{layer-wise} sharpness correlates with OOD generalization accuracy for \texttt{RFT}. Furthermore, we demonstrate that by inducing sparsity in the \textit{straggler} layers, we can mitigate the \textit{failure mode} phenomenon in \texttt{RFT}. To the best of our knowledge, this is the first work to study the role of sharpness in the \textit{success} of interpolation in the weight space of CLIP foundation models. Our code is available at \url{https://github.com/alirezaabdollahpour/CLIP_Mode_Connectivity}.
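For reference, robust fine-tuning interpolates the zero-shot and fine-tuned weights as $\theta(\alpha) = (1-\alpha)\,\theta_{\mathrm{zero\text{-}shot}} + \alpha\,\theta_{\mathrm{fine\text{-}tuned}}$~\citep{wortsman2021robust}. The sketch below illustrates this interpolation over PyTorch state dicts; the function name and the \texttt{evaluate\_ood} hook in the usage comment are hypothetical and stand in for whatever evaluation pipeline is used.
\begin{verbatim}
import torch

def interpolate_weights(zero_shot_sd, fine_tuned_sd, alpha):
    """Weight-space interpolation in the style of robust fine-tuning:
    theta(alpha) = (1 - alpha) * theta_zero_shot + alpha * theta_fine_tuned.

    Both arguments are state dicts of the same CLIP architecture;
    alpha = 0 recovers the zero-shot model, alpha = 1 the fine-tuned one.
    """
    return {
        name: (1 - alpha) * zero_shot_sd[name] + alpha * fine_tuned_sd[name]
        for name in zero_shot_sd
    }

# Hypothetical usage: sweep the interpolation path and measure OOD accuracy.
# for alpha in torch.linspace(0, 1, 11):
#     model.load_state_dict(interpolate_weights(zs_sd, ft_sd, float(alpha)))
#     evaluate_ood(model)   # user-supplied OOD evaluation routine
\end{verbatim}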