R. Venkatesh Babu

ProFeAT: Projected Feature Adversarial Training for Self-Supervised Learning of Robust Representations

Jun 09, 2024

Sravanti Addepalli, Priyam Dey, R. Venkatesh Babu

Abstract: The need for abundant labelled data in supervised Adversarial Training (AT) has prompted the use of Self-Supervised Learning (SSL) techniques with AT. However, directly applying existing SSL methods to adversarial training has been sub-optimal due to the increased training complexity of combining SSL with AT. A recent approach, DeACL, mitigates this by using supervision from a standard SSL teacher in a distillation setting to mimic supervised AT. However, we find that a large performance gap remains compared to supervised adversarial training, particularly on larger models. In this work, we investigate the key reason for this gap and propose Projected Feature Adversarial Training (ProFeAT) to bridge it. We show that the sub-optimal distillation performance results from a mismatch between the training objectives of the teacher and the student, and propose using a projection head at the student, which allows it to leverage weak supervision from the teacher while also learning adversarially robust representations that are distinct from the teacher's. We further propose appropriate attack and defense losses at the feature and projector, alongside a combination of weak and strong augmentations for the teacher and student respectively, to improve training data diversity without increasing training complexity. Through extensive experiments on several benchmark datasets and models, we demonstrate significant improvements in both clean and robust accuracy over existing SSL-AT methods, setting a new state of the art. We further report on-par or improved performance compared to TRADES, a popular supervised AT method.
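
The abstract describes a student that carries its own projection head and is supervised by a frozen teacher through losses at both the feature and the projector level. The sketch below shows only how such a two-level objective could be composed, assuming cosine distances and linear projection heads; `project`, `profeat_style_loss` and `feat_weight` are hypothetical names, and the adversarial attack, augmentations and exact loss forms of ProFeAT are not reproduced.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def project(head, feat):
    """Hypothetical linear projection head: one output per row of `head`."""
    return [sum(w * f for w, f in zip(row, feat)) for row in head]

def profeat_style_loss(student_feat, teacher_feat, student_head, teacher_head,
                       feat_weight=1.0):
    """Assumed objective: cosine distance at the features plus at the projector outputs.

    In adversarial training, an attack would perturb the student's input to
    maximise a loss like this, and the student would then be updated to minimise it.
    """
    # Feature-level term: compares student and teacher backbone features directly
    feat_loss = 1.0 - cosine_similarity(student_feat, teacher_feat)
    # Projector-level term: student's own head vs. the (frozen) teacher's head
    proj_loss = 1.0 - cosine_similarity(project(student_head, student_feat),
                                        project(teacher_head, teacher_feat))
    return feat_weight * feat_loss + proj_loss
```

With identical features and heads the loss is 0; with orthogonal features and identity heads each term contributes 1, giving 2.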

DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets

Apr 03, 2024

Harsh Rangwani, Pradipto Mondal, Mayank Mishra, Ashish Ramayee Asokan, R. Venkatesh Babu

Abstract: The Vision Transformer (ViT) has emerged as a prominent architecture for various computer vision tasks. In ViT, the input image is divided into patch tokens, which are processed through a stack of self-attention blocks. However, unlike Convolutional Neural Networks (CNNs), ViT's simple architecture has no informative inductive bias (e.g., locality). As a result, ViT requires a large amount of data for pre-training. Data-efficient approaches such as DeiT have been proposed to train ViTs effectively on balanced datasets. However, limited literature discusses the use of ViTs for datasets with long-tailed imbalances. In this work, we introduce DeiT-LT to tackle the problem of training ViTs from scratch on long-tailed datasets. In DeiT-LT, we introduce an efficient and effective way of distilling from a CNN via the distillation (DIST) token, using out-of-distribution images and re-weighting the distillation loss to enhance focus on tail classes. This leads to the learning of local CNN-like features in early ViT blocks, improving generalization for tail classes. Further, to mitigate overfitting, we propose distilling from a flat CNN teacher, which leads to learning low-rank generalizable features for DIST tokens across all ViT blocks. With the proposed DeiT-LT scheme, the DIST token becomes an expert on the tail classes, and the classifier (CLS) token becomes an expert on the head classes. These experts help to effectively learn features corresponding to both the majority and minority classes using a distinct set of tokens within the same ViT architecture. We show the effectiveness of DeiT-LT for training ViTs from scratch on datasets ranging from small-scale CIFAR-10 LT to large-scale iNaturalist-2018.

* CVPR 2024. Project Page: https://rangwani-harsh.github.io/DeiT-LT
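
The abstract mentions re-weighting the distillation loss so that the DIST token focuses on tail classes. A minimal sketch of a class-weighted hard-distillation loss follows, assuming inverse-frequency class weights and the teacher's predicted label as the target; the actual weighting scheme used by DeiT-LT may differ, and `class_weights` and `reweighted_distill_loss` are hypothetical helpers.

```python
import math

def class_weights(class_counts):
    """Assumed scheme: inverse-frequency weights, normalised to average 1 over classes."""
    inv = [1.0 / c for c in class_counts]
    scale = len(inv) / sum(inv)
    return [w * scale for w in inv]

def log_softmax(logits):
    # Numerically stable log-softmax
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def reweighted_distill_loss(dist_logits_batch, teacher_labels, weights):
    """Weighted cross-entropy between DIST-token logits and the teacher's hard labels.

    Each sample is weighted by the weight of its teacher label, so rare (tail)
    classes contribute more; the result is normalised by the total weight.
    """
    total, norm = 0.0, 0.0
    for logits, y in zip(dist_logits_batch, teacher_labels):
        w = weights[y]
        total += -w * log_softmax(logits)[y]
        norm += w
    return total / norm
```

For class counts of 100 (head) and 10 (tail), the tail class receives ten times the weight of the head class.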

Balancing Act: Distribution-Guided Debiasing in Diffusion Models

Feb 28, 2024

Rishubh Parihar, Abhijnya Bhat, Saswat Mallick, Abhipsa Basu, Jogendra Nath Kundu, R. Venkatesh Babu

Abstract: Diffusion Models (DMs) have emerged as powerful generative models with unprecedented image generation capability. These models are widely used for data augmentation and creative applications. However, DMs reflect the biases present in their training datasets. This is especially concerning for faces, where the DM prefers one demographic subgroup over others (e.g., female vs. male). In this work, we present a method for debiasing DMs without relying on additional data or model retraining. Specifically, we propose Distribution Guidance, which enforces that the generated images follow a prescribed attribute distribution. To realize this, we build on the key insight that the latent features of the denoising UNet hold rich demographic semantics, which can be leveraged to guide debiased generation. We train an Attribute Distribution Predictor (ADP), a small MLP that maps the latent features to the distribution of attributes. The ADP is trained with pseudo-labels generated from existing attribute classifiers. The proposed Distribution Guidance with ADP enables fair generation. Our method reduces bias across single and multiple attributes and outperforms the baseline by a significant margin for both unconditional and text-conditional diffusion models. Further, we present a downstream task of training a fair attribute classifier by rebalancing the training set with our generated data.

* CVPR 2024. Project page: https://ab-34.github.io/balancing_act/
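
Distribution Guidance compares the attribute distribution predicted for a whole batch against a prescribed target distribution. The sketch below computes such a batch-level mismatch, assuming the ADP is a single linear layer with a softmax (a stand-in for the small MLP) and that the mismatch is measured with KL divergence; `adp_predict`, `batch_attribute_distribution` and `distribution_guidance_loss` are hypothetical names. During sampling, the gradient of a loss like this with respect to the UNet latents would steer generation; that step is not shown.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def adp_predict(weights, h):
    """Hypothetical ADP stand-in: linear layer + softmax over attribute classes."""
    return softmax([sum(w * x for w, x in zip(row, h)) for row in weights])

def batch_attribute_distribution(weights, latents):
    """Average the per-sample attribute probabilities over the batch."""
    probs = [adp_predict(weights, h) for h in latents]
    k = len(probs[0])
    return [sum(p[i] for p in probs) / len(probs) for i in range(k)]

def distribution_guidance_loss(weights, latents, target):
    """Assumed loss: KL(target || predicted batch distribution)."""
    pred = batch_attribute_distribution(weights, latents)
    return sum(t * math.log(t / p) for t, p in zip(target, pred) if t > 0)
```

A batch whose samples split evenly between two attribute values gives a near-zero loss against a 50/50 target, while a batch dominated by one value is penalised.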

Exploring Attribute Variations in Style-based GANs using Diffusion Models

Nov 27, 2023

Rishubh Parihar, Prasanna Balaji, Raghav Magazine, Sarthak Vora, Tejan Karmali, Varun Jampani, R. Venkatesh Babu

Abstract: Existing attribute editing methods treat semantic attributes as binary, resulting in a single edit per attribute. However, attributes such as eyeglasses, smiles, or hairstyles exhibit a vast range of diversity. In this work, we formulate the task of diverse attribute editing by modeling the multidimensional nature of attribute edits. This enables users to generate multiple plausible edits per attribute. We capitalize on the disentangled latent spaces of pretrained GANs and train a Denoising Diffusion Probabilistic Model (DDPM) to learn the latent distribution of diverse edits. Specifically, we train the DDPM over a dataset of latent edit directions obtained by embedding image pairs that differ in a single attribute. This leads to latent subspaces that enable diverse attribute editing. Applying diffusion in the highly compressed latent space allows us to model rich distributions of edits with limited computational resources. Through extensive qualitative and quantitative experiments across a range of datasets, we demonstrate the effectiveness of our approach for diverse attribute editing. We also showcase results of our method applied to 3D editing of various face attributes.

* NeurIPS Workshop on Diffusion Models 2023
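
The training data for the DDPM are latent edit directions, i.e. the difference between the GAN latent of an edited image and that of the original. A minimal sketch of building one such direction and applying the standard DDPM forward noising step to it follows, assuming the linear beta schedule of Ho et al. (2020); the schedule used in the paper is not stated in the abstract, the denoising network is omitted, and the helper names are hypothetical.

```python
import math
import random

def edit_direction(w_original, w_edited):
    """Latent direction for a single attribute change (edited minus original)."""
    return [b - a for a, b in zip(w_original, w_edited)]

def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def q_sample(d0, t, alpha_bars, rng=random):
    """Forward noising d_t = sqrt(abar_t) * d0 + sqrt(1 - abar_t) * eps.

    Returns (d_t, eps); a denoiser would be trained to predict eps from d_t and t.
    """
    ab = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in d0]
    dt = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(d0, eps)]
    return dt, eps
```

Working on low-dimensional edit directions rather than images is what keeps this diffusion step cheap, in line with the abstract's point about limited computational resources.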

Distilling from Vision-Language Models for Improved OOD Generalization in Vision Tasks

Oct 12, 2023

Sravanti Addepalli, Ashish Ramayee Asokan, Lakshay Sharma, R. Venkatesh Babu

Abstract: Vision-Language Models (VLMs) such as CLIP are trained on large amounts of image-text pairs, resulting in remarkable generalization across several data distributions. The prohibitively expensive training and data collection/curation costs of these models make them valuable Intellectual Property (IP) for organizations. This motivates a vendor-client paradigm, where a vendor trains a large-scale VLM and grants only input-output access to clients on a pay-per-query basis in a black-box setting. The client aims to minimize inference cost by distilling the VLM to a student model using the limited available task-specific data, and further deploying this student model in the downstream application. While naive distillation largely improves the In-Domain (ID) accuracy of the student, it fails to transfer the superior out-of-distribution (OOD) generalization of the VLM teacher using the limited available labeled images. To mitigate this, we propose Vision-Language to Vision-Align, Distill, Predict (VL2V-ADiP), which first aligns the vision and language modalities of the teacher model with the vision modality of a pre-trained student model, and further distills the aligned VLM embeddings to the student. This maximally retains the pre-trained features of the student, while also incorporating the rich representations of the VLM image encoder and the superior generalization of the text embeddings. The proposed approach achieves state-of-the-art results on the standard Domain Generalization benchmarks in a black-box teacher setting, and also when weights of the VLM are accessible.

* Code is available at https://github.com/val-iisc/VL2V-ADiP.git
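
The distill and predict stages can be illustrated with embedding-space operations: the student embedding is pulled towards an (already aligned) teacher embedding, and classes are predicted by similarity to per-class text embeddings in the style of CLIP zero-shot classification. This is a sketch under the assumption that alignment has already been done and that cosine distance is the distillation loss; it is not the released VL2V-ADiP code, and all function names are hypothetical.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def distill_loss(student_emb, aligned_teacher_emb):
    """Assumed distillation objective: cosine distance to the aligned teacher embedding."""
    return 1.0 - cosine(student_emb, aligned_teacher_emb)

def predict_with_text_embeddings(image_emb, class_text_embs):
    """CLIP-style prediction: index of the most similar class text embedding."""
    scores = [cosine(image_emb, t) for t in class_text_embs]
    return max(range(len(scores)), key=scores.__getitem__)
```

Scoring against text embeddings rather than a learned linear head is one way the language side of the teacher can carry over to the student's predictions.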

Domain-Specificity Inducing Transformers for Source-Free Domain Adaptation

Aug 27, 2023

Sunandini Sanyal, Ashish Ramayee Asokan, Suvaansh Bhambri, Akshay Kulkarni, Jogendra Nath Kundu, R. Venkatesh Babu

Abstract: Conventional Domain Adaptation (DA) methods aim to learn domain-invariant feature representations to improve target adaptation performance. However, we argue that domain-specificity is equally important, since in-domain trained models hold crucial domain-specific properties that are beneficial for adaptation. Hence, we propose a framework that supports the disentanglement and learning of domain-specific and task-specific factors in a unified model. Motivated by the success of vision transformers in several multi-modal vision problems, we find that queries can be leveraged to extract the domain-specific factors. Hence, we propose a novel Domain-specificity-inducing Transformer (DSiT) framework for disentangling and learning both domain-specific and task-specific factors. To achieve disentanglement, we construct novel Domain-Representative Inputs (DRI) with domain-specific information to train a domain classifier with a novel domain token. We are the first to utilize vision transformers for domain adaptation in a privacy-oriented source-free setting, and our approach achieves state-of-the-art performance on single-source, multi-source, and multi-target benchmarks.

* ICCV 2023. Project page: http://val.cds.iisc.ac.in/DSiT-SFDA

Boosting Adversarial Robustness using Feature Level Stochastic Smoothing

Jun 10, 2023

Sravanti Addepalli, Samyak Jain, Gaurang Sriramanan, R. Venkatesh Babu

Abstract: Advances in adversarial defenses have led to a significant improvement in the robustness of Deep Neural Networks. However, the robust accuracy of present state-of-the-art defenses is far from the requirements of critical applications such as robotics and autonomous navigation systems. Further, in practical use cases, the network prediction alone might not suffice, and assigning a confidence value to the prediction can prove crucial. In this work, we propose a generic method for introducing stochasticity into the network predictions, and use it to smooth decision boundaries and reject low-confidence predictions, thereby boosting robustness on accepted samples. The proposed Feature Level Stochastic Smoothing based classification also improves robustness without rejection over existing adversarial training methods. Finally, we combine the proposed method with adversarial detection methods to achieve the benefits of both approaches.

* CVPR Workshops 2021. First three authors contributed equally
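
The idea of smoothing predictions through feature-level stochasticity and rejecting low-confidence inputs can be sketched as follows. The sketch assumes additive Gaussian noise on the penultimate features, a linear classifier head, averaging of softmax outputs over samples, and a fixed rejection threshold; the paper's exact form of stochasticity is not given in the abstract, and `stochastic_smoothing_predict`, `sigma` and `threshold` are hypothetical.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def stochastic_smoothing_predict(features, weights, sigma=0.5, n_samples=200,
                                 threshold=0.7, rng=None):
    """Average softmax outputs over Gaussian-perturbed features; reject if unsure.

    Returns (predicted_class or None, averaged_probabilities).
    """
    if rng is None:
        rng = random.Random(0)
    k = len(weights)
    avg = [0.0] * k
    for _ in range(n_samples):
        # Assumed form of stochasticity: additive Gaussian noise on the features
        noisy = [f + rng.gauss(0.0, sigma) for f in features]
        probs = softmax([sum(w * x for w, x in zip(row, noisy)) for row in weights])
        for i in range(k):
            avg[i] += probs[i] / n_samples
    pred = max(range(k), key=avg.__getitem__)
    # Reject (abstain) when the smoothed confidence falls below the threshold
    if avg[pred] < threshold:
        return None, avg
    return pred, avg
```

An input far from the decision boundary is accepted with a confident prediction, whereas one sitting on the boundary yields roughly uniform averaged probabilities and is rejected.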

We never go out of Style: Motion Disentanglement by Subspace Decomposition of Latent Space

Jun 01, 2023

Rishubh Parihar, Raghav Magazine, Piyush Tiwari, R. Venkatesh Babu

Abstract: Real-world objects perform complex motions that involve multiple independent motion components. For example, while talking, a person continuously changes their facial expression, head pose, and body pose. In this work, we propose a novel method to decompose motion in videos using a pretrained image GAN model. We discover disentangled motion subspaces in the latent space of widely used style-based GAN models that are semantically meaningful and each control a single explainable motion component. The proposed method uses only a few ($\approx 10$) ground-truth video sequences to obtain such subspaces. We extensively evaluate the disentanglement properties of the motion subspaces on face and car datasets, both quantitatively and qualitatively. Further, we present results for multiple downstream tasks such as motion editing and selective motion transfer, e.g., transferring only facial expressions without training for it.

* AI for content creation, CVPRW-2023
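
The abstract does not specify how the motion subspaces are extracted. As a stand-in, the sketch below finds the dominant direction of frame-to-frame change in GAN latent codes (the top singular direction of the latent differences, via power iteration) and moves a latent along it. This is a generic subspace-analysis assumption, not the paper's decomposition, and all helper names are hypothetical.

```python
import math

def latent_differences(frames):
    """Differences between latent codes of consecutive video frames."""
    return [[b - a for a, b in zip(f0, f1)] for f0, f1 in zip(frames, frames[1:])]

def dominant_motion_direction(diffs, iters=100):
    """Top eigenvector of the uncentred second-moment matrix of the differences,
    found by power iteration (equivalently, the top right-singular vector)."""
    dim = len(diffs[0])
    m = [[sum(d[i] * d[j] for d in diffs) / len(diffs) for j in range(dim)]
         for i in range(dim)]
    v = [1.0 / math.sqrt(dim)] * dim  # generic start; fails only if orthogonal to the answer
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no motion present in the data
        v = [x / norm for x in w]
    return v

def apply_motion(latent, direction, strength):
    """Move a latent code along a discovered motion direction."""
    return [x + strength * d for x, d in zip(latent, direction)]
```

Restricting edits to such a direction is what would allow, for example, changing only one motion component of a frame while leaving the others fixed.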

Inspecting the Geographical Representativeness of Images from Text-to-Image Models

May 18, 2023

Abhipsa Basu, R. Venkatesh Babu, Danish Pruthi

Abstract: Recent progress in generative models has resulted in models that produce both realistic and relevant images for most textual inputs. These models are being used to generate millions of images every day, and hold the potential to drastically impact areas such as generative art, digital marketing, and data augmentation. Given their outsized impact, it is important to ensure that the generated content reflects the artifacts and surroundings across the globe, rather than over-representing certain parts of the world. In this paper, we measure the geographical representativeness of common nouns (e.g., a house) generated through DALL·E 2 and Stable Diffusion models using a crowdsourced study comprising 540 participants across 27 countries. For deliberately underspecified inputs without country names, the generated images most reflect the surroundings of the United States, followed by India, and the top generations rarely reflect surroundings from all other countries (average score less than 3 out of 5). Specifying the country names in the input increases the representativeness by 1.44 points on average for DALL·E 2 and 0.75 for Stable Diffusion; however, the overall scores for many countries still remain low, highlighting the need for future models to be more geographically inclusive. Lastly, we examine the feasibility of quantifying the geographical representativeness of generated images without conducting user studies.

* Preprint, 15 pages, 9 figures

Certified Adversarial Robustness Within Multiple Perturbation Bounds

Apr 20, 2023

Soumalya Nandi, Sravanti Addepalli, Harsh Rangwani, R. Venkatesh Babu

Abstract: Randomized smoothing (RS) is a well-known certified defense against adversarial attacks, which creates a smoothed classifier by predicting the most likely class under random noise perturbations of inputs during inference. While initial work focused on robustness to $\ell_2$ norm perturbations using noise sampled from a Gaussian distribution, subsequent works have shown that different noise distributions can result in robustness to other $\ell_p$ norm bounds as well. In general, a specific noise distribution is optimal for defending against a given $\ell_p$ norm based attack. In this work, we aim to improve certified adversarial robustness against multiple perturbation bounds simultaneously. Towards this, we first present a novel certification scheme that effectively combines the certificates obtained using different noise distributions to obtain optimal results against multiple perturbation bounds. We further propose a novel training noise distribution along with a regularized training scheme to improve certification within both $\ell_1$ and $\ell_2$ perturbation norms simultaneously. Contrary to prior works, we compare the certified robustness of different training algorithms at the same natural (clean) accuracy, rather than at the fixed noise levels used for training and certification. We also empirically invalidate the argument that training and certifying the classifier with the same amount of noise gives the best results. The proposed approach achieves improvements on the ACR (Average Certified Radius) metric across both $\ell_1$ and $\ell_2$ perturbation bounds.
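
For reference, the standard Gaussian-noise $\ell_2$ certificate of randomized smoothing (Cohen et al., 2019) and the ACR metric can be sketched as below. This covers only the single-norm baseline the abstract builds on, not the paper's combined multi-norm certification scheme or its training noise distribution; the toy base classifier and helper names are hypothetical.

```python
import random
from collections import Counter
from statistics import NormalDist

def smoothed_classify(base_classifier, x, sigma, n=1000, rng=None):
    """Monte Carlo estimate of g(x) = argmax_c P(f(x + noise) = c), noise ~ N(0, sigma^2 I)."""
    if rng is None:
        rng = random.Random(0)
    counts = Counter()
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        counts[base_classifier(noisy)] += 1
    top_class, top_count = counts.most_common(1)[0]
    return top_class, top_count / n

def certified_radius_l2(p_a, sigma):
    """Gaussian l2 certificate R = sigma * Phi^{-1}(p_A).

    In practice p_a should be a lower confidence bound on the top-class
    probability; plugging in the raw Monte Carlo estimate is only illustrative.
    """
    if p_a <= 0.5:
        return 0.0  # no certificate: the smoothed classifier abstains
    p_a = min(p_a, 1.0 - 1e-9)  # inv_cdf is undefined at exactly 1
    return sigma * NormalDist().inv_cdf(p_a)

def average_certified_radius(radii, correct):
    """ACR: mean certified radius, counting misclassified samples as radius 0."""
    return sum(r if ok else 0.0 for r, ok in zip(radii, correct)) / len(radii)
```

Because the radius grows with both `sigma` and $p_A$, while larger noise usually lowers clean accuracy, comparing methods at equal clean accuracy (as the abstract proposes) avoids favouring whichever method simply used more noise.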