Chinmay Hegde

On the Fine-Grained Hardness of Inverting Generative Models

Sep 11, 2023
Feyza Duman Keles, Chinmay Hegde

The objective of generative model inversion is to identify a size-$n$ latent vector whose output under the generative model closely matches a given target. This operation is a core computational primitive in numerous modern applications in computer vision and NLP. However, the problem is known to be computationally challenging, and NP-hard in the worst case. This paper provides a fine-grained view of the landscape of computational hardness for this problem. We establish several new hardness lower bounds for both exact and approximate model inversion. In exact inversion, the goal is to determine whether a target is contained within the range of a given generative model. Under the strong exponential time hypothesis (SETH), we demonstrate that the computational complexity of exact inversion is lower bounded by $\Omega(2^n)$ via a reduction from $k$-SAT; this strengthens known results. For the more practically relevant problem of approximate inversion, the goal is to determine whether a point in the model range is close to a given target with respect to the $\ell_p$-norm. When $p$ is a positive odd integer, under SETH, we provide an $\Omega(2^n)$ complexity lower bound via a reduction from the closest vector problem (CVP). Finally, when $p$ is even, under the exponential time hypothesis (ETH), we provide a lower bound of $2^{\Omega(n)}$ via reductions from Half-Clique and Vertex-Cover.
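
These lower bounds say that, in the worst case, nothing asymptotically better than exhaustive search over the latent space should be expected. As a point of reference, here is a minimal sketch (not the paper's construction) of the naive $O(2^n)$ baseline for exact inversion, enumerating binary latent codes for a small ReLU network; the toy network and instance are hypothetical.

```python
import itertools
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def exact_invert_bruteforce(weights, biases, target, n):
    """Decide whether `target` lies in the range of a ReLU network,
    restricted here to binary latent codes z in {0,1}^n, by exhaustive
    search -- the O(2^n) baseline the SETH-based bound says is hard to beat."""
    for bits in itertools.product([0.0, 1.0], repeat=n):
        z = np.array(bits)
        h = z
        for W, b in zip(weights, biases):
            h = relu(W @ h + b)
        if np.allclose(h, target):
            return True, z
    return False, None

# hypothetical toy instance: a 2-layer ReLU network with a 3-dim latent
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
bs = [np.zeros(4), np.zeros(2)]
y = relu(Ws[1] @ relu(Ws[0] @ np.array([1.0, 0.0, 1.0]) + bs[0]) + bs[1])
print(exact_invert_bruteforce(Ws, bs, y, n=3))  # finds a preimage of y
```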

* 19 pages 

Towards Foundational AI Models for Additive Manufacturing: Language Models for G-Code Debugging, Manipulation, and Comprehension

Sep 04, 2023
Anushrut Jignasu, Kelly Marshall, Baskar Ganapathysubramanian, Aditya Balu, Chinmay Hegde, Adarsh Krishnamurthy

3D printing or additive manufacturing is a revolutionary technology that enables the creation of physical objects from digital models. However, the quality and accuracy of 3D printing depend on the correctness and efficiency of the G-code, a low-level numerical control programming language that instructs 3D printers how to move and extrude material. Debugging G-code is a challenging task that requires a syntactic and semantic understanding of the G-code format and the geometry of the part to be printed. In this paper, we present the first extensive evaluation of six state-of-the-art foundational large language models (LLMs) for comprehending and debugging G-code files for 3D printing. We design effective prompts to enable pre-trained LLMs to understand and manipulate G-code and test their performance on various aspects of G-code debugging and manipulation, including detection and correction of common errors and the ability to perform geometric transformations. We analyze their strengths and weaknesses for understanding complete G-code files. We also discuss the implications and limitations of using LLMs for G-code comprehension.
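
To make the tasks concrete, here is a small reference implementation (ours, not from the paper) of one geometric manipulation the LLMs are prompted to perform directly on G-code text: translating a toolpath in the XY plane. The regular expressions assume standard G0/G1 move syntax.

```python
import re

def translate_gcode(lines, dx, dy):
    """Shift every X/Y coordinate in G0/G1 move commands by (dx, dy),
    leaving temperature, extrusion, and other commands untouched."""
    out = []
    for line in lines:
        if line.startswith(("G0", "G1")):
            line = re.sub(r"X(-?\d+\.?\d*)",
                          lambda m: f"X{float(m.group(1)) + dx:.3f}", line)
            line = re.sub(r"Y(-?\d+\.?\d*)",
                          lambda m: f"Y{float(m.group(1)) + dy:.3f}", line)
        out.append(line)
    return out

print(translate_gcode(["G1 X10.0 Y5.0 E0.4", "M104 S200"], dx=2.5, dy=-1.0))
# ['G1 X12.500 Y4.000 E0.4', 'M104 S200']
```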

Distributionally Robust Classification on a Data Budget

Aug 07, 2023
Benjamin Feuer, Ameya Joshi, Minh Pham, Chinmay Hegde

Real-world uses of deep learning require predictable model behavior under distribution shifts. Models such as CLIP show emergent natural distributional robustness comparable to humans, but may require hundreds of millions of training samples. Can we train robust learners in a domain where data is limited? To rigorously address this question, we introduce JANuS (Joint Annotations and Names Set), a collection of four new training datasets with images, labels, and corresponding captions; we then perform a series of carefully controlled investigations of the factors contributing to robustness in image classification and compare those results to findings from a large-scale meta-analysis. Using this approach, we show that a standard ResNet-50 trained with the cross-entropy loss on 2.4 million image samples can attain robustness comparable to that of a CLIP ResNet-50 trained on 400 million samples. To our knowledge, this is the first result showing (near) state-of-the-art distributional robustness on limited data budgets. Our dataset is available at \url{https://huggingface.co/datasets/penfever/JANuS_dataset}, and the code used to reproduce our experiments can be found at \url{https://github.com/penfever/vlhub/}.
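
The supervised baseline at the heart of this comparison is deliberately plain. A minimal sketch, with the JANuS data loading stubbed out (the dataset's exact field names are not assumed here):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

# the supervised baseline: a standard ResNet-50 trained from scratch with
# cross-entropy on labeled images. Real training would stream batches from
# https://huggingface.co/datasets/penfever/JANuS_dataset ; random tensors
# stand in for a batch here so the sketch is self-contained.
model = resnet50(num_classes=1000)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()

def train_step(images, labels):
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
    return loss.item()

# smoke test with random tensors standing in for one labeled batch
print(train_step(torch.randn(4, 3, 224, 224), torch.randint(0, 1000, (4,))))
```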

* TMLR 2023; openreview link: https://openreview.net/forum?id=D5Z2E8CNsD 

Circumventing Concept Erasure Methods For Text-to-Image Generative Models

Aug 03, 2023
Minh Pham, Kelly O. Marshall, Chinmay Hegde

Text-to-image generative models can produce photo-realistic images for an extremely broad range of concepts, and their usage has proliferated widely among the general public. On the flip side, these models have numerous drawbacks, including their potential to generate images featuring sexually explicit content, mirror artistic styles without permission, or even hallucinate (or deepfake) the likenesses of celebrities. Consequently, various methods have been proposed to "erase" sensitive concepts from text-to-image models. In this work, we examine five recently proposed concept erasure methods and show that targeted concepts are not fully excised by any of them. Specifically, we leverage the existence of special learned word embeddings that can retrieve "erased" concepts from the sanitized models with no alterations to their weights. Our results highlight the brittleness of post hoc concept erasure methods and call into question their use in the algorithmic toolkit for AI safety.
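
The attack's key property is that the sanitized model is never modified; only an input-side word embedding is learned. A schematic sketch of the idea, with a hypothetical differentiable stub standing in for the frozen text-to-image model (the full diffusion pipeline is out of scope here):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
frozen_model = torch.nn.Linear(8, 64)       # stand-in for the sanitized model
for p in frozen_model.parameters():
    p.requires_grad_(False)                 # its weights are never altered

target = torch.randn(64)                    # features of the "erased" concept
embed = torch.zeros(8, requires_grad=True)  # the special word embedding
opt = torch.optim.Adam([embed], lr=1e-2)

for step in range(500):
    opt.zero_grad()
    loss = F.mse_loss(frozen_model(embed), target)
    loss.backward()
    opt.step()                              # only the input embedding moves

print(f"final loss: {loss.item():.4f}")     # concept recovered, weights intact
```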

Identity-Preserving Aging of Face Images via Latent Diffusion Models

Jul 17, 2023
Sudipta Banerjee, Govind Mittal, Ameya Joshi, Chinmay Hegde, Nasir Memon

The performance of automated face recognition systems is inevitably impacted by the facial aging process. However, high-quality datasets of individuals collected over several years are typically small in scale. In this work, we propose, train, and validate the use of latent text-to-image diffusion models for synthetically aging and de-aging face images. Our models succeed with few-shot training and have the added benefit of being controllable via intuitive textual prompting. We observe high degrees of visual realism in the generated images while maintaining biometric fidelity as measured by commonly used metrics. We evaluate our method on two benchmark datasets (CelebA and AgeDB) and observe a significant reduction (~44%) in the False Non-Match Rate compared to existing state-of-the-art baselines.
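
For reference, the False Non-Match Rate is the fraction of genuine (same-identity) comparisons that a matcher rejects at a given threshold. A minimal sketch with hypothetical match scores:

```python
import numpy as np

def fnmr(genuine_scores, threshold):
    """False Non-Match Rate: fraction of genuine (same-identity) comparison
    scores falling below the match threshold. The paper reports a ~44%
    relative reduction in this metric versus existing baselines."""
    genuine_scores = np.asarray(genuine_scores)
    return float(np.mean(genuine_scores < threshold))

# hypothetical similarity scores from a face matcher on genuine pairs
print(fnmr([0.81, 0.42, 0.77, 0.35, 0.90], threshold=0.5))  # -> 0.4
```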

* Accepted to appear in the International Joint Conference on Biometrics (IJCB) 2023 

Vision-Language Models can Identify Distracted Driver Behavior from Naturalistic Videos

Jun 22, 2023
Md Zahid Hasan, Jiajing Chen, Jiyang Wang, Mohammed Shaiqur Rahman, Ameya Joshi, Senem Velipasalar, Chinmay Hegde, Anuj Sharma, Soumik Sarkar

Recognizing distracting activities in real-world driving scenarios is critical for ensuring the safety and reliability of both drivers and pedestrians on the roadways. Conventional computer vision techniques are typically data-intensive and require a large volume of annotated training data to detect and classify various distracted driving behaviors, limiting their efficiency and scalability. We aim to develop a generalized framework that performs robustly with access to limited or no annotated training data. Recently, vision-language models have offered large-scale visual-textual pretraining that can be adapted to task-specific learning such as distracted driving activity recognition. Vision-language pretraining models such as CLIP have shown significant promise in learning natural language-guided visual representations. This paper proposes a CLIP-based driver activity recognition approach that identifies driver distraction from naturalistic driving images and videos. CLIP's vision embedding offers zero-shot transfer and task-based finetuning, which can classify distracted activities from driving video data. Our results show that this framework offers state-of-the-art performance in zero-shot transfer and in video-based CLIP prediction of the driver's state on two public datasets. We propose both frame-based and video-based frameworks built on top of CLIP's visual representation for the distracted driving detection and classification task and report their results.
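
A minimal sketch of the frame-based, zero-shot side of such a pipeline, using the public openai/clip-vit-base-patch32 checkpoint via Hugging Face transformers; the prompt wording is illustrative, not the paper's exact prompt set:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# illustrative driver-state prompts; the paper's prompt set may differ
prompts = ["a photo of a driver paying attention to the road",
           "a photo of a driver texting on a phone",
           "a photo of a driver drinking from a bottle"]
frame = Image.new("RGB", (224, 224))  # stand-in for a naturalistic dashcam frame

inputs = processor(text=prompts, images=frame, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(prompts, probs[0].tolist())))  # zero-shot class probabilities
```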

* 15 pages, 10 figures 

ZeroForge: Feedforward Text-to-Shape Without 3D Supervision

Jun 16, 2023
Kelly O. Marshall, Minh Pham, Ameya Joshi, Anushrut Jignasu, Aditya Balu, Adarsh Krishnamurthy, Chinmay Hegde

Current state-of-the-art methods for text-to-shape generation either require supervised training on a labeled dataset of pre-defined 3D shapes, or perform expensive inference-time optimization of implicit neural representations. In this work, we present ZeroForge, an approach for zero-shot text-to-shape generation that avoids both pitfalls. To achieve open-vocabulary shape generation, we require careful architectural adaptation of existing feed-forward approaches, as well as a combination of data-free CLIP-loss and contrastive losses to avoid mode collapse. Using these techniques, we are able to considerably expand the generative ability of existing feed-forward text-to-shape models such as CLIP-Forge. We support our method with extensive qualitative and quantitative evaluations.
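
A schematic sketch of the combined training signal described above, with hypothetical stand-ins for the CLIP encoders, the shape generator, and the differentiable renderer (ZeroForge uses real CLIP towers and a 3D renderer):

```python
import torch
import torch.nn.functional as F

clip_text  = torch.nn.Linear(16, 32)   # stand-in text encoder
clip_image = torch.nn.Linear(16, 32)   # stand-in image encoder
generator  = torch.nn.Linear(16, 16)   # stand-in feed-forward shape generator
render     = torch.nn.Identity()       # stand-in differentiable renderer

def zeroforge_loss(text_feats):
    shapes = generator(text_feats)
    img_emb = F.normalize(clip_image(render(shapes)), dim=-1)
    txt_emb = F.normalize(clip_text(text_feats), dim=-1)
    clip_loss = -(img_emb * txt_emb).sum(-1).mean()   # match shape to its text
    # contrastive term: different prompts should yield dissimilar renders,
    # discouraging the mode collapse discussed in the abstract
    sim = img_emb @ img_emb.t()
    contrast = (sim - torch.eye(len(sim))).pow(2).mean()
    return clip_loss + contrast

print(zeroforge_loss(torch.randn(4, 16)))  # a differentiable training loss
```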

* 19 pages; high-resolution figures needed to demonstrate 3D results 

LiT Tuned Models for Efficient Species Detection

Feb 12, 2023
Andre Nakkab, Benjamin Feuer, Chinmay Hegde

Recent advances in training vision-language models have demonstrated unprecedented robustness and transfer learning effectiveness; however, standard computer vision datasets are image-only and therefore not well adapted to such training methods. Our paper introduces a simple methodology for adapting any fine-grained image classification dataset for distributed vision-language pretraining. We implement this methodology on the challenging iNaturalist-2021 dataset, comprising approximately 2.7 million images of macro-organisms across 10,000 classes, and achieve a new state-of-the-art model in terms of zero-shot classification accuracy. Somewhat surprisingly, our model (trained using a new method called locked-image text tuning) uses a pre-trained, frozen vision representation, proving that language alignment alone can attain strong transfer learning performance, even on fractious, long-tailed datasets. Our approach opens the door to utilizing high-quality vision-language pretrained models in agriculturally relevant applications involving species detection.
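
A minimal sketch of locked-image text tuning as used here: the image tower stays frozen and only the text tower is trained with a symmetric contrastive loss (both encoders below are hypothetical stand-ins for the real towers):

```python
import torch
import torch.nn.functional as F

image_tower = torch.nn.Linear(512, 256)
for p in image_tower.parameters():
    p.requires_grad_(False)                # "locked" image representation
text_tower = torch.nn.Linear(128, 256)     # the only trainable component
opt = torch.optim.Adam(text_tower.parameters(), lr=1e-4)

def lit_step(image_feats, text_feats, temperature=0.07):
    img = F.normalize(image_tower(image_feats), dim=-1)
    txt = F.normalize(text_tower(text_feats), dim=-1)
    logits = img @ txt.t() / temperature
    labels = torch.arange(len(logits))     # matched pairs lie on the diagonal
    loss = (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# smoke test with random features standing in for an image-caption batch
print(lit_step(torch.randn(8, 512), torch.randn(8, 128)))
```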

* 5 pages, 5 figures, 1 table; presented at the AIAFS workshop at the AAAI 2023 conference 

Implicit Regularization for Group Sparsity

Jan 29, 2023
Jiangyuan Li, Thanh V. Nguyen, Chinmay Hegde, Raymond K. W. Wong

We study the implicit regularization of gradient descent towards structured sparsity via a novel neural reparameterization, which we call a diagonally grouped linear neural network. We show the following intriguing property of our reparameterization: gradient descent over the squared regression loss, without any explicit regularization, biases towards solutions with a group sparsity structure. In contrast to many existing works in understanding implicit regularization, we prove that our training trajectory cannot be simulated by mirror descent. We analyze the gradient dynamics of the corresponding regression problem in the general noise setting and obtain minimax-optimal error rates. Compared to existing bounds for implicit sparse regularization using diagonal linear networks, our analysis with the new reparameterization shows improved sample complexity. In the degenerate case of size-one groups, our approach gives rise to a new algorithm for sparse linear regression. Finally, we demonstrate the efficacy of our approach with several numerical experiments.
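
A toy sketch of the phenomenon under an assumed parameterization, $\beta_g = u_g v_g$ with one shared scalar per group (an illustrative reading of the grouped reparameterization, not necessarily the paper's exact construction): plain gradient descent on the squared loss, with no explicit penalty, drives all but the truly active group toward zero.

```python
import torch

torch.manual_seed(0)
n, groups, gsize = 200, 5, 4
X = torch.randn(n, groups * gsize)
beta_true = torch.zeros(groups, gsize)
beta_true[0] = 1.0                                # only group 0 is active
y = X @ beta_true.flatten() + 0.01 * torch.randn(n)

# assumed reparameterization: beta_g = u_g * v_g (scalar u_g per group)
u = torch.full((groups, 1), 0.01, requires_grad=True)     # small init matters
v = torch.full((groups, gsize), 0.01, requires_grad=True)

for _ in range(3000):
    loss = ((X @ (u * v).flatten() - y) ** 2).mean()      # no penalty term
    loss.backward()
    with torch.no_grad():
        for p in (u, v):
            p -= 0.05 * p.grad
            p.grad.zero_()

# group norms of the recovered coefficients: large for group 0, ~0 elsewhere
print((u * v).detach().norm(dim=1))
```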

* accepted by ICLR 2023 