Soumik Sarkar

Latent Diffusion Models for Structural Component Design

Sep 24, 2023
Ethan Herron, Jaydeep Rade, Anushrut Jignasu, Baskar Ganapathysubramanian, Aditya Balu, Soumik Sarkar, Adarsh Krishnamurthy

Recent advances in generative modeling, namely diffusion models, have enabled high-quality image generation tailored to user needs. This paper proposes a framework for the generative design of structural components. Specifically, we employ a Latent Diffusion model to generate potential designs of a component that can satisfy a set of problem-specific loading conditions. One of the distinct advantages our approach offers over other generative approaches, such as generative adversarial networks (GANs), is that it permits the editing of existing designs. We train our model using a dataset of geometries obtained from structural topology optimization utilizing the SIMP algorithm. Consequently, our framework generates inherently near-optimal designs. We present quantitative results that support the structural performance of the generated designs and the variability in potential candidate designs. Furthermore, we provide evidence of the scalability of our framework by operating over voxel domains with resolutions varying from $32^3$ to $128^3$. Our framework can be used as a starting point for generating novel near-optimal designs similar to topology-optimized designs.
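
As a loose illustration of the underlying idea (not the paper's actual architecture or schedule), the sketch below runs a DDPM-style reverse diffusion loop in the latent space of a pretrained voxel autoencoder, conditioned on a loading-condition embedding; `decoder`, `denoiser`, and `cond` are hypothetical stand-ins.

```python
# Minimal sketch of DDPM-style sampling in a learned latent space for voxelized
# structural designs. `decoder`, `denoiser`, and `cond` are hypothetical stand-ins;
# the paper's actual networks, conditioning, and noise schedule may differ.
import torch

@torch.no_grad()
def sample_design(decoder, denoiser, cond, latent_shape, T=1000, device="cuda"):
    """Reverse diffusion in latent space, conditioned on loading conditions `cond`."""
    betas = torch.linspace(1e-4, 0.02, T, device=device)      # linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    z = torch.randn(latent_shape, device=device)               # start from pure noise
    for t in reversed(range(T)):
        t_batch = torch.full((latent_shape[0],), t, device=device, dtype=torch.long)
        eps_hat = denoiser(z, t_batch, cond)                    # predicted noise
        mean = (z - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps_hat) / torch.sqrt(alphas[t])
        noise = torch.randn_like(z) if t > 0 else torch.zeros_like(z)
        z = mean + torch.sqrt(betas[t]) * noise

    return decoder(z)                                           # decode to a voxel density field
```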

Active shooter detection and robust tracking utilizing supplemental synthetic data

Sep 06, 2023
Joshua R. Waite, Jiale Feng, Riley Tavassoli, Laura Harris, Sin Yong Tan, Subhadeep Chakraborty, Soumik Sarkar

The increasing concern surrounding gun violence in the United States has led to a focus on developing systems to improve public safety. One approach to developing such a system is to detect and track shooters, which would help prevent or mitigate the impact of violent incidents. In this paper, we propose detecting shooters as a whole, rather than just guns, which allows for improved tracking robustness, as obscuring the gun no longer causes the system to lose sight of the threat. However, publicly available data on shooters is much more limited and challenging to create than a gun dataset alone. Therefore, we explore the use of domain randomization and transfer learning to improve the effectiveness of training with synthetic data obtained from Unreal Engine environments. This enables the model to be trained on a wider range of data, increasing its ability to generalize to different situations. Using these techniques with YOLOv8 and Deep OC-SORT, we implement an initial version of a shooter tracking system capable of running on edge hardware, including both a Raspberry Pi and a Jetson Nano.
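
For context, a minimal per-frame detection loop with the ultralytics YOLOv8 API might look like the sketch below; the weights file is a hypothetical fine-tuned checkpoint, and the hand-off to Deep OC-SORT is only indicated in a comment since its interface is not described here.

```python
# Minimal per-frame detection sketch with YOLOv8 (ultralytics); the weights file,
# video source, and tracker hand-off are illustrative placeholders.
import cv2
from ultralytics import YOLO

model = YOLO("shooter_yolov8n.pt")  # hypothetical fine-tuned weights

cap = cv2.VideoCapture("camera_feed.mp4")
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)[0]    # run detection on one frame
    boxes = results.boxes.xyxy.cpu().numpy()    # (N, 4) bounding boxes
    scores = results.boxes.conf.cpu().numpy()   # detection confidences
    # Detections would then be passed to a Deep OC-SORT tracker to maintain
    # identities across frames, e.g., tracker.update(detections, frame).
cap.release()
```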

* 11 pages, 6 figures 

Vision-Language Models can Identify Distracted Driver Behavior from Naturalistic Videos

Jun 22, 2023
Md Zahid Hasan, Jiajing Chen, Jiyang Wang, Mohammed Shaiqur Rahman, Ameya Joshi, Senem Velipasalar, Chinmay Hegde, Anuj Sharma, Soumik Sarkar

Recognizing distracting activities in real-world driving scenarios is critical for ensuring the safety and reliability of both drivers and pedestrians on the roadways. Conventional computer vision techniques are typically data-intensive and require a large volume of annotated training data to detect and classify various distracted driving behaviors, limiting their efficiency and scalability. We aim to develop a generalized framework that exhibits robust performance with access to limited or no annotated training data. Recently, vision-language models have offered large-scale visual-textual pretraining that can be adapted to task-specific learning such as distracted driving activity recognition. Vision-language pretraining models, such as CLIP, have shown significant promise in learning natural language-guided visual representations. This paper proposes a CLIP-based driver activity recognition approach that identifies driver distraction from naturalistic driving images and videos. CLIP's vision embedding offers zero-shot transfer and task-based finetuning, which can classify distracted activities from driving video data. Our results show that this framework achieves state-of-the-art performance on zero-shot transfer and video-based CLIP for predicting the driver's state on two public datasets. We propose both frame-based and video-based frameworks built on top of CLIP's visual representation for distracted driving detection and classification tasks and report the results.
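
As an illustration of the zero-shot transfer step, the sketch below scores a single driving frame against a set of natural-language prompts using the Hugging Face CLIP implementation; the prompt list is illustrative only, not the paper's actual prompt set.

```python
# Sketch of zero-shot distracted-driver classification with CLIP
# (Hugging Face transformers); the prompts and image path are illustrative.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = [
    "a photo of a driver paying attention to the road",
    "a photo of a driver texting on a phone",
    "a photo of a driver talking on a phone",
    "a photo of a driver drinking a beverage",
]

image = Image.open("driver_frame.jpg")
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits_per_image = model(**inputs).logits_per_image   # image-text similarity scores
probs = logits_per_image.softmax(dim=-1)                   # probabilities over prompts
print(prompts[probs.argmax().item()])
```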

* 15 pages, 10 figures 

Deep learning powered real-time identification of insects using citizen science data

Jun 04, 2023
Shivani Chiranjeevi, Mojdeh Sadaati, Zi K Deng, Jayanth Koushik, Talukder Z Jubery, Daren Mueller, Matthew E O Neal, Nirav Merchant, Aarti Singh, Asheesh K Singh, Soumik Sarkar, Arti Singh, Baskar Ganapathysubramanian

Insect pests significantly impact global agricultural productivity and quality. Effective management involves identifying the full insect community, including beneficial insects and harmful pests, to develop and implement integrated pest management strategies. Automated identification of insects under real-world conditions presents several challenges, including differentiating similar-looking species, intra-species dissimilarity and inter-species similarity, multiple life-cycle stages, camouflage, diverse imaging conditions, and variability in insect orientation. A deep-learning model, InsectNet, is proposed to address these challenges. InsectNet is endowed with five key features: (a) utilization of a large dataset of insect images collected through citizen science; (b) label-free self-supervised learning for large models; (c) improved prediction accuracy for species with small sample sizes; (d) enhanced model trustworthiness; and (e) democratized access through streamlined MLOps. This approach allows accurate identification (>96% accuracy) of over 2500 insect species, including pollinators (e.g., butterflies, bees), parasitoids (e.g., some wasps and flies), predators (e.g., lady beetles, mantises, dragonflies), and harmful pests (e.g., armyworms, cutworms, grasshoppers, stink bugs). InsectNet can identify invasive species, provide fine-grained insect species identification, and work effectively in challenging backgrounds. It can also abstain from making predictions when uncertain, facilitating seamless human intervention and making it a practical and trustworthy tool. InsectNet can guide citizen science data collection, especially for invasive species where early detection is crucial. Similar approaches may transform other agricultural challenges like disease detection and underscore the importance of data collection, particularly through citizen science efforts.
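
The abstention behavior mentioned above can be approximated with a simple confidence threshold on the softmax output; the sketch below is an illustration under that assumption, not InsectNet's actual uncertainty mechanism.

```python
# Illustrative confidence-threshold abstention; not InsectNet's actual mechanism.
import torch
import torch.nn.functional as F

def predict_or_abstain(model, image_batch, threshold=0.9):
    """Return a class index per image, or -1 when the model is not confident enough."""
    with torch.no_grad():
        probs = F.softmax(model(image_batch), dim=-1)
    conf, pred = probs.max(dim=-1)
    pred[conf < threshold] = -1   # -1 signals "abstain, defer to a human expert"
    return pred
```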

Out-of-distribution detection algorithms for robust insect classification

May 02, 2023
Mojdeh Saadati, Aditya Balu, Shivani Chiranjeevi, Talukder Zaki Jubery, Asheesh K Singh, Soumik Sarkar, Arti Singh, Baskar Ganapathysubramanian

Deep learning-based approaches have produced models with good insect classification accuracy; however, most of these models are suited only to controlled environmental conditions. A primary emphasis of researchers is to deploy identification and classification models in real agricultural fields, which is challenging because input images that are wildly out of distribution (e.g., images of vehicles, animals, or humans, or blurred images of an insect or an insect class that the model was not trained on) can produce incorrect classifications. Out-of-distribution (OOD) detection algorithms provide an exciting avenue to overcome this challenge, as they ensure that a model abstains from making incorrect predictions on non-insect images and/or untrained insect classes. We generate and evaluate the performance of state-of-the-art OOD algorithms on insect detection classifiers. These algorithms represent a diversity of methods for addressing the OOD problem. Specifically, we focus on extrusive algorithms, i.e., algorithms that wrap around a well-trained classifier without the need for additional co-training. We compared three OOD detection algorithms: (i) Maximum Softmax Probability, which uses the softmax value as a confidence score; (ii) a Mahalanobis distance-based algorithm, which uses a generative classification approach; and (iii) an Energy-Based algorithm that maps the input data to a scalar value, called energy. We performed an extensive series of evaluations of these OOD algorithms across three performance axes: (a) \textit{Base model accuracy}: How does the accuracy of the classifier impact OOD performance? (b) How does the \textit{level of dissimilarity to the domain} impact OOD performance? and (c) \textit{Data imbalance}: How sensitive is OOD performance to the imbalance in per-class sample size?
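
The first and third scores have simple closed forms given a classifier's logits; below is a minimal sketch assuming a trained PyTorch classifier (the Mahalanobis score is omitted since it additionally requires class-conditional feature statistics).

```python
# Maximum Softmax Probability (MSP) and energy scores from a classifier's logits.
# Higher MSP / lower energy indicates a more in-distribution-like input; thresholds
# would be calibrated on held-out data.
import torch
import torch.nn.functional as F

def msp_score(logits):
    """MSP: confidence of the most likely class."""
    return F.softmax(logits, dim=-1).max(dim=-1).values

def energy_score(logits, temperature=1.0):
    """Energy: -T * logsumexp(logits / T); lower values look more in-distribution."""
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)

# Usage: logits = model(images); flag inputs whose MSP is below (or energy above)
# a threshold chosen on validation data.
```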

SpecXAI -- Spectral interpretability of Deep Learning Models

Feb 20, 2023
Stefan Druc, Peter Wooldridge, Adarsh Krishnamurthy, Soumik Sarkar, Aditya Balu

Deep learning is being increasingly adopted in business and industry due to its ability to transform large quantities of data into high-performing models. These models, however, are generally regarded as black boxes, which, in spite of their performance, could prevent their use. In this context, the field of eXplainable AI (XAI) attempts to develop techniques that temper the impenetrable nature of these models and promote a level of understanding of their behavior. Here we present our contribution to XAI methods in the form of a framework that we term SpecXAI, which is based on the spectral characterization of the entire network. We show how this framework can be used not only to understand the network but also to manipulate it into a linear, interpretable symbolic representation.

3D Reconstruction of Protein Complex Structures Using Synthesized Multi-View AFM Images

Nov 26, 2022
Jaydeep Rade, Soumik Sarkar, Anwesha Sarkar, Adarsh Krishnamurthy

Recent developments in deep learning-based methods have demonstrated their potential to predict 3D protein structures from inputs such as protein sequences and cryo-electron microscopy (Cryo-EM) images of proteins. However, these methods struggle to predict the structures of protein complexes (PCs), i.e., structures with more than one protein. In this work, we explore atomic force microscopy (AFM)-assisted deep learning methods to predict the 3D structure of PCs. The images produced by AFM capture the protein structure in different, random orientations. These multi-view images can help train a neural network to predict the 3D structure of protein complexes. However, obtaining a dataset of actual AFM images is time-consuming and impractical. We propose a virtual AFM imaging pipeline that takes a 'PDB' protein file and generates multi-view 2D virtual AFM images using volume rendering techniques. With this, we created a dataset of around 8K proteins. We train Pix2Vox++, a neural network for 3D reconstruction, on the synthesized multi-view 2D AFM image dataset. We compare the predicted structures obtained using different numbers of views and obtain an intersection-over-union (IoU) value of 0.92 on the training dataset and 0.52 on the validation dataset. We believe this approach will lead to better prediction of the structure of protein complexes.
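
The IoU values quoted above compare binary voxel occupancy grids; a minimal sketch of that metric follows, assuming a 0.5 binarization threshold.

```python
# Voxel IoU between a predicted occupancy grid and a ground-truth grid;
# the 0.5 binarization threshold is an assumption.
import torch

def voxel_iou(pred, target, threshold=0.5):
    """pred, target: float tensors of shape (D, H, W) with values in [0, 1]."""
    p = pred > threshold
    t = target > threshold
    intersection = (p & t).sum().float()
    union = (p | t).sum().float().clamp(min=1.0)   # avoid division by zero
    return (intersection / union).item()
```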

* 5 pages, 8 figures, Machine Learning for Structural Biology Workshop, NeurIPS 2022 

Neural PDE Solvers for Irregular Domains

Nov 07, 2022
Biswajit Khara, Ethan Herron, Zhanhong Jiang, Aditya Balu, Chih-Hsuan Yang, Kumar Saurabh, Anushrut Jignasu, Soumik Sarkar, Chinmay Hegde, Adarsh Krishnamurthy, Baskar Ganapathysubramanian

Neural network-based approaches for solving partial differential equations (PDEs) have recently received special attention. However, the large majority of neural PDE solvers only apply to rectilinear domains, and do not systematically address the imposition of Dirichlet/Neumann boundary conditions over irregular domain boundaries. In this paper, we present a framework to neurally solve partial differential equations over domains with irregularly shaped (non-rectilinear) geometric boundaries. Our network takes in the shape of the domain as an input (represented using an unstructured point cloud, or any other parametric representation such as Non-Uniform Rational B-Splines) and is able to generalize to novel (unseen) irregular domains; the key technical ingredient to realizing this model is a novel approach for identifying the interior and exterior of the computational grid in a differentiable manner. We also perform a careful error analysis which reveals theoretical insights into several sources of error incurred in the model-building process. Finally, we showcase a wide variety of applications, along with favorable comparisons with ground truth solutions.
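
As a loose illustration (not the paper's exact formulation), the sketch below penalizes the residual of a Poisson problem only at sample points flagged as interior by a signed-distance test; `u_net` and `sdf` are hypothetical placeholders for the solution network and the domain representation.

```python
# Sketch of a physics-informed loss for -Laplace(u) = f on an irregular 2D domain,
# where interior points are selected via a (hypothetical) signed-distance function.
import torch

def poisson_residual_loss(u_net, sdf, pts, f=lambda x: torch.ones(x.shape[0], device=x.device)):
    """pts: (N, 2) sample points; sdf(pts) < 0 marks the interior of the domain."""
    pts = pts.clone().requires_grad_(True)
    u = u_net(pts).squeeze(-1)                                     # scalar field u(x, y)
    grad_u = torch.autograd.grad(u.sum(), pts, create_graph=True)[0]
    lap = 0.0
    for d in range(pts.shape[1]):                                   # sum of second derivatives
        lap = lap + torch.autograd.grad(grad_u[:, d].sum(), pts, create_graph=True)[0][:, d]
    residual = -lap - f(pts)
    interior = (sdf(pts) < 0).float()                               # mask out exterior points
    return ((interior * residual) ** 2).mean()
```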
