Dilip K. Prasad

SAM for Robust Mitochondria Instance Segmentation in Fluorescence Microscopy

May 29, 2026

Suyog Jadhav, Dilip K. Prasad, Krishna Agarwal

Abstract: The morphological analysis of mitochondria in fluorescence microscopy (FM) is crucial for understanding cellular health, energy production, and metabolic regulation. While foundation models like the Segment Anything Model (SAM) have revolutionized natural image segmentation, their direct application to FM is hindered by a significant domain shift characterized by diffraction-limited resolution, low contrast, and complex overlapping organelle networks. Furthermore, the development of robust models is bottlenecked by a severe lack of high-quality, manually annotated instance segmentation datasets for mitochondria. In this paper, we propose a scalable solution to this data scarcity by fine-tuning SAM exclusively on synthetically generated FM data. We simulate realistic mitochondria data and emulate the optical properties of fluorescence microscopes to create a large-scale annotated dataset. We evaluate our fine-tuned model on a curated dataset of real, manually annotated FM images. Qualitative and quantitative analyses demonstrate that our synthetically fine-tuned model improves precision and average Dice score over strong baselines. This work establishes the potential of simulation-assisted training for FM instance segmentation.

* Accepted at PHAROS-AIF-MIH workshop @ CVPR 2026
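For readers unfamiliar with the Dice score mentioned in the abstract, the following is a minimal sketch of the standard definition for two binary masks, Dice = 2|A ∩ B| / (|A| + |B|). The function name, the nested-list mask format, and the convention of returning 1.0 for two empty masks are assumptions made for illustration; how the paper averages the score over instances or images is not stated here.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists.

    Assumption: two empty masks count as a perfect match (1.0).
    """
    flat_pred = [bool(v) for row in pred for v in row]
    flat_target = [bool(v) for row in target for v in row]
    # Pixels that are foreground in both masks.
    intersection = sum(p and t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    reference = [[1, 0], [0, 0]]
    print(dice_score(predicted, reference))
```

An average Dice score would then be the mean of this value over a set of mask pairs, under whatever matching rule the evaluation uses.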

packetLSTM: Dynamic LSTM Framework for Streaming Data with Varying Feature Space

Oct 22, 2024

Rohit Agarwal, Karaka Prasanth Naidu, Alexander Horsch, Krishna Agarwal, Dilip K. Prasad

Abstract: We study the online learning problem characterized by the varying input feature space of streaming data. Although LSTMs have been employed to effectively capture the temporal nature of streaming data, they cannot handle dimension-varying streams in an online learning setting. Therefore, we propose a novel dynamic LSTM-based method, called packetLSTM, to model dimension-varying streams. The packetLSTM's dynamic framework consists of an evolving packet of LSTMs, each dedicated to processing one input feature. Each LSTM retains the local information of its corresponding feature, while a shared common memory consolidates global information. This configuration facilitates continuous learning and mitigates the issue of forgetting, even when certain features are absent for extended time periods. The idea of utilizing one LSTM per feature coupled with a dimension-invariant operator for information aggregation enhances the dynamic nature of packetLSTM. This dynamic nature is evidenced by the model's ability to activate, deactivate, and add new LSTMs as required, thus seamlessly accommodating varying input dimensions. The packetLSTM achieves state-of-the-art results on five datasets, and its underlying principle is extended to other RNN types, like GRU and vanilla RNN.
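The idea of one recurrent unit per feature combined with a dimension-invariant aggregation operator can be illustrated with a toy sketch. This is not the authors' implementation: the class name `FeaturePacket`, the simple averaging update that stands in for an LSTM cell, and the choice of element-wise max as the aggregator are all assumptions made for illustration.

```python
def aggregate(states):
    """Element-wise max over any number of equal-length state vectors.

    The result has the same length no matter how many features are active,
    which is what makes the operator dimension-invariant.
    """
    return [max(values) for values in zip(*states)]


class FeaturePacket:
    """Toy packet of per-feature states (hypothetical, not the paper's code)."""

    def __init__(self, hidden_size):
        self.hidden_size = hidden_size
        self.local = {}  # feature name -> local state vector

    def step(self, observation):
        """Update only the features present in this observation."""
        active = []
        for name, value in observation.items():
            if name not in self.local:
                # A new feature appears: add a fresh unit for it.
                self.local[name] = [0.0] * self.hidden_size
            # Toy update standing in for an LSTM cell (assumption).
            self.local[name] = [0.5 * h + 0.5 * value for h in self.local[name]]
            active.append(self.local[name])
        if not active:
            return [0.0] * self.hidden_size
        return aggregate(active)


if __name__ == "__main__":
    packet = FeaturePacket(hidden_size=2)
    print(packet.step({"a": 2.0}))
    print(packet.step({"a": 2.0, "b": 4.0}))
    print(packet.step({"b": 0.0}))  # "a" is absent but its state is kept
```

Features that are absent at a time step are simply skipped, so their local state is retained until they reappear.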

Hedging Is Not All You Need: A Simple Baseline for Online Learning Under Haphazard Inputs

Sep 16, 2024

Himanshu Buckchash, Momojit Biswas, Rohit Agarwal, Dilip K. Prasad

Abstract: Handling haphazard streaming data, such as data from edge devices, presents a challenging problem. Over time, the incoming data becomes inconsistent, with inputs going missing, becoming faulty, newly appearing, or reappearing. It therefore requires reliable models. Recent methods to solve this problem depend on a hedging-based solution and require specialized elements like auxiliary dropouts, forked architectures, and intricate network design. We observed that hedging can be reduced to a special case of weighted residual connection; this motivated us to approximate it with plain self-attention. In this work, we propose HapNet, a simple baseline that is scalable, does not require online backpropagation, and is adaptable to varying input types. All present methods are restricted to scaling with a fixed window; however, we introduce a more complex problem of scaling with a variable window where the data becomes positionally uncorrelated, and cannot be addressed by present methods. We demonstrate that a variant of the proposed approach can work even for this complex scenario. We extensively evaluated the proposed approach on five benchmarks and found competitive performance.
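As background for the hedging-based methods the abstract refers to, the sketch below shows the general hedging idea in simplified form: the final prediction is a weighted sum of per-layer predictions, and each weight is discounted according to that layer's loss and then renormalized. This is not HapNet and not any specific paper's code; the function names, the discount factor `beta`, and the omission of refinements such as weight smoothing are assumptions made for illustration.

```python
def hedge_predict(layer_outputs, alphas):
    """Combine scalar per-layer predictions as a weighted sum."""
    return sum(a * o for a, o in zip(alphas, layer_outputs))


def hedge_update(alphas, losses, beta=0.99):
    """Discount each weight by beta ** loss, then renormalize to sum to 1."""
    scaled = [a * beta ** loss for a, loss in zip(alphas, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


if __name__ == "__main__":
    alphas = [0.5, 0.5]
    print(hedge_predict([0.2, 0.8], alphas))
    # The second layer incurred a larger loss, so its weight shrinks.
    print(hedge_update(alphas, losses=[0.0, 1.0], beta=0.5))
```

Read this way, the weighted sum over layer outputs has the same form as a weighted combination of skip paths, which is the connection to weighted residual connections that the abstract points to.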

Towards a More Inclusive AI: Progress and Perspectives in Large Language Model Training for the Sámi Language

May 09, 2024

Ronny Paul, Himanshu Buckchash, Shantipriya Parida, Dilip K. Prasad

Abstract: Sámi, an indigenous language group comprising multiple languages, faces digital marginalization due to the limited availability of data and sophisticated language models designed for its linguistic intricacies. This work focuses on increasing technological participation for the Sámi language. We draw the attention of the ML community towards the language modeling problem of Ultra Low Resource (ULR) languages. ULR languages are those for which the amount of available textual resources is very low, and the speaker count for them is also very low. ULRLs are also not supported by mainstream Large Language Models (LLMs) like ChatGPT, due to which gathering artificial training data for them becomes even more challenging. Mainstream AI foundational model development has given less attention to this category of languages. Generally, these languages have very few speakers, making it hard to find them. However, it is important to develop foundational models for these ULR languages to promote inclusion and the tangible abilities and impact of LLMs. To this end, we have compiled the available Sámi language resources from the web to create a clean dataset for training language models. In order to study the behavior of modern LLMs with ULR languages (Sámi), we have experimented with different kinds of LLMs, mainly on the order of approximately seven billion parameters. We have also explored the effect of multilingual LLM training for ULRLs. We found that the decoder-only models under a sequential multilingual training scenario perform better than joint multilingual training, whereas multilingual training with high semantic overlap, in general, performs better than training from scratch. This is the first study on the Sámi language for adapting non-statistical language models that use the latest developments in the field of natural language processing (NLP).

Online Learning under Haphazard Input Conditions: A Comprehensive Review and Analysis

Apr 07, 2024

Rohit Agarwal, Arijit Das, Alexander Horsch, Krishna Agarwal, Dilip K. Prasad

Abstract: The domain of online learning has experienced multifaceted expansion owing to its prevalence in real-life applications. Nonetheless, this progression operates under the assumption that the input feature space of the streaming data remains constant. In this survey paper, we address the topic of online learning in the context of haphazard inputs, explicitly forgoing such an assumption. We discuss, classify, evaluate, and compare the methodologies that are adept at modeling haphazard inputs, additionally providing the corresponding code implementations and their carbon footprint. Moreover, we classify the datasets related to the field of haphazard inputs and introduce evaluation metrics specifically designed for datasets exhibiting imbalance. The code of each methodology can be found at https://github.com/Rohit102497/HaphazardInputsReview

Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols

Nov 05, 2023

Iqra Qasim, Alexander Horsch, Dilip K. Prasad

Abstract: Untrimmed videos have interrelated events, dependencies, context, overlapping events, object-object interactions, domain specificity, and other semantics that are worth highlighting while describing a video in natural language. Owing to such a vast diversity, a single sentence can only correctly describe a portion of the video. Dense Video Captioning (DVC) aims at detecting and describing different events in a given video. The term DVC originated in the 2017 ActivityNet challenge, after which considerable effort has gone into addressing it. Dense Video Captioning is divided into three sub-tasks: (1) Video Feature Extraction (VFE), (2) Temporal Event Localization (TEL), and (3) Dense Caption Generation (DCG). This review aims to discuss all the studies that claim to perform DVC along with its sub-tasks and summarize their results. We also discuss all the datasets that have been used for DVC. Lastly, we highlight some emerging challenges and future trends in the field.

* 35 pages, 10 figures

Modelling Irregularly Sampled Time Series Without Imputation

Sep 15, 2023

Rohit Agarwal, Aman Sinha, Dilip K. Prasad, Marianne Clausel, Alexander Horsch, Mathieu Constant, Xavier Coubez

Abstract: Modelling irregularly sampled time series (ISTS) is challenging because of missing values. Most existing methods focus on handling ISTS by converting irregularly sampled data into regularly sampled data via imputation. These models assume an underlying missingness mechanism, which leads to unwanted bias and sub-optimal performance. We present SLAN (Switch LSTM Aggregate Network), which utilizes a pack of LSTMs to model ISTS without imputation, eliminating the assumption of any underlying process. It dynamically adapts its architecture on the fly based on the measured sensors. SLAN exploits the irregularity information to capture each sensor's local summary explicitly and maintains a global summary state throughout the observational period. We demonstrate the efficacy of SLAN on publicly available datasets, namely, MIMIC-III, PhysioNet 2012 and PhysioNet 2019. The code is available at https://github.com/Rohit102497/SLAN.

pNNCLR: Stochastic Pseudo Neighborhoods for Contrastive Learning based Unsupervised Representation Learning Problems

Aug 14, 2023

Momojit Biswas, Himanshu Buckchash, Dilip K. Prasad

Abstract: Nearest neighbor (NN) sampling provides more semantic variations than pre-defined transformations for self-supervised learning (SSL) based image recognition problems. However, its performance is restricted by the quality of the support set, which holds positive samples for the contrastive loss. In this work, we show that the quality of the support set plays a crucial role in any nearest neighbor based method for SSL. We then provide a refined baseline (pNNCLR) to the nearest neighbor based SSL approach (NNCLR). To this end, we introduce pseudo nearest neighbors (pNN) to control the quality of the support set, wherein, rather than sampling the nearest neighbors, we sample in the vicinity of hard nearest neighbors by varying the magnitude of the resultant vector and employing a stochastic sampling strategy to improve the performance. Additionally, to stabilize the effects of uncertainty in NN-based learning, we employ a smooth-weight-update approach for training the proposed network. Evaluation of the proposed method on multiple public image recognition and medical image recognition datasets shows that it performs up to 8 percent better than the baseline nearest neighbor method, and is comparable to other previously proposed SSL methods.

* 15 pages, 5 figures

Latent Graph Attention for Enhanced Spatial Context

Jul 12, 2023

Ayush Singh, Yash Bhambhu, Himanshu Buckchash, Deepak K. Gupta, Dilip K. Prasad

Abstract: Global contexts in images are quite valuable in image-to-image translation problems. Conventional attention-based and graph-based models capture the global context to a large extent; however, these are computationally expensive. Moreover, the existing approaches are limited to only learning the pairwise semantic relation between any two points on the image. In this paper, we present Latent Graph Attention (LGA), a computationally inexpensive (linear in the number of nodes), stable, and modular framework for incorporating the global context in existing architectures. It especially empowers small-scale architectures to perform closer to large architectures, thus making lightweight architectures more useful for edge devices with lower compute power and lower energy needs. LGA propagates information spatially using a network of locally connected graphs, thereby helping to construct a semantically coherent relation between any two spatially distant points that also takes into account the influence of the intermediate pixels. Moreover, the depth of the graph network can be used to adapt the extent of contextual spread to the target dataset, which makes it possible to explicitly control the added computational cost. To enhance the learning mechanism of LGA, we also introduce a novel contrastive loss term that helps our LGA module to couple well with the original architecture at the expense of minimal additional computational load. We show that incorporating LGA improves the performance on three challenging applications, namely transparent object segmentation, image restoration for dehazing, and optical flow estimation.

* 20 pages, 7 figures

Data-Efficient Training of CNNs and Transformers with Coresets: A Stability Perspective

Mar 10, 2023

Animesh Gupta, Irtiza Hasan, Dilip K. Prasad, Deepak K. Gupta

Abstract: Coreset selection is among the most effective ways to reduce the training time of CNNs; however, little is known about how the resultant models will behave under variations of the coreset size and the choice of datasets and models. Moreover, given the recent paradigm shift towards transformer-based models, it is still an open question how coreset selection would impact their performance. There are several similar intriguing questions that need to be answered for a wide acceptance of coreset selection methods, and this paper attempts to answer some of these. We present a systematic benchmarking setup and perform a rigorous comparison of different coreset selection methods on CNNs and transformers. Our investigation reveals that under certain circumstances, random selection of subsets is more robust and stable when compared with the SOTA selection methods. We demonstrate that the conventional concept of uniform subset sampling across the various classes of the data is not the appropriate choice. Rather, samples should be adaptively chosen based on the complexity of the data distribution for each class. Transformers are generally pretrained on large datasets, and we show that for certain target datasets, it helps to keep their performance stable at even very small coreset sizes. We further show that when no pretraining is done or when the pretrained transformer models are used with non-natural images (e.g. medical data), CNNs tend to generalize better than transformers at even very small coreset sizes. Lastly, we demonstrate that in the absence of the right pretraining, CNNs are better at learning the semantic coherence between spatially distant objects within an image, and these tend to outperform transformers at almost all choices of the coreset size.
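The uniform per-class random sampling that the abstract uses as a reference point can be sketched as follows. This is only an illustrative baseline under stated assumptions, not the paper's benchmarking code: the function name, the `fraction` and `seed` parameters, and the rule of keeping at least one sample per class are hypothetical choices.

```python
import random
from collections import defaultdict


def random_class_balanced_subset(labels, fraction, seed=0):
    """Pick the same fraction of sample indices at random from every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    chosen = []
    for label in sorted(by_class):
        members = by_class[label]
        # Keep at least one sample per class (assumption).
        k = max(1, round(fraction * len(members)))
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


if __name__ == "__main__":
    labels = [0] * 10 + [1] * 20
    subset = random_class_balanced_subset(labels, fraction=0.5)
    print(len(subset), subset)
```

An adaptive variant, as the abstract argues for, would replace the single `fraction` with a per-class budget derived from some measure of each class's complexity; that measure is not specified here.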
