Sanket Kachole

Multi-FRuGaL: Multimodal Flexible Redundancy-aware Decomposed Gated Learning for Cancer Diagnosis and Prognosis

Jun 05, 2026

Sanket Kachole, Siddhesh Thakur, Shubham Innani, Sanyukta Adap, Suhang You, Carla Pitarch-Abaigar, Spyridon Bakas

Abstract: Modern medicine relies on heterogeneous data sources spanning radiology, pathology, text reports, and structured clinical information. However, real-world patient data are frequently incomplete, with missing or sparsely acquired modalities, limiting the effectiveness of standard multimodal fusion approaches. To this end, we propose the Multimodal Flexible Redundancy-aware decomposed GAted Learning (Multi-FRuGaL) framework, a decomposition-aware, adaptive gated intermediate-fusion framework that performs modality-level representation learning under missing data. Multi-FRuGaL integrates per-modality encoders with a signal decomposition layer, an input-conditioned gating network, and an information-aware fusion objective. Together these separate redundant from modality-specific complementary signals, selectively upweight informative modalities, suppress redundant or noisy inputs, and remain well-defined even when multiple modalities are absent. We evaluate Multi-FRuGaL on two multimodal head and neck cancer cohorts: the HANCOCK challenge dataset (N = 763) comprising five modalities and two prognostic endpoints (5-year survival and 2-year recurrence), and the HECKTOR challenge dataset (N = 588) comprising three modalities for human papillomavirus (HPV) status classification. Multi-FRuGaL consistently achieves higher mean performance than the evaluated baselines across multiple tasks, improving AUC from 0.601 to 0.8496 for survival, from 0.672 to 0.8102 for recurrence, and achieving 0.975 AUC for HPV prediction on HECKTOR. For survival analysis, it further achieves a concordance index of 0.6814 for overall survival, 0.7421 for recurrence-free survival, and 0.7143 for progression-free survival on HANCOCK, and 0.7203 for recurrence-free survival on HECKTOR. Qualitative analyses further show that Multi-FRuGaL learns discriminative and robust multimodal representations, even under severe missing-modality conditions.
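
To make the idea of gated fusion that stays well-defined under missing modalities concrete, here is a minimal sketch in plain Python. It is not the Multi-FRuGaL implementation: the function name `masked_gated_fusion`, the modality names, and the fixed gate scores are all hypothetical, and a real model would predict the gate scores from the inputs with a learned network. The only point illustrated is that a softmax restricted to the modalities actually present still yields weights that sum to one.

```python
import math


def masked_gated_fusion(embeddings, gate_logits):
    """Fuse per-modality embeddings with a softmax gate over present modalities.

    embeddings:  dict mapping modality name -> list of floats, or None if missing.
    gate_logits: dict mapping modality name -> float (hypothetical gate scores;
                 in a learned model these would come from a gating network).
    Returns (fused_vector, weights), where weights covers present modalities only.
    """
    present = [name for name, emb in embeddings.items() if emb is not None]
    if not present:
        raise ValueError("at least one modality must be present")

    # Softmax restricted to present modalities (max subtracted for stability).
    top = max(gate_logits[name] for name in present)
    exps = {name: math.exp(gate_logits[name] - top) for name in present}
    total = sum(exps.values())
    weights = {name: value / total for name, value in exps.items()}

    # Weighted sum of the present embeddings, dimension by dimension.
    dim = len(embeddings[present[0]])
    fused = [
        sum(weights[name] * embeddings[name][i] for name in present)
        for i in range(dim)
    ]
    return fused, weights


# Hypothetical patient with the pathology embedding missing.
patient = {"radiology": [1.0, 0.0], "pathology": None, "clinical": [0.0, 1.0]}
scores = {"radiology": 0.0, "pathology": 2.0, "clinical": 0.0}
fused, weights = masked_gated_fusion(patient, scores)
```

Because the missing modality is excluded before normalisation, its gate score has no effect on the result, which is one simple way to keep fusion well-defined when inputs are absent.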

LLM-Conditioned Synthesis of Pathological Gaits via Structured Gait-Language Representations

Jun 04, 2026

Mritula Chandrasekaran, Sanket Kachole, Jarek Francik, Dimitrios Makris

Abstract: Pathological gait datasets remain scarce due to privacy, recruitment, cost, and movement variability. Our work presents a multimodal LLM-guided framework for pathology-aware 3D gait data synthesis from structured textual descriptions. The proposed method generates fixed-length synthetic skeleton-based gait sequences for pathological gait classification tasks. The framework combines motion tokenisation, pathology-aware language conditioning, LLM-based semantic augmentation, and language-to-gait generation. A key contribution is the proposed pathological tokeniser, which is designed to preserve pathology-specific motion characteristics during discrete representation learning. Experiments suggest that the proposed synthetic sequences improve downstream classification for recurrent classifiers when combined with real data. The best result is obtained using a GRU classifier trained with real and synthetic samples, achieving 92.77% accuracy under a leave-one-subject-out protocol.

* Accepted at CVPR MOMA Workshop 2026 and selected for spotlight presentation at the workshop

PGcGAN: Pathological Gait-Conditioned GAN for Human Gait Synthesis

Mar 15, 2026

Mritula Chandrasekaran, Sanket Kachole, Jarek Francik, Dimitrios Makris

Abstract: Pathological gait analysis is constrained by limited and variable clinical datasets, which restrict the modeling of diverse gait impairments. To address this challenge, we propose a Pathological Gait-conditioned Generative Adversarial Network (PGcGAN) that synthesises pathology-specific gait sequences directly from observed 3D pose keypoint trajectory data. The framework incorporates one-hot encoded pathology labels within both the generator and discriminator, enabling controlled synthesis across six gait categories. The generator adopts a conditional autoencoder architecture trained with adversarial and reconstruction objectives to preserve structural and temporal gait characteristics. Experiments on the Pathological Gait Dataset demonstrate strong alignment between real and synthetic sequences through PCA and t-SNE analyses, visual kinematic inspection, and downstream classification tasks. Augmenting real data with synthetic sequences improved pathological gait recognition across GRU, LSTM, and CNN models, indicating that pathology-conditioned gait synthesis can effectively support data augmentation in pathological gait analysis.
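
The abstract states that one-hot pathology labels are incorporated in both generator and discriminator but does not say how. A common way to condition a network on a class label is to append the one-hot code to every frame of the input sequence; the sketch below assumes that approach purely for illustration. The helper names `one_hot` and `condition_sequence` and the toy frame values are hypothetical, and only the count of six categories comes from the abstract.

```python
def one_hot(label, num_classes=6):
    """Return a one-hot list for an integer class label in [0, num_classes)."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def condition_sequence(sequence, label, num_classes=6):
    """Append the one-hot pathology code to every frame's feature vector.

    sequence: list of frames, each a list of floats (e.g. flattened 3D keypoints).
    Assumption: conditioning by concatenation; the paper may do this differently.
    """
    code = one_hot(label, num_classes)
    return [list(frame) + code for frame in sequence]


# Two toy frames with three features each, conditioned on class 2.
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
conditioned = condition_sequence(frames, label=2)
```

Under this assumption the same conditioned representation could be fed to both networks, so the discriminator judges realism given the requested pathology rather than realism alone.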

PROFUSEme: PROstate Cancer Biochemical Recurrence Prediction via FUSEd Multi-modal Embeddings

Sep 17, 2025

Suhang You, Carla Pitarch-Abaigar, Sanket Kachole, Sumedh Sonawane, Juhyung Ha, Anish Sudarshan Gada, David Crandall, Rakesh Shiradkar, Spyridon Bakas

Abstract: Almost 30% of prostate cancer (PCa) patients undergoing radical prostatectomy (RP) experience biochemical recurrence (BCR), characterized by increased prostate-specific antigen (PSA) and associated with increased mortality. Accurate early prediction of BCR, at the time of RP, would contribute to prompt adaptive clinical decision-making and improved patient outcomes. In this work, we propose prostate cancer BCR prediction via fused multi-modal embeddings (PROFUSEme), which learns cross-modal interactions of clinical, radiology, and pathology data, following an intermediate fusion configuration in combination with Cox Proportional Hazard regressors. Quantitative evaluation of our proposed approach reveals superior performance when compared with late fusion configurations, yielding a mean C-index of 0.861 ($\sigma=0.112$) on the internal 5-fold nested cross-validation framework, and a C-index of 0.7103 on the hold-out data of the CHIMERA 2025 challenge validation leaderboard.

* 11 pages, 1 figure, method paper for CHIMERA 2025 Challenge
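
For readers unfamiliar with the C-index reported above, the following is a minimal sketch of a simplified Harrell's concordance index in plain Python. It is not the evaluation code used by the authors or by the CHIMERA challenge: the function name `concordance_index` is hypothetical, pairs with tied event times are simply skipped, and a higher risk score is assumed to mean an earlier expected event.

```python
def concordance_index(times, events, risks):
    """Simplified Harrell's C-index.

    times:  observed follow-up times.
    events: 1 if the event (e.g. recurrence) was observed, 0 if censored.
    risks:  predicted risk scores (assumed: higher means earlier event).
    A pair (i, j) is comparable when i had an observed event strictly before
    j's follow-up time; it is concordant when i also has the higher risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort: the third patient is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.1, 0.3])
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and values such as those reported in the abstract lie between the two.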

Asynchronous Bioplausible Neuron for Spiking Neural Networks for Event-Based Vision

Nov 20, 2023

Sanket Kachole, Hussain Sajwani, Fariborz Baghaei Naeini, Dimitrios Makris, Yahya Zweiri

Abstract: Spiking Neural Networks (SNNs) offer a biologically inspired approach to computer vision that can lead to more efficient processing of visual data with reduced energy consumption. However, maintaining homeostasis within these networks is challenging, as it requires continuous adjustment of neural responses to preserve equilibrium and optimal processing efficiency amidst diverse and often unpredictable input signals. In response to these challenges, we propose the Asynchronous Bioplausible Neuron (ABN), a dynamic spike firing mechanism that automatically adjusts to variations in the input signal. Comprehensive evaluation across various datasets demonstrates ABN's enhanced performance in image classification and segmentation, maintenance of neural equilibrium, and energy efficiency.

* 10 pages

Asynchronous Events-based Panoptic Segmentation using Graph Mixer Neural Network

May 05, 2023

Sanket Kachole, Yusra Alkendi, Fariborz Baghaei Naeini, Dimitrios Makris, Yahya Zweiri

Abstract: In the context of robotic grasping, object segmentation encounters several difficulties when faced with dynamic conditions such as real-time operation, occlusion, low lighting, motion blur, and object size variability. In response to these challenges, we propose the Graph Mixer Neural Network that includes a novel collaborative contextual mixing layer, applied to 3D event graphs formed on asynchronous events. The proposed layer is designed to spread spatiotemporal correlation within an event graph at four nearest neighbor levels in parallel. We evaluate the effectiveness of our proposed method on the Event-based Segmentation (ESD) Dataset, which includes five unique image degradation challenges (occlusion, blur, brightness, trajectory, and scale variance) as well as segmentation of known and unknown objects. The results show that our proposed approach outperforms state-of-the-art methods in terms of mean intersection over union and pixel accuracy. Code available at: https://github.com/sanket0707/GNN-Mixer.git

* 9 pages, 6 figures
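
The two segmentation metrics named in the abstract, mean intersection over union and pixel accuracy, can be sketched as follows. This is a generic illustration in plain Python, not code from the linked repository: the function name `segmentation_metrics` and the flat-list label format are assumptions, and classes absent from both prediction and ground truth are skipped when averaging, which is one of several conventions in use.

```python
def segmentation_metrics(pred, target, num_classes):
    """Return (pixel_accuracy, mean_iou) for flat lists of integer class labels.

    Classes that appear in neither pred nor target are excluded from the mean
    (an assumed convention; some benchmarks average over all classes instead).
    """
    if len(pred) != len(target) or not target:
        raise ValueError("pred and target must be non-empty and equally long")

    pairs = list(zip(pred, target))
    pixel_accuracy = sum(p == t for p, t in pairs) / len(pairs)

    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in pairs if p == c and t == c)
        union = sum(1 for p, t in pairs if p == c or t == c)
        if union > 0:
            ious.append(intersection / union)
    mean_iou = sum(ious) / len(ious)
    return pixel_accuracy, mean_iou


# Four-pixel toy example with two classes.
acc, miou = segmentation_metrics([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

Pixel accuracy rewards getting the dominant class right, while mean IoU weights every class equally, so the two can diverge noticeably on images where the object occupies few pixels.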

Bimodal SegNet: Instance Segmentation Fusing Events and RGB Frames for Robotic Grasping

Mar 20, 2023

Sanket Kachole, Xiaoqian Huang, Fariborz Baghaei Naeini, Rajkumar Muthusamy, Dimitrios Makris, Yahya Zweiri

Abstract: Object segmentation for robotic grasping under dynamic conditions often faces challenges such as occlusion, low light conditions, motion blur and object size variance. To address these challenges, we propose a deep learning network that fuses two types of visual signals, event-based data and RGB frame data. The proposed Bimodal SegNet network has two distinct encoders, one for each signal input, and spatial pyramid pooling with atrous convolutions. Encoders capture rich contextual information by pooling the concatenated features at different resolutions while the decoder obtains sharp object boundaries. The proposed method is evaluated on five unique image degradation challenges, namely occlusion, blur, brightness, trajectory and scale variance, on the Event-based Segmentation (ESD) Dataset. The evaluation results show a 6-10% segmentation accuracy improvement over state-of-the-art methods in terms of mean intersection over union and pixel accuracy. The model code is available at https://github.com/sanket0707/Bimodal-SegNet.git

* 8 pages
