Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ganesh Krishnasamy

KnowDiffuser: A Knowledge-Guided Diffusion Planner with LM Reasoning and Prior-Informed Trajectory Initialization

Mar 11, 2026

Fan Ding, Xuewen Luo, Fengze Yang, Bo Yu, HwaHui Tew, Ganesh Krishnasamy, Junn Yong Loo

Abstract:Recent advancements in Language Models (LMs) have demonstrated strong semantic reasoning capabilities, enabling their application in high-level decision-making for autonomous driving (AD). However, LMs operate over discrete token spaces and lack the ability to generate continuous, physically feasible trajectories required for motion planning. Meanwhile, diffusion models have proven effective at generating reliable and dynamically consistent trajectories, but often lack semantic interpretability and alignment with scene-level understanding. To address these limitations, we propose \textbf{KnowDiffuser}, a knowledge-guided motion planning framework that tightly integrates the semantic understanding of language models with the generative power of diffusion models. The framework employs a language model to infer context-aware meta-actions from structured scene representations, which are then mapped to prior trajectories that anchor the subsequent denoising process. A two-stage truncated denoising mechanism refines these trajectories efficiently, preserving both semantic alignment and physical feasibility. Experiments on the nuPlan benchmark demonstrate that KnowDiffuser significantly outperforms existing planners in both open-loop and closed-loop evaluations, establishing a robust and interpretable framework that effectively bridges the semantic-to-physical gap in AD systems.

* 10pages, 1 figure

Via

Access Paper or Ask Questions

RAPiD: Real-time Deterministic Trajectory Planning via Diffusion Behavior Priors for Safe and Efficient Autonomous Driving

Feb 07, 2026

Ruturaj Reddy, Hrishav Bakul Barua, Junn Yong Loo, Thanh Thi Nguyen, Ganesh Krishnasamy

Abstract:Diffusion-based trajectory planners have demonstrated strong capability for modeling the multimodal nature of human driving behavior, but their reliance on iterative stochastic sampling poses critical challenges for real-time, safety-critical deployment. In this work, we present RAPiD, a deterministic policy extraction framework that distills a pretrained diffusion-based planner into an efficient policy while eliminating diffusion sampling. Using score-regularized policy optimization, we leverage the score function of a pre-trained diffusion planner as a behavior prior to regularize policy learning. To promote safety and passenger comfort, the policy is optimized using a critic trained to imitate a predictive driver controller, providing dense, safety-focused supervision beyond conventional imitation learning. Evaluations demonstrate that RAPiD achieves competitive performance on closed-loop nuPlan scenarios with an 8x speedup over diffusion baselines, while achieving state-of-the-art generalization among learning-based planners on the interPlan benchmark. The official website of this work is: https://github.com/ruturajreddy/RAPiD.

Via

Access Paper or Ask Questions

Seeing Roads Through Words: A Language-Guided Framework for RGB-T Driving Scene Segmentation

Feb 07, 2026

Ruturaj Reddy, Hrishav Bakul Barua, Junn Yong Loo, Thanh Thi Nguyen, Ganesh Krishnasamy

Abstract:Robust semantic segmentation of road scenes under adverse illumination, lighting, and shadow conditions remain a core challenge for autonomous driving applications. RGB-Thermal fusion is a standard approach, yet existing methods apply static fusion strategies uniformly across all conditions, allowing modality-specific noise to propagate throughout the network. Hence, we propose CLARITY that dynamically adapts its fusion strategy to the detected scene condition. Guided by vision-language model (VLM) priors, the network learns to modulate each modality's contribution based on the illumination state while leveraging object embeddings for segmentation, rather than applying a fixed fusion policy. We further introduce two mechanisms, i.e., one which preserves valid dark-object semantics that prior noise-suppression methods incorrectly discard, and a hierarchical decoder that enforces structural consistency across scales to sharpen boundaries on thin objects. Experiments on the MFNet dataset demonstrate that CLARITY establishes a new state-of-the-art (SOTA), achieving 62.3% mIoU and 77.5% mAcc.

Via

Access Paper or Ask Questions

MD-BERT: Action Recognition in Dark Videos via Dynamic Multi-Stream Fusion and Temporal Modeling

Feb 06, 2025

Sharana Dharshikgan Suresh Dass, Hrishav Bakul Barua, Ganesh Krishnasamy, Raveendran Paramesran, Raphael C. -W. Phan

Figure 1 for MD-BERT: Action Recognition in Dark Videos via Dynamic Multi-Stream Fusion and Temporal Modeling

Figure 2 for MD-BERT: Action Recognition in Dark Videos via Dynamic Multi-Stream Fusion and Temporal Modeling

Figure 3 for MD-BERT: Action Recognition in Dark Videos via Dynamic Multi-Stream Fusion and Temporal Modeling

Figure 4 for MD-BERT: Action Recognition in Dark Videos via Dynamic Multi-Stream Fusion and Temporal Modeling

Abstract:Action recognition in dark, low-light (under-exposed) or noisy videos is a challenging task due to visibility degradation, which can hinder critical spatiotemporal details. This paper proposes MD-BERT, a novel multi-stream approach that integrates complementary pre-processing techniques such as gamma correction and histogram equalization alongside raw dark frames to address these challenges. We introduce the Dynamic Feature Fusion (DFF) module, extending existing attentional fusion methods to a three-stream setting, thereby capturing fine-grained and global contextual information across different brightness and contrast enhancements. The fused spatiotemporal features are then processed by a BERT-based temporal model, which leverages its bidirectional self-attention to effectively capture long-range dependencies and contextual relationships across frames. Extensive experiments on the ARID V1.0 and ARID V1.5 dark video datasets show that MD-BERT outperforms existing methods, establishing a new state-of-the-art performance. Ablation studies further highlight the individual contributions of each input stream and the effectiveness of the proposed DFF and BERT modules. The official website of this work is available at: https://github.com/HrishavBakulBarua/DarkBERT

Via

Access Paper or Ask Questions

ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos

Apr 09, 2024

Sharana Dharshikgan Suresh Dass, Hrishav Bakul Barua, Ganesh Krishnasamy, Raveendran Paramesran, Raphael C. -W. Phan

Figure 1 for ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos

Figure 2 for ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos

Figure 3 for ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos

Figure 4 for ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos

Abstract:Human action or activity recognition in videos is a fundamental task in computer vision with applications in surveillance and monitoring, self-driving cars, sports analytics, human-robot interaction and many more. Traditional supervised methods require large annotated datasets for training, which are expensive and time-consuming to acquire. This work proposes a novel approach using Cross-Architecture Pseudo-Labeling with contrastive learning for semi-supervised action recognition. Our framework leverages both labeled and unlabelled data to robustly learn action representations in videos, combining pseudo-labeling with contrastive learning for effective learning from both types of samples. We introduce a novel cross-architecture approach where 3D Convolutional Neural Networks (3D CNNs) and video transformers (VIT) are utilised to capture different aspects of action representations; hence we call it ActNetFormer. The 3D CNNs excel at capturing spatial features and local dependencies in the temporal domain, while VIT excels at capturing long-range dependencies across frames. By integrating these complementary architectures within the ActNetFormer framework, our approach can effectively capture both local and global contextual information of an action. This comprehensive representation learning enables the model to achieve better performance in semi-supervised action recognition tasks by leveraging the strengths of each of these architectures. Experimental results on standard action recognition datasets demonstrate that our approach performs better than the existing methods, achieving state-of-the-art performance with only a fraction of labeled data. The official website of this work is available at: https://github.com/rana2149/ActNetFormer.

* Submitted for peer review

Via

Access Paper or Ask Questions

GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction

Mar 26, 2024

Hrishav Bakul Barua, Kalin Stefanov, KokSheik Wong, Abhinav Dhall, Ganesh Krishnasamy

Figure 1 for GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction

Figure 2 for GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction

Figure 3 for GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction

Figure 4 for GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction

Abstract:High Dynamic Range (HDR) content (i.e., images and videos) has a broad range of applications. However, capturing HDR content from real-world scenes is expensive and time-consuming. Therefore, the challenging task of reconstructing visually accurate HDR images from their Low Dynamic Range (LDR) counterparts is gaining attention in the vision research community. A major challenge in this research problem is the lack of datasets, which capture diverse scene conditions (e.g., lighting, shadows, weather, locations, landscapes, objects, humans, buildings) and various image features (e.g., color, contrast, saturation, hue, luminance, brightness, radiance). To address this gap, in this paper, we introduce GTA-HDR, a large-scale synthetic dataset of photo-realistic HDR images sampled from the GTA-V video game. We perform thorough evaluation of the proposed dataset, which demonstrates significant qualitative and quantitative improvements of the state-of-the-art HDR image reconstruction methods. Furthermore, we demonstrate the effectiveness of the proposed dataset and its impact on additional computer vision tasks including 3D human pose estimation, human body part segmentation, and holistic scene segmentation. The dataset, data collection pipeline, and evaluation code are available at: https://github.com/HrishavBakulBarua/GTA-HDR.

* Submitted to IEEE

Via

Access Paper or Ask Questions

HistoHDR-Net: Histogram Equalization for Single LDR to HDR Image Translation

Feb 08, 2024

Hrishav Bakul Barua, Ganesh Krishnasamy, KokSheik Wong, Abhinav Dhall, Kalin Stefanov

Figure 1 for HistoHDR-Net: Histogram Equalization for Single LDR to HDR Image Translation

Figure 2 for HistoHDR-Net: Histogram Equalization for Single LDR to HDR Image Translation

Figure 3 for HistoHDR-Net: Histogram Equalization for Single LDR to HDR Image Translation

Figure 4 for HistoHDR-Net: Histogram Equalization for Single LDR to HDR Image Translation

Abstract:High Dynamic Range (HDR) imaging aims to replicate the high visual quality and clarity of real-world scenes. Due to the high costs associated with HDR imaging, the literature offers various data-driven methods for HDR image reconstruction from Low Dynamic Range (LDR) counterparts. A common limitation of these approaches is missing details in regions of the reconstructed HDR images, which are over- or under-exposed in the input LDR images. To this end, we propose a simple and effective method, HistoHDR-Net, to recover the fine details (e.g., color, contrast, saturation, and brightness) of HDR images via a fusion-based approach utilizing histogram-equalized LDR images along with self-attention guidance. Our experiments demonstrate the efficacy of the proposed approach over the state-of-art methods.

* Submitted to IEEE

Via

Access Paper or Ask Questions

ArtHDR-Net: Perceptually Realistic and Accurate HDR Content Creation

Sep 07, 2023

Hrishav Bakul Barua, Ganesh Krishnasamy, KokSheik Wong, Kalin Stefanov, Abhinav Dhall

Figure 1 for ArtHDR-Net: Perceptually Realistic and Accurate HDR Content Creation

Figure 2 for ArtHDR-Net: Perceptually Realistic and Accurate HDR Content Creation

Figure 3 for ArtHDR-Net: Perceptually Realistic and Accurate HDR Content Creation

Figure 4 for ArtHDR-Net: Perceptually Realistic and Accurate HDR Content Creation

Abstract:High Dynamic Range (HDR) content creation has become an important topic for modern media and entertainment sectors, gaming and Augmented/Virtual Reality industries. Many methods have been proposed to recreate the HDR counterparts of input Low Dynamic Range (LDR) images/videos given a single exposure or multi-exposure LDRs. The state-of-the-art methods focus primarily on the preservation of the reconstruction's structural similarity and the pixel-wise accuracy. However, these conventional approaches do not emphasize preserving the artistic intent of the images in terms of human visual perception, which is an essential element in media, entertainment and gaming. In this paper, we attempt to study and fill this gap. We propose an architecture called ArtHDR-Net based on a Convolutional Neural Network that uses multi-exposed LDR features as input. Experimental results show that ArtHDR-Net can achieve state-of-the-art performance in terms of the HDR-VDP-2 score (i.e., mean opinion score index) while reaching competitive performance in terms of PSNR and SSIM.

* Accepted in Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Taipei, Taiwan

Via

Access Paper or Ask Questions

LAB: A Leader-Advocate-Believer Based Optimization Algorithm

Apr 23, 2022

Ruturaj Reddy, Anand J Kulkarni, Ganesh Krishnasamy, Apoorva S Shastri, Amir H. Gandomi

Figure 1 for LAB: A Leader-Advocate-Believer Based Optimization Algorithm

Figure 2 for LAB: A Leader-Advocate-Believer Based Optimization Algorithm

Figure 3 for LAB: A Leader-Advocate-Believer Based Optimization Algorithm

Figure 4 for LAB: A Leader-Advocate-Believer Based Optimization Algorithm

Abstract:This manuscript introduces a new socio-inspired metaheuristic technique referred to as Leader-Advocate-Believer based optimization algorithm (LAB) for engineering and global optimization problems. The proposed algorithm is inspired by the AI-based competitive behaviour exhibited by the individuals in a group while simultaneously improving themselves and establishing a role (Leader, Advocate, Believer). LAB performance in computational time and function evaluations are benchmarked using other metaheuristic algorithms. Besides benchmark problems, the LAB algorithm was applied for solving challenging engineering problems, including abrasive water jet machining, electric discharge machining, micro-machining processes, and process parameter optimization for turning titanium alloy in a minimum quantity lubrication environment. The results were superior to the other algorithms compared such as Firefly Algorithm, Variations of Co-hort Intelligence, Genetic Algorithm, Simulated Annealing, Particle Swarm Optimisation, and Multi-Cohort Intelligence. The results from this study highlighted that the LAB outperforms the other algorithms in terms of function evaluations and computational time. The prominent features of the LAB algorithm along with its limitations are also discussed.

* NIL

Via

Access Paper or Ask Questions