Gautam Siddharth Kashyap

We Think, Therefore We Align LLMs to Helpful, Harmless and Honest Before They Go Wrong

Sep 26, 2025

Gautam Siddharth Kashyap, Mark Dras, Usman Naseem

Abstract: Alignment of Large Language Models (LLMs) along multiple objectives (helpfulness, harmlessness, and honesty, or HHH) is critical for safe and reliable deployment. Prior work has used steering vectors, small control signals injected into hidden states, to guide LLM outputs, typically via one-to-one (1-to-1) Transformer decoders. In this setting, optimizing a single alignment objective can inadvertently overwrite representations learned for other objectives, leading to catastrophic forgetting. More recent approaches extend steering vectors via one-to-many (1-to-N) Transformer decoders. While this alleviates catastrophic forgetting, naive multi-branch designs optimize each objective independently, which can cause inference fragmentation: outputs across HHH objectives may become inconsistent. We propose Adaptive Multi-Branch Steering (AMBS), a two-stage 1-to-N framework for unified and efficient multi-objective alignment. In Stage I, post-attention hidden states of the Transformer layer are computed once to form a shared representation. In Stage II, this representation is cloned into parallel branches and steered via a policy-reference mechanism, enabling objective-specific control while maintaining cross-objective consistency. Empirical evaluations on Alpaca, BeaverTails, and TruthfulQA show that AMBS consistently improves HHH alignment across multiple 7B LLM backbones. For example, on DeepSeek-7B, AMBS improves average alignment scores by +32.4% and reduces unsafe outputs by 11.0% compared to a naive 1-to-N baseline, while remaining competitive with state-of-the-art methods.
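The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the policy-reference mechanism is not specified in the abstract and is omitted here, the additive rule `h + strength * direction` is the generic steering-vector form, and the function names, vector values, and `strength` parameter are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def steer(hidden: Vector, direction: Vector, strength: float) -> Vector:
    """Return a steered copy of `hidden`: h + strength * direction."""
    if len(hidden) != len(direction):
        raise ValueError("hidden state and steering vector must have the same size")
    return [h + strength * d for h, d in zip(hidden, direction)]


def branch_and_steer(shared: Vector,
                     steering: Dict[str, Vector],
                     strength: float = 1.0) -> Dict[str, Vector]:
    """Clone one shared hidden state into one steered branch per objective."""
    return {name: steer(shared, vec, strength) for name, vec in steering.items()}


# Toy 4-dimensional hidden state, computed once (Stage I in the abstract).
shared_state = [0.5, -1.0, 0.25, 2.0]

# Hypothetical steering vectors, one per HHH objective (made-up values).
steering_vectors = {
    "helpful": [0.1, 0.0, 0.0, 0.0],
    "harmless": [0.0, 0.2, 0.0, 0.0],
    "honest": [0.0, 0.0, -0.3, 0.0],
}

# Stage II: every branch starts from the same shared state.
branches = branch_and_steer(shared_state, steering_vectors, strength=2.0)
for name, state in branches.items():
    print(name, state)
```

Because each branch is built from a copy, the shared state is never overwritten, which is the property the 1-to-N design relies on to avoid one objective clobbering another.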

MAGIC-Enhanced Keyword Prompting for Zero-Shot Audio Captioning with CLIP Models

Sep 16, 2025

Vijay Govindarajan, Pratik Patel, Sahil Tripathi, Md Azizul Hoque, Gautam Siddharth Kashyap

Abstract: Automated Audio Captioning (AAC) generates captions for audio clips but faces challenges due to limited datasets compared to image captioning. To overcome this, we propose a zero-shot AAC system that leverages pre-trained models, eliminating the need for extensive training. Our approach uses a pre-trained audio CLIP model to extract auditory features and generate a structured prompt, which guides a Large Language Model (LLM) in caption generation. Unlike traditional greedy decoding, our method refines token selection through the audio CLIP model, ensuring alignment with the audio content. Experimental results demonstrate a 35% improvement in NLG mean score (from 4.7 to 7.3) using MAGIC search with the WavCaps model. The performance is heavily influenced by the audio-text matching model and keyword selection, with optimal results achieved using a single keyword prompt, and a 50% performance drop when no keyword list is used.

* Accepted at the 26th International Conference on Web Information Systems Engineering (WISE), scheduled for 15-17 December 2025 in Marrakech, Morocco
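The contrast with greedy decoding in the abstract above can be sketched as follows. This is a simplified illustration, not the paper's scoring rule: the abstract does not give the formula, so the combination `lm_prob + beta * softmax(similarity)` over the top-k candidates is an assumption, and the probabilities and similarity values are made up. In a real system the similarities would come from the audio-text matching model.

```python
import math
from typing import Dict


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over a dict of scores."""
    peak = max(scores.values())
    exps = {tok: math.exp(s - peak) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def pick_token(lm_probs: Dict[str, float],
               audio_similarity: Dict[str, float],
               beta: float = 2.0,
               top_k: int = 3) -> str:
    """Choose the next token among the LM's top-k candidates,
    re-scored by how well each candidate matches the audio."""
    candidates = sorted(lm_probs, key=lm_probs.get, reverse=True)[:top_k]
    match = softmax({tok: audio_similarity[tok] for tok in candidates})
    return max(candidates, key=lambda tok: lm_probs[tok] + beta * match[tok])


# Hypothetical next-token probabilities from the language model.
lm_probs = {"music": 0.5, "barking": 0.3, "rain": 0.15, "the": 0.05}
# Hypothetical audio-text similarity scores for the same tokens.
audio_similarity = {"music": 0.1, "barking": 0.9, "rain": 0.2, "the": 0.0}

greedy_choice = max(lm_probs, key=lm_probs.get)
guided_choice = pick_token(lm_probs, audio_similarity)
print(greedy_choice, guided_choice)
```

With `beta=0.0` the audio term vanishes and the selection reduces to greedy decoding, so `beta` controls how strongly the audio evidence overrides the language model's prior.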

Too Helpful, Too Harmless, Too Honest or Just Right?

Sep 10, 2025

Gautam Siddharth Kashyap, Mark Dras, Usman Naseem

Abstract: Large Language Models (LLMs) exhibit strong performance across a wide range of NLP tasks, yet aligning their outputs with the principles of Helpfulness, Harmlessness, and Honesty (HHH) remains a persistent challenge. Existing methods often optimize for individual alignment dimensions in isolation, leading to trade-offs and inconsistent behavior. While Mixture-of-Experts (MoE) architectures offer modularity, they suffer from poorly calibrated routing, limiting their effectiveness in alignment tasks. We propose TrinityX, a modular alignment framework that incorporates a Mixture of Calibrated Experts (MoCaE) within the Transformer architecture. TrinityX leverages separately trained experts for each HHH dimension, integrating their outputs through a calibrated, task-adaptive routing mechanism that combines expert signals into a unified, alignment-aware representation. Extensive experiments on three standard alignment benchmarks (Alpaca for helpfulness, BeaverTails for harmlessness, and TruthfulQA for honesty) demonstrate that TrinityX outperforms strong baselines, achieving relative improvements of 32.5% in win rate, 33.9% in safety score, and 28.4% in truthfulness. In addition, TrinityX reduces memory usage and inference latency by over 40% compared to prior MoE-based approaches. Ablation studies highlight the importance of calibrated routing, and cross-model evaluations confirm TrinityX's generalization across diverse LLM backbones.

* EMNLP'25 Main
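As a rough illustration of routing that combines per-objective experts into one representation, the sketch below mixes three expert outputs with softmax weights. The abstract does not say how the routing is calibrated, so temperature scaling is used here purely as an assumed stand-in (it is one common calibration technique); all names and numbers are hypothetical.

```python
import math
from typing import Dict, List


def calibrated_weights(router_logits: Dict[str, float],
                       temperature: float) -> Dict[str, float]:
    """Softmax over router logits divided by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {name: logit / temperature for name, logit in router_logits.items()}
    peak = max(scaled.values())
    exps = {name: math.exp(value - peak) for name, value in scaled.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def combine(expert_outputs: Dict[str, List[float]],
            weights: Dict[str, float]) -> List[float]:
    """Weighted sum of the expert output vectors."""
    size = len(next(iter(expert_outputs.values())))
    return [sum(weights[name] * vec[i] for name, vec in expert_outputs.items())
            for i in range(size)]


# Hypothetical router logits and 2-dimensional expert outputs.
router_logits = {"helpful": 2.0, "harmless": 0.0, "honest": 0.0}
expert_outputs = {
    "helpful": [1.0, 0.0],
    "harmless": [0.0, 1.0],
    "honest": [0.5, 0.5],
}

sharp = calibrated_weights(router_logits, temperature=1.0)
soft = calibrated_weights(router_logits, temperature=4.0)
print(combine(expert_outputs, sharp))
print(combine(expert_outputs, soft))
```

A higher temperature flattens the weights, so an overconfident router leans less heavily on a single expert.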

Heterogeneity over Homogeneity: Investigating Multilingual Speech Pre-Trained Models for Detecting Audio Deepfake

Mar 31, 2024

Orchid Chetia Phukan, Gautam Siddharth Kashyap, Arun Balaji Buduru, Rajesh Sharma

Abstract: In this work, we investigate multilingual speech Pre-Trained Models (PTMs) for Audio Deepfake Detection (ADD). We hypothesize that multilingual PTMs trained on large-scale, diverse multilingual data gain knowledge about diverse pitches, accents, and tones during their pre-training phase, making them more robust to variations. As a result, they will be more effective for detecting audio deepfakes. To validate our hypothesis, we extract representations from state-of-the-art (SOTA) PTMs, including monolingual and multilingual PTMs as well as PTMs trained for speaker and emotion recognition, and evaluate them on the ASVSpoof 2019 (ASV), In-the-Wild (ITW), and DECRO benchmark databases. We show that representations from multilingual PTMs, with simple downstream networks, attain the best performance for ADD compared to other PTM representations, which validates our hypothesis. We also explore fusing selected PTM representations for further improvements in ADD, and we propose a framework, MiO (Merge into One), for this purpose. With MiO, we achieve SOTA performance on ASV and ITW and performance comparable to current SOTA works on DECRO.

* Accepted to NAACL (Findings) 2024

From Text to Transformation: A Comprehensive Review of Large Language Models' Versatility

Feb 25, 2024

Pravneet Kaur, Gautam Siddharth Kashyap, Ankit Kumar, Md Tabrez Nafis, Sandeep Kumar, Vikrant Shokeen

Abstract: This study explores the expanse of Large Language Models (LLMs), such as Generative Pre-Trained Transformer (GPT) and Bidirectional Encoder Representations from Transformers (BERT), across varied domains ranging from technology, finance, and healthcare to education. Despite their established prowess in Natural Language Processing (NLP), these LLMs have not been systematically examined for their impact on domains such as fitness and holistic well-being, urban planning, climate modelling, and disaster management. In addition to furnishing a comprehensive analysis of the extent of LLMs' utility in diverse domains, this review identifies research gaps and areas where the potential of LLMs is yet to be harnessed. It uncovers innovative ways in which LLMs can contribute to fields like fitness and well-being, urban planning, climate modelling, and disaster response, which could inspire future research and applications in these areas.

How Paralingual are Paralinguistic Representations? A Case Study in Speech Emotion Recognition

Feb 02, 2024

Orchid Chetia Phukan, Gautam Siddharth Kashyap, Arun Balaji Buduru, Rajesh Sharma

Abstract: Pre-trained Models (PTMs) have facilitated substantial progress in the field of Speech Emotion Recognition (SER). SER is an area with applications ranging from Human-Computer Interaction to Healthcare. Recent studies have leveraged various PTM representations as input features for downstream models for SER. PTMs specifically pre-trained for paralinguistic tasks have obtained state-of-the-art (SOTA) performance for SER. However, such PTMs have been evaluated only on English and not in multilingual settings. We fill this gap by performing a comprehensive comparative study of five PTMs (TRILLsson, wav2vec2, XLS-R, x-vector, Whisper) to assess the effectiveness of the paralinguistic PTM (TRILLsson) for SER across multiple languages. Representations from TRILLsson achieved the best performance among all the PTMs. This demonstrates that TRILLsson is able to effectively capture the various paralinguistic features from speech data for better SER. We also show that downstream models using TRILLsson representations achieve SOTA performance in terms of accuracy across various multilingual datasets.

Detection of a facemask in real-time using deep learning methods: Prevention of Covid 19

Jan 28, 2024

Gautam Siddharth Kashyap, Jatin Sohlot, Ayesha Siddiqui, Ramsha Siddiqui, Karan Malik, Samar Wazir, Alexander E. I. Brownlee

Abstract: A health crisis is raging all over the world with the rapid transmission of the novel coronavirus disease (Covid-19). Out of the guidelines issued by the World Health Organization (WHO) to protect us against Covid-19, wearing a facemask is the most effective. Many countries have mandated the wearing of face masks, but monitoring a large number of people to ensure that they are wearing masks in a crowded place is a challenging task in itself. Covid-19 has already affected our day-to-day life as well as world trade movements. By the end of April 2021, the world had recorded 144,358,956 confirmed cases of Covid-19, including 3,066,113 deaths, according to the WHO. These increasing numbers motivate automated techniques for the detection of a facemask in real-time scenarios for the prevention of Covid-19. We propose a technique using deep learning that works for single and multiple people in a frame recorded via webcam, whether still or in motion. We have also experimented with our approach in night light. The accuracy of our model compares well with other approaches in the literature, ranging from 74% for multiple people in night light to 99% for a single person in daylight.

* Research Advances in Network Technologies (Volume 2) (CRC Press Taylor and Francis), 2023 (Accepted)

From Simulations to Reality: Enhancing Multi-Robot Exploration for Urban Search and Rescue

Nov 28, 2023

Gautam Siddharth Kashyap, Deepkashi Mahajan, Orchid Chetia Phukan, Ankit Kumar, Alexander E. I. Brownlee, Jiechao Gao

Abstract: In this study, we present a novel hybrid algorithm, combining Levy Flight (LF) and Particle Swarm Optimization (PSO) (LF-PSO), tailored for efficient multi-robot exploration in unknown environments with limited communication and no global positioning information. The research addresses the growing interest in employing multiple autonomous robots for exploration tasks, particularly in scenarios such as Urban Search and Rescue (USAR) operations. Multiple robots offer advantages like increased task coverage, robustness, flexibility, and scalability. However, existing approaches often make assumptions about the search area, robot positioning, communication restrictions, and target information that may not hold in real-world situations. The hybrid algorithm leverages LF, known for its effectiveness in large space exploration with sparse targets, and incorporates inter-robot repulsion as a social component through PSO. This combination enhances area exploration efficiency. We redefine the local best and global best positions to suit scenarios without continuous target information. Experimental simulations in a controlled environment demonstrate the algorithm's effectiveness, showcasing improved area coverage compared to traditional methods. As we continue refining our approach and testing it in complex, obstacle-rich environments, the presented work holds promise for enhancing multi-robot exploration in scenarios with limited information and communication capabilities.
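A rough sketch of the two ingredients named in the abstract, a heavy-tailed Levy flight step and an inter-robot repulsion term, is given below. The update rule is an assumption: the abstract does not specify how LF-PSO combines them or how the local and global bests are redefined. Step lengths are drawn with Mantegna's algorithm, a standard way to sample Levy-like steps, and the inverse-distance repulsion and all parameter names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def levy_step(rng: random.Random, beta: float = 1.5) -> float:
    """Draw one heavy-tailed step with Mantegna's algorithm (1 < beta <= 2)."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
               ) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid a division by zero in the (very unlikely) exact-zero draw
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def repulsion(robot: Point, others: List[Point]) -> Point:
    """Sum of pushes away from the other robots, each with magnitude 1/distance."""
    push_x = push_y = 0.0
    for ox, oy in others:
        dx, dy = robot[0] - ox, robot[1] - oy
        dist_sq = dx * dx + dy * dy
        if dist_sq > 0:
            push_x += dx / dist_sq
            push_y += dy / dist_sq
    return push_x, push_y


def move(robot: Point, others: List[Point], rng: random.Random,
         step_scale: float = 1.0, repulsion_weight: float = 1.0) -> Point:
    """One position update: a Levy step in a random direction plus repulsion."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    length = step_scale * abs(levy_step(rng))
    rx, ry = repulsion(robot, others)
    return (robot[0] + length * math.cos(angle) + repulsion_weight * rx,
            robot[1] + length * math.sin(angle) + repulsion_weight * ry)


rng = random.Random(0)
robots = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# Each robot moves using only the current positions of the others.
new_positions = [move(r, robots[:i] + robots[i + 1:], rng)
                 for i, r in enumerate(robots)]
print(new_positions)
```

The occasional long Levy step helps a robot leave an already-covered region, while the repulsion term spreads nearby robots apart using only their relative positions.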

MLOps: A Review

Aug 19, 2023

Samar Wazir, Gautam Siddharth Kashyap, Parag Saxena

Abstract: Recently, Machine Learning (ML) has become a widely accepted and rapidly evolving method, since it employs computational methods to teach machines and produce acceptable answers. This study examines the significance of Machine Learning Operations (MLOps) methods, which can provide acceptable answers for such problems. The authors research MLOps methods to assist in the creation of software that is simple to use. To choose the best tool structure for certain projects, the authors also assess the features and operability of various MLOps methods. A total of 22 papers that attempted to apply the MLOps idea were assessed. Finally, the authors acknowledge the scarcity of fully effective MLOps methods with which advancements can self-regulate by limiting human engagement.

Roulette-Wheel Selection-Based PSO Algorithm for Solving the Vehicle Routing Problem with Time Windows

Jun 04, 2023

Gautam Siddharth Kashyap, Alexander E. I. Brownlee, Orchid Chetia Phukan, Karan Malik, Samar Wazir

Abstract: The well-known Vehicle Routing Problem with Time Windows (VRPTW) aims to reduce the cost of moving goods between several destinations while accommodating constraints like set time windows for certain locations and vehicle capacity. Real-world applications of the VRPTW include Supply Chain Management (SCM) and logistic dispatching, both of which are crucial to the economy and are expanding quickly as work habits change. Metaheuristic algorithms such as Particle Swarm Optimization (PSO) have been found to solve the VRPTW effectively; however, they can experience premature convergence. To lower the risk of PSO's premature convergence, the authors solve the VRPTW in this paper using a novel form of the PSO methodology that uses the Roulette Wheel Method (RWPSO). Computational experiments on the Solomon VRPTW benchmark datasets demonstrate that RWPSO is competitive with other state-of-the-art algorithms from the literature. Comparisons with two cutting-edge algorithms from the literature further show how competitive the suggested algorithm is.
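Roulette-wheel selection itself is a standard technique and can be sketched in a few lines. How RWPSO uses the wheel inside the PSO update is not described in the abstract, so the example below only shows fitness-proportionate selection; turning route costs into weights with `1 / cost` (cheaper routes get larger slices) is an assumption, and the cost values are made up.

```python
import random
from typing import List, Sequence


def roulette_wheel_select(weights: Sequence[float], rng: random.Random) -> int:
    """Return index i with probability weights[i] / sum(weights)."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    spin = rng.random() * total  # uniform in [0, total)
    running = 0.0
    for index, weight in enumerate(weights):
        running += weight
        if spin < running:
            return index
    return len(weights) - 1  # guard against floating-point round-off


def weights_from_costs(costs: Sequence[float]) -> List[float]:
    """Cheaper routes get larger slices of the wheel (assumed 1/cost mapping)."""
    return [1.0 / cost for cost in costs]


rng = random.Random(42)
route_costs = [120.0, 80.0, 200.0]  # hypothetical total costs of three candidate routes
wheel = weights_from_costs(route_costs)
picks = [roulette_wheel_select(wheel, rng) for _ in range(10_000)]
print([picks.count(i) for i in range(len(route_costs))])
```

Because worse candidates still have a non-zero chance of being picked, the swarm does not always follow the single best solution found so far, which is the usual argument for this kind of selection as a hedge against premature convergence.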
