Saransh Sharma

IDALC: A Semi-Supervised Framework for Intent Detection and Active Learning based Correction

Nov 08, 2025

Ankan Mullick, Sukannya Purkayastha, Saransh Sharma, Pawan Goyal, Niloy Ganguly

Abstract: Voice-controlled dialog systems have become immensely popular due to their ability to perform a wide range of actions in response to diverse user queries. These agents possess a predefined set of skills or intents to fulfill specific user tasks, but every system has its limitations. Even for known intents, a model that exhibits low confidence rejects utterances, which then require manual annotation. Additionally, as time progresses, these agents may need to be retrained with new intents drawn from the system-rejected queries to carry out additional tasks. Labeling all these emerging intents and rejected utterances over time is impractical, which calls for an efficient mechanism to reduce annotation costs. In this paper, we introduce IDALC (Intent Detection and Active Learning based Correction), a semi-supervised framework designed to detect user intents and rectify system-rejected utterances while minimizing the need for human annotation. Empirical findings on various benchmark datasets demonstrate that our system surpasses baseline methods, achieving 5-10% higher accuracy and a 4-8% improvement in macro-F1. Remarkably, we maintain the overall annotation cost at just 6-10% of the unlabelled data available to the system. The overall framework of IDALC is shown in Fig. 1.

* Accepted in IEEE Transactions on Artificial Intelligence, October 2025
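
The rejection-and-annotation loop described in the abstract can be pictured with a minimal sketch. This is not the IDALC implementation: the confidence threshold, the least-confidence selection rule, and the names `route_utterances` and `select_for_annotation` are all assumptions made for illustration, and the paper may use different criteria.

```python
import math


def softmax(logits):
    """Convert raw classifier scores into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_utterances(logits_per_utterance, intents, threshold=0.7):
    """Accept confident predictions; reject the rest.

    `threshold` is an illustrative value, not taken from the paper.
    """
    accepted, rejected = [], []
    for idx, logits in enumerate(logits_per_utterance):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            accepted.append((idx, intents[best], probs[best]))
        else:
            rejected.append((idx, probs[best]))
    return accepted, rejected


def select_for_annotation(rejected, budget_fraction=0.1):
    """Pick the least confident rejected utterances for human labeling.

    Least-confidence sampling is an assumed stand-in for the paper's
    active learning strategy; `budget_fraction` caps annotation cost.
    """
    budget = math.ceil(budget_fraction * len(rejected))
    ranked = sorted(rejected, key=lambda item: item[1])
    return [idx for idx, _ in ranked[:budget]]


# Hypothetical intents and classifier scores for four utterances.
intents = ["play_music", "set_alarm", "weather"]
logits = [[5.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.2, 0.1], [0.0, 0.0, 4.0]]
accepted, rejected = route_utterances(logits, intents)
to_label = select_for_annotation(rejected, budget_fraction=0.5)
```

In this toy run, the two high-margin utterances are accepted and only one of the two rejected utterances is sent to an annotator, which mirrors the idea of keeping annotation to a small fraction of the unlabelled pool.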

FiRST: Finetuning Router-Selective Transformers for Input-Adaptive Latency Reduction

Oct 16, 2024

Akriti Jain, Saransh Sharma, Koyel Mukherjee, Soumyabrata Pal

Abstract: Auto-regressive Large Language Models (LLMs) demonstrate remarkable performance across domains such as vision and language processing. However, due to sequential processing through a stack of transformer layers, autoregressive decoding faces significant computation and latency challenges, particularly in resource-constrained environments like mobile and edge devices. Existing approaches in the literature that aim to improve latency by skipping layers come in two distinct flavors: 1) early exit and 2) input-agnostic heuristics, where tokens exit at pre-determined layers irrespective of the input sequence. Both strategies have limitations: the former cannot handle the KV caching necessary for speed-ups in modern frameworks, and the latter does not capture the variation in layer importance across tasks or, more generally, across input sequences. To address both limitations, we propose FiRST, an algorithm that reduces inference latency by using layer-specific routers to select a subset of transformer layers adaptively for each input sequence - the prompt (during the prefill stage) decides which layers will be skipped during decoding. FiRST preserves compatibility with KV caching, enabling faster inference while being quality-aware. FiRST is model-agnostic and can be easily enabled on any pre-trained LLM. We further improve performance by incorporating LoRA adapters for fine-tuning on external datasets, enhancing task-specific accuracy while maintaining latency benefits. Our approach reveals that input adaptivity is critical - indeed, different task-specific middle layers play a crucial role in evolving hidden representations depending on the task. Extensive experiments show that FiRST significantly reduces latency while retaining competitive performance (as compared to baselines), making our approach an efficient solution for LLM deployment in low-resource environments.

* 17 pages, 6 figures, Submitted to ICLR 2025
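
The routing idea in the abstract (the prompt decides during prefill which layers are skipped during decoding) can be illustrated with a toy sketch. Everything here is an assumption for illustration: the layers are simple arithmetic stand-ins for transformer blocks, the router is a linear score over mean-pooled prompt states passed through a sigmoid, and the names `prefill`, `decode_step`, and `make_toy_layer` are hypothetical. It does not reproduce the paper's router architecture or training.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mean_pool(states):
    """Average a list of hidden-state vectors into one vector."""
    dim = len(states[0])
    return [sum(s[d] for s in states) / len(states) for d in range(dim)]


def make_toy_layer(scale, shift):
    """Hypothetical stand-in for a transformer layer (residual update)."""
    def layer(h):
        return [x + scale * x + shift for x in h]
    return layer


def prefill(prompt_states, layers, router_weights, threshold=0.5):
    """Run the prompt through the stack and record one keep/skip
    decision per layer from that layer's router."""
    states = prompt_states
    keep_mask = []
    for layer, w in zip(layers, router_weights):
        pooled = mean_pool(states)
        score = sigmoid(sum(wi * pi for wi, pi in zip(w, pooled)))
        keep = score >= threshold
        keep_mask.append(keep)
        if keep:
            states = [layer(h) for h in states]
    return states, keep_mask


def decode_step(token_state, layers, keep_mask):
    """Apply only the layers kept at prefill time to a new token."""
    h = token_state
    for layer, keep in zip(layers, keep_mask):
        if keep:
            h = layer(h)
    return h


# Toy stack of three layers with hand-picked (untrained) router weights.
layers = [make_toy_layer(0.1, 0.0), make_toy_layer(0.2, 0.0), make_toy_layer(0.0, 1.0)]
router_weights = [[1.0, 1.0], [-1.0, -1.0], [1.0, 1.0]]
prompt_states = [[1.0, 2.0], [3.0, 4.0]]

_, keep_mask = prefill(prompt_states, layers, router_weights)
next_state = decode_step([1.0, 1.0], layers, keep_mask)
```

Because the keep/skip mask is fixed once per sequence, every decoded token passes through the same set of layers, which is one way to see why such a scheme stays compatible with KV caching: a kept layer has cached keys and values for all earlier tokens, and a skipped layer never needs any.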
