Rahul Raman

Rethinking the Outlier Distribution in Large Language Models: An In-depth Study

May 27, 2025

Rahul Raman, Khushi Sharma, Sai Qian Zhang

Abstract: Investigating outliers in large language models (LLMs) is crucial because they significantly affect several aspects of LLM performance, including quantization and compression. Outliers often cause considerable quantization errors, leading to degraded model performance. Identifying and addressing these outliers can enhance the accuracy and efficiency of the quantization process, enabling smoother deployment on edge devices or specialized hardware. Recent studies have identified two common types of outliers in LLMs: massive activations and channel-wise outliers. While numerous quantization algorithms have been proposed to mitigate their effects and maintain satisfactory accuracy, few have explored the root causes of these outliers in depth. In this paper, we conduct a comprehensive investigation into the formation mechanisms of these outliers and propose potential strategies to mitigate their occurrence. Finally, we introduce efficient approaches that eliminate most massive activations and channel-wise outliers with minimal impact on accuracy.
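
To illustrate why a single outlier can cause considerable quantization error, here is a minimal sketch that is not taken from the paper. It assumes symmetric per-tensor absmax quantization to 8 bits, and the activation values and helper names (`quantize_absmax`, `mean_abs_error`) are hypothetical. One massive activation stretches the quantization scale so far that every small value rounds to zero.

```python
def quantize_absmax(values, bits=8):
    """Symmetric per-tensor quantization: scale by the largest magnitude,
    round to the integer grid, then map back to floats (dequantize)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax
    if scale == 0.0:
        return [0.0 for _ in values]
    return [round(v / scale) * scale for v in values]


def mean_abs_error(original, dequantized):
    """Average absolute difference between original and dequantized values."""
    return sum(abs(a - b) for a, b in zip(original, dequantized)) / len(original)


# Hypothetical activations: 101 evenly spaced values in [-1, 1].
normal = [(i - 50) / 50 for i in range(101)]
# The same values plus one massive activation (an assumed magnitude).
with_outlier = normal + [1000.0]

dq_normal = quantize_absmax(normal)
dq_outlier = quantize_absmax(with_outlier)

# Compare the error on the 101 ordinary values only.
err_normal = mean_abs_error(normal, dq_normal)
err_outlier = mean_abs_error(normal, dq_outlier[:-1])

print(f"error without outlier: {err_normal:.5f}")
print(f"error with outlier:    {err_outlier:.5f}")
```

With the outlier present, the scale is roughly 1000/127, so every value in [-1, 1] falls below half a quantization step and is stored as zero. This is only the per-tensor case; finer-grained schemes (per-channel or per-group scales) behave differently.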

DREAM: Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding

May 25, 2025

Yunhai Hu, Tianhua Xia, Zining Liu, Rahul Raman, Xingyu Liu, Bo Bao, Eric Sather, Vithursan Thangarasa, Sai Qian Zhang

Abstract: Speculative decoding (SD) has emerged as a powerful method for accelerating autoregressive generation in large language models (LLMs), yet its integration into vision-language models (VLMs) remains underexplored. We introduce DREAM, a novel speculative decoding framework tailored for VLMs that combines three key innovations: (1) a cross-attention-based mechanism to inject intermediate features from the target model into the draft model for improved alignment, (2) adaptive intermediate feature selection based on attention entropy to guide efficient draft model training, and (3) visual token compression to reduce draft model latency. DREAM enables efficient, accurate, and parallel multimodal decoding with significant throughput improvement. Experiments across a diverse set of recent popular VLMs, including LLaVA, Pixtral, SmolVLM and Gemma3, show that DREAM achieves up to 3.6x speedup over conventional decoding and significantly outperforms prior SD baselines in both inference throughput and speculative draft acceptance length across a broad range of multimodal benchmarks. The code is publicly available at: https://github.com/SAI-Lab-NYU/DREAM.git
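
As a rough illustration of the quantity named in item (2), the sketch below computes the Shannon entropy of a single attention distribution. It is not DREAM's implementation: the abstract does not say how entropy is turned into a feature-selection rule, so the code only shows the measure itself. The function name `attention_entropy` and the example weights are hypothetical.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention distribution.

    The weights are normalized first, so unnormalized non-negative scores
    are accepted. Zero-probability entries contribute nothing.
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical attention rows over four tokens.
uniform = [0.25, 0.25, 0.25, 0.25]  # attention spread evenly
peaked = [0.97, 0.01, 0.01, 0.01]   # attention focused on one token

h_uniform = attention_entropy(uniform)  # maximal for 4 tokens: log(4)
h_peaked = attention_entropy(peaked)

print(f"uniform: {h_uniform:.4f} nats")
print(f"peaked:  {h_peaked:.4f} nats")
```

Low entropy means the attention mass is concentrated on a few tokens, and high entropy means it is diffuse. Which of the two the framework favors when selecting intermediate features is left to the paper.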

Exploring Linguistic Similarity and Zero-Shot Learning for Multilingual Translation of Dravidian Languages

Aug 10, 2023

Danish Ebadulla, Rahul Raman, S. Natarajan, Hridhay Kiran Shetty, Ashish Harish Shenoy

Abstract: Current research in zero-shot translation is plagued by several issues such as high compute requirements, increased training time and off-target translations. Proposed remedies often come at the cost of additional data or compute requirements. Pivot-based neural machine translation is preferred over a single-encoder model for most settings despite the increased training and evaluation time. In this work, we overcome the shortcomings of zero-shot translation by taking advantage of transliteration and linguistic similarity. We build a single encoder-decoder neural machine translation system for Dravidian-Dravidian multilingual translation and perform zero-shot translation. We compare the data vs zero-shot accuracy tradeoff and evaluate the performance of our vanilla method against the current state-of-the-art pivot-based method. We also test the theory that morphologically rich languages require large vocabularies by restricting the vocabulary using an optimal-transport-based technique. Our model manages to achieve scores within 3 BLEU of large-scale pivot-based models when it is trained on 50% of the language directions.
