Sahil Kumar

MambaVoiceCloning: Efficient and Expressive Text-to-Speech via State-Space Modeling and Diffusion Control

Mar 31, 2026

Sahil Kumar, Namrataben Patel, Honggang Wang, Youshan Zhang

Abstract: MambaVoiceCloning (MVC) asks whether the conditioning path of diffusion-based TTS can be made fully SSM-only at inference, removing all attention and explicit RNN-style recurrence layers across text, rhythm, and prosody, while preserving or improving quality under controlled conditions. MVC combines a gated bidirectional Mamba text encoder, a Temporal Bi-Mamba supervised by a lightweight alignment teacher discarded after training, and an Expressive Mamba with AdaLN modulation, yielding linear-time O(T) conditioning with bounded activation memory and practical finite look-ahead streaming. Unlike prior Mamba-TTS systems that remain hybrid at inference, MVC removes attention-based duration and style modules under a fixed StyleTTS2 mel-diffusion-vocoder backbone. Trained on LJSpeech/LibriTTS and evaluated on VCTK, CSS10 (ES/DE/FR), and long-form Gutenberg passages, MVC achieves modest but statistically reliable gains over StyleTTS2, VITS, and Mamba-attention hybrids in MOS/CMOS, F0 RMSE, MCD, and WER, while reducing encoder parameters to 21M and improving throughput by 1.6x. Diffusion remains the dominant latency source, but SSM-only conditioning improves memory footprint, stability, and deployability.

* Accepted at ICLR 2026

A Decomposable Forward Process in Diffusion Models for Time-Series Forecasting

Jan 29, 2026

Francisco Caldas, Sahil Kumar, Cláudia Soares

Abstract: We introduce a model-agnostic forward diffusion process for time-series forecasting that decomposes signals into spectral components, preserving structured temporal patterns such as seasonality more effectively than standard diffusion. Unlike prior work that modifies the network architecture or diffuses directly in the frequency domain, our proposed method alters only the diffusion process itself, making it compatible with existing diffusion backbones (e.g., DiffWave, TimeGrad, CSDI). By staging noise injection according to component energy, it maintains high signal-to-noise ratios for dominant frequencies throughout the diffusion trajectory, thereby improving the recoverability of long-term patterns. This strategy enables the model to maintain the signal structure for a longer period in the forward process, leading to improved forecast quality. Across standard forecasting benchmarks, we show that applying spectral decomposition strategies, such as the Fourier or Wavelet transform, consistently improves upon diffusion models using the baseline forward process, with negligible computational overhead. The code for this paper is available at https://anonymous.4open.science/r/D-FDP-4A29.

* Submitted to ICML 2026

KatzBot: Revolutionizing Academic Chatbot for Enhanced Communication

Oct 21, 2024

Sahil Kumar, Deepa Paikar, Kiran Sai Vutukuri, Haider Ali, Shashidhar Reddy Ainala, Aditya Murli Krishnan, Youshan Zhang

Abstract: Effective communication within universities is crucial for addressing the diverse information needs of students, alumni, and external stakeholders. However, existing chatbot systems often fail to deliver accurate, context-specific responses, resulting in poor user experiences. In this paper, we present KatzBot, an innovative chatbot powered by KatzGPT, a custom Large Language Model (LLM) fine-tuned on domain-specific academic data. KatzGPT is trained on two university-specific datasets: 6,280 sentence-completion pairs and 7,330 question-answer pairs. KatzBot outperforms established open-source LLMs, achieving higher accuracy and domain relevance. KatzBot offers a user-friendly interface, significantly enhancing user satisfaction in real-world applications. The source code is publicly available at https://github.com/AiAI-99/katzbot.

Vision Transformer Segmentation for Visual Bird Sound Denoising

Jun 13, 2024

Sahil Kumar, Jialu Li, Youshan Zhang

Abstract: Audio denoising, especially in the context of bird sounds, remains a challenging task due to persistent residual noise. Traditional and deep learning methods often struggle with artificial or low-frequency noise. In this work, we propose ViTVS, a novel approach that leverages the power of the vision transformer (ViT) architecture. ViTVS adeptly combines segmentation techniques to disentangle clean audio from complex signal mixtures. Our key contributions encompass the development of ViTVS, introducing comprehensive, long-range, and multi-scale representations. These contributions directly tackle the limitations inherent in conventional approaches. Extensive experiments demonstrate that ViTVS outperforms state-of-the-art methods, positioning it as a benchmark solution for real-world bird sound denoising applications. Source code is available at: https://github.com/aiai-4/ViVTS.

* INTERSPEECH 2024

Comparative Study of MPPT and Parameter Estimation of PV cells

Apr 16, 2023

Sahil Kumar, Sahitya Gupta, Vajayant Pratik, Pascal Brunet

Abstract: The presented work focuses on utilising machine learning techniques to accurately estimate the values of known and unknown parameters of the PVLIB model for solar cells and photovoltaic modules. Finding accurate model parameters of circuits for photovoltaic (PV) cells is important for a variety of tasks. An Artificial Neural Network (ANN) algorithm was employed, which outperformed other metaheuristic and machine learning algorithms in terms of computational efficiency. To validate the consistency of the data and output, the results were compared against other machine learning algorithms based on irradiance and temperature. A Bland-Altman test was conducted, which resulted in an accuracy rate of more than 95 percent. Upon validation, the ANN algorithm was utilised to estimate the parameters and their respective values.
