Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Deepak Kumar

How Far Can Pretrained LLMs Go in Symbolic Music? Controlled Comparisons of Supervised and Preference-based Adaptation

Jan 30, 2026

Deepak Kumar, Emmanouil Karystinaios, Gerhard Widmer, Markus Schedl

Abstract:Music often shares notable parallels with language, motivating the use of pretrained large language models (LLMs) for symbolic music understanding and generation. Despite growing interest, the practical effectiveness of adapting instruction-tuned LLMs to symbolic music remains insufficiently characterized. We present a controlled comparative study of finetuning strategies for ABC-based generation and understanding, comparing an off-the-shelf instruction-tuned backbone to domain-adapted variants and a music-specialized LLM baseline. Across multiple symbolic music corpora and evaluation signals, we provide some insights into adaptation choices for symbolic music applications. We highlight the domain adaptation vs.~preserving prior information tradeoff as well as the distinct behaviour of metrics used to measure the domain adaptation for symbolic music.

* Accepted at NLP4MusA 2026

Via

Access Paper or Ask Questions

Taming Toxic Talk: Using chatbots to intervene with users posting toxic comments

Jan 27, 2026

Jeremy Foote, Deepak Kumar, Bedadyuti Jha, Ryan Funkhouser, Loizos Bitsikokos, Hitesh Goel, Hsuen-Chi Chiu

Abstract:Generative AI chatbots have proven surprisingly effective at persuading people to change their beliefs and attitudes in lab settings. However, the practical implications of these findings are not yet clear. In this work, we explore the impact of rehabilitative conversations with generative AI chatbots on users who share toxic content online. Toxic behaviors -- like insults or threats of violence, are widespread in online communities. Strategies to deal with toxic behavior are typically punitive, such as removing content or banning users. Rehabilitative approaches are rarely attempted, in part due to the emotional and psychological cost of engaging with aggressive users. In collaboration with seven large Reddit communities, we conducted a large-scale field experiment (N=893) to invite people who had recently posted toxic content to participate in conversations with AI chatbots. A qualitative analysis of the conversations shows that many participants engaged in good faith and even expressed remorse or a desire to change. However, we did not observe a significant change in toxic behavior in the following month compared to a control group. We discuss possible explanations for our findings, as well as theoretical and practical implications based on our results.

Via

Access Paper or Ask Questions

Use of Metric Learning for the Recognition of Handwritten Digits, and its Application to Increase the Outreach of Voice-based Communication Platforms

Apr 26, 2025

Devesh Pant, Dibyendu Talukder, Deepak Kumar, Rachit Pandey, Aaditeshwar Seth, Chetan Arora

Figure 1 for Use of Metric Learning for the Recognition of Handwritten Digits, and its Application to Increase the Outreach of Voice-based Communication Platforms

Figure 2 for Use of Metric Learning for the Recognition of Handwritten Digits, and its Application to Increase the Outreach of Voice-based Communication Platforms

Figure 3 for Use of Metric Learning for the Recognition of Handwritten Digits, and its Application to Increase the Outreach of Voice-based Communication Platforms

Figure 4 for Use of Metric Learning for the Recognition of Handwritten Digits, and its Application to Increase the Outreach of Voice-based Communication Platforms

Abstract:Initiation, monitoring, and evaluation of development programmes can involve field-based data collection about project activities. This data collection through digital devices may not always be feasible though, for reasons such as unaffordability of smartphones and tablets by field-based cadre, or shortfalls in their training and capacity building. Paper-based data collection has been argued to be more appropriate in several contexts, with automated digitization of the paper forms through OCR (Optical Character Recognition) and OMR (Optical Mark Recognition) techniques. We contribute with providing a large dataset of handwritten digits, and deep learning based models and methods built using this data, that are effective in real-world environments. We demonstrate the deployment of these tools in the context of a maternal and child health and nutrition awareness project, which uses IVR (Interactive Voice Response) systems to provide awareness information to rural women SHG (Self Help Group) members in north India. Paper forms were used to collect phone numbers of the SHG members at scale, which were digitized using the OCR tools developed by us, and used to push almost 4 million phone calls. The data, model, and code have been released in the open-source domain.

* COMPASS 2022: Proceedings of the 5th ACM SIGCAS/SIGCHI Conference on Computing and Sustainable Societies COMPASS '22: Proceedings of the 5th ACM SIGCAS/SIGCHI Conference on Computing and Sustainable Societies, Pages 364 - 374
* 10 Pages, 7 Figures, ACM COMPASS 2022

Via

Access Paper or Ask Questions

Explainable, Multi-modal Wound Infection Classification from Images Augmented with Generated Captions

Feb 27, 2025

Palawat Busaranuvong, Emmanuel Agu, Reza Saadati Fard, Deepak Kumar, Shefalika Gautam, Bengisu Tulu, Diane Strong

Abstract:Infections in Diabetic Foot Ulcers (DFUs) can cause severe complications, including tissue death and limb amputation, highlighting the need for accurate, timely diagnosis. Previous machine learning methods have focused on identifying infections by analyzing wound images alone, without utilizing additional metadata such as medical notes. In this study, we aim to improve infection detection by introducing Synthetic Caption Augmented Retrieval for Wound Infection Detection (SCARWID), a novel deep learning framework that leverages synthetic textual descriptions to augment DFU images. SCARWID consists of two components: (1) Wound-BLIP, a Vision-Language Model (VLM) fine-tuned on GPT-4o-generated descriptions to synthesize consistent captions from images; and (2) an Image-Text Fusion module that uses cross-attention to extract cross-modal embeddings from an image and its corresponding Wound-BLIP caption. Infection status is determined by retrieving the top-k similar items from a labeled support set. To enhance the diversity of training data, we utilized a latent diffusion model to generate additional wound images. As a result, SCARWID outperformed state-of-the-art models, achieving average sensitivity, specificity, and accuracy of 0.85, 0.78, and 0.81, respectively, for wound infection classification. Displaying the generated captions alongside the wound images and infection detection results enhances interpretability and trust, enabling nurses to align SCARWID outputs with their medical knowledge. This is particularly valuable when wound notes are unavailable or when assisting novice nurses who may find it difficult to identify visual attributes of wound infection.

Via

Access Paper or Ask Questions

Uplink Rate Splitting Multiple Access with Imperfect Channel State Information and Interference Cancellation

Jan 31, 2025

Farjam Karim, Nurul Huda Mahmood, Arthur S. de Sena, Deepak Kumar, Bruno Clerckx, Matti Latva-aho

Figure 1 for Uplink Rate Splitting Multiple Access with Imperfect Channel State Information and Interference Cancellation

Figure 2 for Uplink Rate Splitting Multiple Access with Imperfect Channel State Information and Interference Cancellation

Figure 3 for Uplink Rate Splitting Multiple Access with Imperfect Channel State Information and Interference Cancellation

Abstract:This article investigates the performance of uplink rate splitting multiple access (RSMA) in a two-user scenario, addressing an under-explored domain compared to its downlink counterpart. With the increasing demand for uplink communication in applications like the Internet-of-Things, it is essential to account for practical imperfections, such as inaccuracies in channel state information at the receiver (CSIR) and limitations in successive interference cancellation (SIC), to provide realistic assessments of system performance. Specifically, we derive closed-form expressions for the outage probability, throughput, and asymptotic outage behavior of uplink users, considering imperfect CSIR and SIC. We validate the accuracy of these derived expressions using Monte Carlo simulations. Our findings reveal that at low transmit power levels, imperfect CSIR significantly affects system performance more severely than SIC imperfections. However, as the transmit power increases, the impact of imperfect CSIR diminishes, while the influence of SIC imperfections becomes more pronounced. Moreover, we highlight the impact of the rate allocation factor on user performance. Finally, our comparison with non-orthogonal multiple access (NOMA) highlights the outage performance trade-offs between RSMA and NOMA. RSMA proves to be more effective in managing imperfect CSIR and enhances performance through strategic message splitting, resulting in more robust communication.

Via

Access Paper or Ask Questions

Pretraining Data and Tokenizer for Indic LLM

Jul 17, 2024

Rahul Kumar, Shubham Kakde, Divyansh Rajput, Daud Ibrahim, Rishabh Nahata, Pidathala Sowjanya, Deepak Kumar

Abstract:We present a novel approach to data preparation for developing multilingual Indic large language model. Our meticulous data acquisition spans open-source and proprietary sources, including Common Crawl, Indic books, news articles, and Wikipedia, ensuring a diverse and rich linguistic representation. For each Indic language, we design a custom preprocessing pipeline to effectively eliminate redundant and low-quality text content. Additionally, we perform deduplication on Common Crawl data to address the redundancy present in 70% of the crawled web pages. This study focuses on developing high-quality data, optimizing tokenization for our multilingual dataset for Indic large language models with 3B and 7B parameters, engineered for superior performance in Indic languages. We introduce a novel multilingual tokenizer training strategy, demonstrating our custom-trained Indic tokenizer outperforms the state-of-the-art OpenAI Tiktoken tokenizer, achieving a superior token-to-word ratio for Indic languages.

Via

Access Paper or Ask Questions

End-to-End Waveform and Beamforming Optimization for RF Wireless Power Transfer

May 09, 2024

Abdul Basit Khattak, Onel L. A. López, Amirhossein Azarbahram, Deepak Kumar, Matti Latva-aho

Figure 1 for End-to-End Waveform and Beamforming Optimization for RF Wireless Power Transfer

Figure 2 for End-to-End Waveform and Beamforming Optimization for RF Wireless Power Transfer

Figure 3 for End-to-End Waveform and Beamforming Optimization for RF Wireless Power Transfer

Figure 4 for End-to-End Waveform and Beamforming Optimization for RF Wireless Power Transfer

Abstract:Radio frequency (RF) wireless power transfer (WPT) is a key technology for future low-power wireless systems. However, the inherently low end-to-end power transfer efficiency (PTE) is challenging for practical applications. The main factors contributing to it are the channel losses, transceivers' power consumption, and losses related, e.g., to the digital-to-analog converter (DAC), high-power amplifier, and rectenna. Optimizing PTE requires careful consideration of these factors, motivating the current work. Herein, we consider an analog multi-antenna power transmitter that aims to charge a single energy harvester. We first provide a mathematical framework to calculate the harvested power from multi-tone signal transmissions and the system power consumption. Then, we formulate the joint waveform and analog beamforming design problem to minimize power consumption and meet the charging requirements. Finally, we propose an optimization approach relying on swarm intelligence to solve the specified problem. Simulation results quantify the power consumption reduction as the DAC, phase shifters resolution, and antenna length are increased, while it is seen that increasing system frequency results in higher power consumption.

* Conference

Via

Access Paper or Ask Questions

Guided Conditional Diffusion Classifier (ConDiff) for Enhanced Prediction of Infection in Diabetic Foot Ulcers

May 01, 2024

Palawat Busaranuvong, Emmanuel Agu, Deepak Kumar, Shefalika Gautam, Reza Saadati Fard, Bengisu Tulu, Diane Strong

Abstract:To detect infected wounds in Diabetic Foot Ulcers (DFUs) from photographs, preventing severe complications and amputations. Methods: This paper proposes the Guided Conditional Diffusion Classifier (ConDiff), a novel deep-learning infection detection model that combines guided image synthesis with a denoising diffusion model and distance-based classification. The process involves (1) generating guided conditional synthetic images by injecting Gaussian noise to a guide image, followed by denoising the noise-perturbed image through a reverse diffusion process, conditioned on infection status and (2) classifying infections based on the minimum Euclidean distance between synthesized images and the original guide image in embedding space. Results: ConDiff demonstrated superior performance with an accuracy of 83% and an F1-score of 0.858, outperforming state-of-the-art models by at least 3%. The use of a triplet loss function reduces overfitting in the distance-based classifier. Conclusions: ConDiff not only enhances diagnostic accuracy for DFU infections but also pioneers the use of generative discriminative models for detailed medical image analysis, offering a promising approach for improving patient outcomes.

Via

Access Paper or Ask Questions

Watch Your Language: Large Language Models and Content Moderation

Sep 25, 2023

Deepak Kumar, Yousef AbuHashem, Zakir Durumeric

Figure 1 for Watch Your Language: Large Language Models and Content Moderation

Figure 2 for Watch Your Language: Large Language Models and Content Moderation

Figure 3 for Watch Your Language: Large Language Models and Content Moderation

Figure 4 for Watch Your Language: Large Language Models and Content Moderation

Abstract:Large language models (LLMs) have exploded in popularity due to their ability to perform a wide array of natural language tasks. Text-based content moderation is one LLM use case that has received recent enthusiasm, however, there is little research investigating how LLMs perform in content moderation settings. In this work, we evaluate a suite of modern, commercial LLMs (GPT-3, GPT-3.5, GPT-4) on two common content moderation tasks: rule-based community moderation and toxic content detection. For rule-based community moderation, we construct 95 LLM moderation-engines prompted with rules from 95 Reddit subcommunities and find that LLMs can be effective at rule-based moderation for many communities, achieving a median accuracy of 64% and a median precision of 83%. For toxicity detection, we find that LLMs significantly outperform existing commercially available toxicity classifiers. However, we also find that recent increases in model size add only marginal benefit to toxicity detection, suggesting a potential performance plateau for LLMs on toxicity detection tasks. We conclude by outlining avenues for future work in studying LLMs and content moderation.

Via

Access Paper or Ask Questions

Specious Sites: Tracking the Spread and Sway of Spurious News Stories at Scale

Aug 03, 2023

Hans W. A. Hanley, Deepak Kumar, Zakir Durumeric

Figure 1 for Specious Sites: Tracking the Spread and Sway of Spurious News Stories at Scale

Figure 2 for Specious Sites: Tracking the Spread and Sway of Spurious News Stories at Scale

Figure 3 for Specious Sites: Tracking the Spread and Sway of Spurious News Stories at Scale

Figure 4 for Specious Sites: Tracking the Spread and Sway of Spurious News Stories at Scale

Abstract:Misinformation, propaganda, and outright lies proliferate on the web, with some narratives having dangerous real-world consequences on public health, elections, and individual safety. However, despite the impact of misinformation, the research community largely lacks automated and programmatic approaches for tracking news narratives across online platforms. In this work, utilizing daily scrapes of 1,404 unreliable news websites, the large-language model MPNet, and DP-Means clustering, we introduce a system to automatically isolate and analyze the narratives spread within online ecosystems. Identifying 55,301 narratives on these 1,404 websites, we describe the most prevalent narratives spread in 2022 and identify the most influential websites that originate and magnify narratives. Finally, we show how our system can be utilized to detect new narratives originating from unreliable news websites and aid fact-checkers like Politifact, Reuters, and AP News in more quickly addressing misinformation stories.

Via

Access Paper or Ask Questions