Nalin Kumar

Modular Monolingual Adaptation using Pretrained Language Models

Jun 04, 2026

Nalin Kumar, Ondřej Dušek

Abstract: Building monolingual language models (LMs) for low-resource languages typically relies on adapting pretrained language models (PLMs) by finetuning the whole model on the target language. This approach is widely favored over training from scratch because it enables effective knowledge transfer. Prior work has also shown that using a language-specific tokenizer can improve adaptability. In this work, we hypothesize that full model tuning is often unnecessary and propose a more modular approach: we replace the tokens, freeze the corresponding embeddings, and tune the rest of the model. We experiment with Scottish Gaelic, Irish, and Quechua, the last being a very low-resource language (8.5k training instances). Evaluation on natural language understanding (NLU) tasks (mask filling, NER, and POS tagging) shows that our proposed approach improves performance when adapting models to low-resource languages. Additionally, we provide a comprehensive analysis of the effectiveness of training strategies, the choice of pretrained embeddings, and the choice of base models.

* Accepted to ACL 2026 Industry Track
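
The recipe in this abstract (new tokens, frozen embeddings, everything else trainable) can be illustrated with a small framework-free Python sketch. It is not the authors' code: the `Param` class, the `embed.`/`body.` naming, the zero-vector fallback for tokens without a pretrained vector, the toy vectors, and plain SGD are all illustrative assumptions. It only shows where freezing happens: embedding rows for the new vocabulary are copied from a pretrained table and excluded from updates, while a stand-in model body keeps training.

```python
from dataclasses import dataclass


@dataclass
class Param:
    values: list          # flat list of floats
    trainable: bool = True


def build_model(vocab, pretrained, dim, body_size):
    """Hypothetical model: one frozen embedding row per new token plus a trainable body."""
    params = {}
    for tok in vocab:
        # Initialise each new-vocabulary embedding from pretrained vectors
        # (zeros when a token has none; an assumption) and freeze it.
        vec = pretrained.get(tok, [0.0] * dim)
        params[f"embed.{tok}"] = Param(list(vec), trainable=False)
    # Stand-in for the transformer layers, which remain trainable.
    params["body.weight"] = Param([0.5] * body_size)
    return params


def sgd_step(params, grads, lr):
    """Plain SGD update that skips every frozen parameter."""
    for name, p in params.items():
        if p.trainable:
            p.values = [v - lr * g for v, g in zip(p.values, grads[name])]


# Usage with made-up vectors for two Scottish Gaelic words.
pretrained = {"madainn": [0.1, 0.2], "mhath": [0.3, 0.4]}
params = build_model(["madainn", "mhath", "<unk>"], pretrained, dim=2, body_size=3)
grads = {name: [1.0] * len(p.values) for name, p in params.items()}
sgd_step(params, grads, lr=0.1)
print(params["embed.madainn"].values)  # [0.1, 0.2]: frozen, unchanged
print(params["body.weight"].values)    # each entry moved from 0.5 by -0.1
```

In a PyTorch model the same switch would be setting `requires_grad = False` on the embedding weight so the optimizer never changes it; which embeddings to load is exactly the "choice of pretrained embeddings" the abstract analyzes.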

LEEETs-Dial: Linguistic Entrainment in End-to-End Task-oriented Dialogue systems

Nov 15, 2023

Nalin Kumar, Ondřej Dušek

Abstract: Linguistic entrainment, or alignment, is a phenomenon in which the linguistic patterns used by conversational participants converge toward one another. While alignment has been shown to produce a more natural user experience, most dialogue systems make no provision for it. In this work, we introduce methods for achieving dialogue alignment in a GPT-2-based end-to-end dialogue system by exploiting shared vocabulary. We experiment with training instance weighting, an alignment-specific loss, and additional conditioning to generate responses that align with the user. Comparing these entrainment techniques on the MultiWOZ dataset, we demonstrate that all three approaches produce significantly better-aligned responses than the baseline, as confirmed by both automatic metrics and manual evaluation.
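
Shared vocabulary between the user turn and the system response is the signal this abstract builds on. As a hedged illustration of the first technique, training instance weighting, the sketch below scores each (user, response) pair by the fraction of response tokens that also occur in the user turn and upweights better-aligned pairs in the loss. The overlap measure, the `1 + alpha * overlap` weighting, whitespace tokenization, and all function names are assumptions, not the paper's exact formulation.

```python
def shared_vocab_ratio(user, response):
    """Fraction of response tokens that also occur in the user turn."""
    user_vocab = set(user.lower().split())
    resp_tokens = response.lower().split()
    if not resp_tokens:
        return 0.0
    return sum(tok in user_vocab for tok in resp_tokens) / len(resp_tokens)


def instance_weights(pairs, alpha=1.0):
    """Upweight examples whose response reuses more of the user's wording."""
    return [1.0 + alpha * shared_vocab_ratio(u, r) for u, r in pairs]


def weighted_mean_loss(losses, weights):
    """Per-example losses combined with the alignment weights."""
    return sum(l * w for l, w in zip(losses, weights)) / sum(weights)


pairs = [
    ("i need a cheap hotel", "there is a cheap hotel in the north"),
    ("book a taxi", "what time"),
]
weights = instance_weights(pairs, alpha=1.0)
print(weights)  # [1.375, 1.0]: 3 of 8 response tokens overlap in the first pair
loss = weighted_mean_loss([2.0, 1.0], weights)
```

During finetuning, the per-example losses would come from the language model; the weighting only changes how much each example contributes.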

Dict-NMT: Bilingual Dictionary based NMT for Extremely Low Resource Languages

Jun 09, 2022

Nalin Kumar, Deepak Kumar, Subhankar Mishra

Abstract: Neural Machine Translation (NMT) models have been effective on large bilingual datasets. However, existing methods and techniques show that model performance depends heavily on the number of training examples, and for many languages a corpus of that size is out of reach. Taking inspiration from monolingual speakers who explore new languages using bilingual dictionaries, we investigate the applicability of bilingual dictionaries to languages with extremely small bilingual corpora or none at all. In this paper, we explore methods that combine bilingual dictionaries with an NMT model to improve translation for extremely low-resource languages. We extend this work to multilingual systems, which exhibit zero-shot properties. We present a detailed analysis of how dictionary quality, training dataset size, language family, and other factors affect translation quality. Results on multiple low-resource test languages show a clear advantage of our bilingual dictionary-based method over the baselines.
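
The abstract does not spell out how the dictionaries are fed to the NMT model. One simple way to combine a bilingual dictionary with an NMT system, shown here purely as a hypothetical sketch, is to append a word-by-word dictionary gloss to the source sentence before it reaches the encoder. The `<gloss>` separator, the copy-through rule for unknown words, the toy dictionary, and the coverage score (a crude proxy for the dictionary quality the abstract analyzes) are all assumptions.

```python
def dictionary_gloss(tokens, bilingual_dict):
    """Word-by-word lookup; tokens missing from the dictionary are copied through."""
    return [bilingual_dict.get(tok.lower(), tok) for tok in tokens]


def augment_source(sentence, bilingual_dict, sep="<gloss>"):
    """Source tokens, a separator, then the dictionary gloss (hypothetical input format)."""
    tokens = sentence.split()
    return " ".join(tokens + [sep] + dictionary_gloss(tokens, bilingual_dict))


def dictionary_coverage(sentence, bilingual_dict):
    """Share of source tokens the dictionary can translate."""
    tokens = sentence.split()
    if not tokens:
        return 0.0
    return sum(tok.lower() in bilingual_dict for tok in tokens) / len(tokens)


# Toy Spanish-English dictionary, used only for illustration.
toy_dict = {"el": "the", "gato": "cat", "come": "eats"}
print(augment_source("El gato come pescado", toy_dict))
# El gato come pescado <gloss> the cat eats pescado
print(dictionary_coverage("El gato come pescado", toy_dict))  # 0.75
```

An NMT model trained on such augmented inputs can lean on the gloss when parallel data is scarce; the copied-through `pescado` shows how gaps in the dictionary surface directly in the input.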

QUARC: Quaternion Multi-Modal Fusion Architecture For Hate Speech Classification

Dec 15, 2020

Deepak Kumar, Nalin Kumar, Subhankar Mishra

Abstract: Hate speech, common in the age of social media, is at times harmless but can also cause mental trauma to individuals or even riots in communities. An image of a religious symbol with a derogatory comment, or a video of a man abusing a particular community, both constitute hate speech, with every modality (text, image, and audio) contributing to it. Models that rely on a single modality of a social media post are therefore insufficient; instead, we need multi-modal fusion models that consider both image and text when classifying hate speech. Text-image fusion models are heavily parameterized, so we propose a quaternion neural network-based model with additional fusion components for each pair of modalities. The model is tested on the MMHS150K Twitter dataset for hate speech classification. It achieves an almost 75% reduction in parameters, which also saves storage space and training time, while performing on par with its real-valued counterpart.

* Accepted in Proc. of the 4th International Workshop on Dialog Systems (IWDS2021), held in conjunction with IEEE BigComp2021
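
The roughly 75% parameter saving follows from how quaternion layers share weights. A real dense layer from 4n to 4m features needs 16nm weights, whereas a quaternion layer treats the same features as n input and m output quaternions and needs only nm quaternion weights, that is 4nm real numbers, a quarter as many. The sketch below shows the Hamilton product, a quaternion linear map built on it, and the parameter arithmetic. It is a generic quaternion layer with biases ignored, not QUARC's architecture; its pairwise fusion components are not modeled, and the function names are illustrative.

```python
def hamilton(p, q):
    """Hamilton product of quaternions p = a1 + b1 i + c1 j + d1 k and q."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def quaternion_linear(weights, inputs):
    """weights: m rows of n quaternions; inputs: n quaternions; returns m quaternions."""
    outputs = []
    for row in weights:
        acc = (0.0, 0.0, 0.0, 0.0)
        for w, x in zip(row, inputs):
            acc = tuple(s + t for s, t in zip(acc, hamilton(w, x)))
        outputs.append(acc)
    return outputs


def param_counts(in_features, out_features):
    """Weights of a real dense layer vs. a quaternion layer of the same real size (no biases)."""
    assert in_features % 4 == 0 and out_features % 4 == 0
    real = in_features * out_features
    quat = 4 * (in_features // 4) * (out_features // 4)
    return real, quat


print(hamilton((0, 1, 0, 0), (0, 0, 1, 0)))  # (0, 0, 0, 1), i.e. i * j = k
real, quat = param_counts(512, 256)
print(real, quat, 1 - quat / real)  # 131072 32768 0.75
```

Because each quaternion weight is reused across all four components through the Hamilton product, the layer keeps cross-component interactions while storing a quarter of the numbers.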
