Karunesh Arora

Optimal Labeler Assignment and Sampling for Active Learning in the Presence of Imperfect Labels

Dec 14, 2025

Pouya Ahadi, Blair Winograd, Camille Zaug, Karunesh Arora, Lijun Wang, Kamran Paynabar

Abstract: Active Learning (AL) has garnered significant interest across various application domains where labeling training data is costly. AL provides a framework that helps practitioners query informative samples for annotation by oracles (labelers). However, these labels often contain noise due to varying levels of labeler accuracy. Additionally, uncertain samples are more prone to receiving incorrect labels because of their complexity. Learning from imperfectly labeled data leads to an inaccurate classifier. We propose a novel AL framework to construct a robust classification model by minimizing noise levels. Our approach includes an assignment model that optimally assigns query points to labelers, aiming to minimize the maximum possible noise within each cycle. Additionally, we introduce a new sampling method to identify the best query points, reducing the impact of label noise on classifier performance. Our experiments demonstrate that our approach significantly improves classification performance compared to several benchmark methods.

* 22 pages, 6 figures. Preprint under review

Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages

May 08, 2024

Sankalp Bahad, Pruthwik Mishra, Karunesh Arora, Rakesh Chandra Balabantaray, Dipti Misra Sharma, Parameswari Krishnamurthy

Abstract: Named Entity Recognition (NER) is a useful component in Natural Language Processing (NLP) applications. It is used in various tasks such as Machine Translation, Summarization, Information Retrieval, and Question-Answering systems. Research on NER has centered on English and a few other major languages, whereas limited attention has been given to Indian languages. We analyze the challenges and propose techniques that can be tailored for Multilingual Named Entity Recognition for Indian Languages. We present a human-annotated named entity corpus of 40K sentences for 4 Indian languages from two of the major Indian language families. Additionally, we present a multilingual model fine-tuned on our dataset, which achieves an average F1 score of 0.80 on our dataset. We achieve comparable performance on completely unseen benchmark datasets for Indian languages, which affirms the usability of our model.

* 8 pages, accepted in NAACL-SRW, 2024
