
Daria Galimzianova

OpenAutoNLU: Open Source AutoML Library for NLU

Mar 02, 2026

Grigory Arshinov, Aleksandr Boriskin, Sergey Senichev, Ayaz Zaripov, Daria Galimzianova, Daniil Karpov, Leonid Sanochkin

Abstract: OpenAutoNLU is an open-source automated machine learning (AutoML) library for natural language understanding (NLU) tasks, covering both text classification and named entity recognition (NER). Unlike existing solutions, it offers data-aware training regime selection that requires no manual configuration from the user. The library also provides integrated data quality diagnostics, configurable out-of-distribution (OOD) detection, and large language model (LLM) features, all within a minimal low-code API. A demo app is available at https://openautonlu.dev.

Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA

May 27, 2025

Sergey Pletenev, Maria Marina, Nikolay Ivanov, Daria Galimzianova, Nikita Krayko, Mikhail Salnikov, Vasily Konovalov, Alexander Panchenko, Viktor Moskvoretskii

Abstract: Large Language Models (LLMs) often hallucinate in question answering (QA) tasks. A key yet underexplored factor contributing to this is the temporality of questions: whether they are evergreen (answers remain stable over time) or mutable (answers change). In this work, we introduce EverGreenQA, the first multilingual QA dataset with evergreen labels, supporting both evaluation and training. Using EverGreenQA, we benchmark 12 modern LLMs to assess whether they encode question temporality explicitly (via verbalized judgments) or implicitly (via uncertainty signals). We also train EG-E5, a lightweight multilingual classifier that achieves state-of-the-art (SoTA) performance on this task. Finally, we demonstrate the practical utility of evergreen classification across three applications: improving self-knowledge estimation, filtering QA datasets, and explaining GPT-4o retrieval behavior.

LLM-Independent Adaptive RAG: Let the Question Speak for Itself

May 07, 2025

Maria Marina, Nikolay Ivanov, Sergey Pletenev, Mikhail Salnikov, Daria Galimzianova, Nikita Krayko, Vasily Konovalov, Alexander Panchenko, Viktor Moskvoretskii

Abstract: Large Language Models (LLMs) are prone to hallucinations. Retrieval-Augmented Generation (RAG) helps mitigate this, but at a high computational cost and with a risk of introducing misinformation. Adaptive retrieval aims to retrieve only when necessary, but existing approaches rely on LLM-based uncertainty estimation, which remains inefficient and impractical. In this study, we introduce lightweight, LLM-independent adaptive retrieval methods based on external information. We investigated 27 features, organized into 7 groups, and their hybrid combinations, and evaluated these methods on 6 QA datasets in terms of both QA performance and efficiency. The results show that our approach matches the performance of complex LLM-based methods while achieving significant efficiency gains, demonstrating the potential of external information for adaptive retrieval.

* 11 pages, 5 figures, 2 tables
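To illustrate the general idea of adaptive retrieval described above (call the retriever only when a question seems to need it, using cheap signals instead of the LLM's own uncertainty), here is a minimal Python sketch of a logistic-regression gate. Everything in it is an assumption for illustration: the feature names `entity_popularity` and `question_length`, the weights, the bias, and the 0.5 threshold are hypothetical placeholders, and the paper's 27 features and trained classifiers are not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class QuestionFeatures:
    # Hypothetical external features, computed without calling the LLM.
    entity_popularity: float  # e.g. log-scaled popularity of the main entity
    question_length: int      # number of tokens in the question


# Hypothetical weights and bias of an already-trained logistic-regression gate.
WEIGHTS = {"entity_popularity": -0.8, "question_length": 0.05}
BIAS = 2.0


def retrieval_probability(f: QuestionFeatures) -> float:
    """Estimated probability that answering this question needs retrieval."""
    z = (
        BIAS
        + WEIGHTS["entity_popularity"] * f.entity_popularity
        + WEIGHTS["question_length"] * f.question_length
    )
    return 1.0 / (1.0 + math.exp(-z))


def should_retrieve(f: QuestionFeatures, threshold: float = 0.5) -> bool:
    """Call the retriever only when the gate's probability reaches the threshold."""
    return retrieval_probability(f) >= threshold


# A popular entity with a short question skips retrieval;
# a rare entity triggers it.
print(should_retrieve(QuestionFeatures(entity_popularity=5.0, question_length=8)))   # False
print(should_retrieve(QuestionFeatures(entity_popularity=0.5, question_length=12)))  # True
```

The point of such a gate is that its cost is a handful of arithmetic operations per question, so the expensive retriever and the extra context are only paid for when the features suggest the model is unlikely to know the answer.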

Via

Access Paper or Ask Questions

CleanComedy: Creating Friendly Humor through Generative Techniques

Dec 12, 2024

Dmitry Vikhorev, Daria Galimzianova, Svetlana Gorovaia, Elizaveta Zhemchuzhina, Ivan P. Yamshchikov

Abstract: Humor generation is a challenging task in natural language processing due to limited resources and quality problems in existing datasets. Available humor language resources often suffer from toxicity and duplication, limiting their effectiveness for training robust models. This paper proposes CleanComedy, a specialized, partially annotated, toxicity-filtered corpus of English and Russian jokes collected from various sources. We assess the effectiveness of our data filtering approach through a survey on humor and toxicity levels across various joke groups. In addition, we study advances in computational humor generation by comparing jokes written by humans with various groups of generated jokes, including those produced by our baseline models trained on the CleanComedy datasets.
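The toxicity and duplication filtering described above can be sketched as exact-duplicate removal after text normalization, followed by a toxicity cutoff. This is a minimal sketch under stated assumptions: the `toxicity` scorer interface, the blocklist-based `toy_toxicity` stand-in, and the 0.5 cutoff are hypothetical, and CleanComedy's actual filtering pipeline may differ.

```python
import re
from typing import Callable, Iterable


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace so trivial variants collide."""
    text = re.sub(r"[^\w\s]", "", text.lower())  # \w is Unicode-aware, so Cyrillic survives
    return re.sub(r"\s+", " ", text).strip()


def clean_corpus(
    jokes: Iterable[str],
    toxicity: Callable[[str], float],
    max_toxicity: float = 0.5,
) -> list[str]:
    """Keep the first copy of each normalized joke whose toxicity is below the cutoff."""
    seen: set[str] = set()
    kept: list[str] = []
    for joke in jokes:
        key = normalize(joke)
        if not key or key in seen:
            continue  # empty or duplicate after normalization
        seen.add(key)
        if toxicity(joke) < max_toxicity:
            kept.append(joke)
    return kept


# Hypothetical stand-in scorer; a real pipeline would use a trained toxicity classifier.
BLOCKLIST = {"idiot"}


def toy_toxicity(text: str) -> float:
    return 1.0 if any(word in BLOCKLIST for word in normalize(text).split()) else 0.0


jokes = [
    "Why did the chicken cross the road?",
    "why did the chicken cross the road",
    "You idiot, that's not funny.",
    "Почему курица перешла дорогу?",
]
print(clean_corpus(jokes, toy_toxicity))
```

Here the second joke is dropped as a near-verbatim duplicate of the first, the third is dropped by the toy scorer, and the Russian joke is kept. Exact matching on normalized text only catches trivial variants; paraphrased duplicates would need fuzzier methods such as MinHash or embedding similarity.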
