Ekaterina Kochmar

LLMs in Education: Novel Perspectives, Challenges, and Opportunities

Sep 18, 2024

Bashar Alhafni, Sowmya Vajjala, Stefano Bannò, Kaushal Kumar Maurya, Ekaterina Kochmar

Abstract: The role of large language models (LLMs) in education is a growing area of interest, given the new opportunities they offer for teaching, learning, and assessment. This tutorial provides an overview of the educational applications of NLP and the impact that recent advances in LLMs have had on this field. We will discuss the key challenges and opportunities presented by LLMs, grounding them in the context of four major educational applications: reading, writing, and speaking skills, and intelligent tutoring systems (ITS). This COLING 2025 tutorial is designed for researchers and practitioners interested in the educational applications of NLP and the role LLMs have to play in this area. It is the first of its kind to address this timely topic.

* COLING 2025 Tutorial

SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models

Aug 16, 2024

Kaushal Kumar Maurya, KV Aditya Srivatsa, Ekaterina Kochmar

Abstract: Large language models (LLMs) have become increasingly popular due to their remarkable success across various tasks, which has led to the active development of a large set of diverse LLMs. However, individual LLMs have limitations when applied to complex tasks because of factors such as training biases, model sizes, and the datasets used. A promising approach is to efficiently harness the diverse capabilities of LLMs to overcome these individual limitations. Towards this goal, we introduce a novel LLM selection algorithm called SelectLLM. This algorithm directs each input query to the most suitable subset of LLMs from a large pool, so that together they provide the correct response efficiently. SelectLLM uses a multi-label classifier and relies on the classifier's predictions and confidence scores to design policies for selecting an optimal, query-aware, and lightweight subset of LLMs. Our findings show that the proposed model outperforms individual LLMs and achieves competitive performance compared to similarly sized, computationally expensive top-performing LLM subsets. Specifically, compared with a similarly sized top-performing LLM subset, we achieve a significant reduction in latency on two standard reasoning benchmarks: 13% lower latency for GSM8K and 70% lower latency for MMLU. Additionally, we conduct comprehensive analyses and ablation studies, which validate the robustness of the proposed model.
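
The abstract describes subset selection driven by a multi-label classifier's per-LLM predictions and confidence scores. The following is a minimal sketch of one possible policy of that kind, not the paper's actual policies: the threshold-plus-cap rule, the majority-vote aggregation, and all names (`select_llms`, `aggregate`, `llm_a`, ...) are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def select_llms(
    confidences: Dict[str, float],
    threshold: float = 0.5,
    max_models: int = 3,
) -> List[str]:
    """Pick a small, query-specific subset of LLMs (hypothetical policy).

    `confidences` maps each LLM name to the classifier's probability that
    this LLM answers the current query correctly. Models at or above
    `threshold` are kept in order of confidence, capped at `max_models`;
    if none qualifies, fall back to the single most confident model.
    """
    ranked = sorted(confidences.items(), key=lambda kv: kv[1], reverse=True)
    chosen = [name for name, p in ranked if p >= threshold][:max_models]
    if not chosen and ranked:
        chosen = [ranked[0][0]]
    return chosen


def aggregate(answers: List[str]) -> str:
    """Majority vote over the selected models' answers (ties go to the answer seen first)."""
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    conf = {"llm_a": 0.82, "llm_b": 0.40, "llm_c": 0.67, "llm_d": 0.55}
    print(select_llms(conf))                # ['llm_a', 'llm_c', 'llm_d']
    print(aggregate(["42", "41", "42"]))    # 42
```

Capping the subset size is what keeps the selection lightweight: only the chosen models are queried, so latency and cost scale with the subset rather than with the whole pool.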

Harnessing the Power of Multiple Minds: Lessons Learned from LLM Routing

May 01, 2024

KV Aditya Srivatsa, Kaushal Kumar Maurya, Ekaterina Kochmar

Abstract:With the rapid development of LLMs, it is natural to ask how to harness their capabilities efficiently. In this paper, we explore whether it is feasible to direct each input query to a single most suitable LLM. To this end, we propose LLM routing for challenging reasoning tasks. Our extensive experiments suggest that such routing shows promise but is not feasible in all scenarios, so more robust approaches should be investigated to fill this gap.

* Accepted to Workshop on Insights from Negative Results in NLP 2024 (co-located with NAACL 2024)
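
Routing to a single LLM can be read as taking the argmax of a router's per-model scores. One way to see how much headroom routing leaves is to compare the routed accuracy with an oracle that counts a query as solved if any model in the pool solves it. The sketch below assumes a router that outputs one score per LLM; all names are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Set


def route(scores: Dict[str, float]) -> str:
    """Send the query to the single LLM with the highest predicted score.

    `scores` is assumed to come from a router (e.g. a classifier) that
    estimates how likely each LLM is to solve this query.
    """
    return max(scores, key=lambda name: scores[name])


def routing_accuracy(routed: List[str], solved_by: List[Set[str]]) -> float:
    """Fraction of queries whose routed LLM is among those that solved it."""
    hits = sum(choice in ok for choice, ok in zip(routed, solved_by))
    return hits / len(routed)


def oracle_accuracy(solved_by: List[Set[str]]) -> float:
    """Upper bound: a query counts if any LLM in the pool solved it."""
    return sum(bool(ok) for ok in solved_by) / len(solved_by)


if __name__ == "__main__":
    solved_by = [{"llm_a"}, {"llm_a", "llm_b"}, set(), {"llm_b"}]
    routed = [route({"llm_a": 0.7, "llm_b": 0.2}),
              route({"llm_a": 0.4, "llm_b": 0.6}),
              route({"llm_a": 0.5, "llm_b": 0.5}),
              route({"llm_a": 0.9, "llm_b": 0.1})]
    print(routing_accuracy(routed, solved_by), oracle_accuracy(solved_by))  # 0.5 0.75
```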

PetKaz at SemEval-2024 Task 8: Can Linguistics Capture the Specifics of LLM-generated Text?

Apr 08, 2024

Kseniia Petukhova, Roman Kazakov, Ekaterina Kochmar

Abstract: In this paper, we present our submission to SemEval-2024 Task 8 "Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection", focusing on the detection of machine-generated texts (MGTs) in English. Specifically, our approach combines embeddings from RoBERTa-base with diversity features and uses a resampled training set. We rank 12th out of 124 for Subtask A (monolingual track), and our results show that our approach generalizes across unseen models and domains, achieving an accuracy of 0.91.

* 8 pages, 3 figures, 5 tables, to be published in the Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), for associated code, see https://github.com/sachertort/petkaz-semeval-m4
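
The approach concatenates RoBERTa-base embeddings with diversity features. The sketch below shows only that feature-combination step. The abstract does not list the diversity features, so type-token ratio and hapax ratio are assumptions, and the embedding is a placeholder list of floats rather than an actual RoBERTa call.

```python
import re
from collections import Counter
from typing import List


def diversity_features(text: str) -> List[float]:
    """Hypothetical lexical-diversity features for one text."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return [0.0, 0.0]
    counts = Counter(tokens)
    ttr = len(counts) / len(tokens)  # type-token ratio
    # Share of tokens that are words occurring exactly once (hapax legomena).
    hapax = sum(1 for c in counts.values() if c == 1) / len(tokens)
    return [ttr, hapax]


def combine(embedding: List[float], text: str) -> List[float]:
    """Concatenate a document embedding (placeholder for RoBERTa-base output)
    with the diversity features, giving one flat vector for a classifier."""
    return list(embedding) + diversity_features(text)


if __name__ == "__main__":
    print(combine([0.1, 0.2], "the cat saw the dog"))  # [0.1, 0.2, 0.8, 0.6]
```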

PetKaz at SemEval-2024 Task 3: Advancing Emotion Classification with an LLM for Emotion-Cause Pair Extraction in Conversations

Apr 08, 2024

Roman Kazakov, Kseniia Petukhova, Ekaterina Kochmar

Abstract: In this paper, we present our submission to SemEval-2024 Task 3 "The Competition of Multimodal Emotion Cause Analysis in Conversations", focusing on extracting emotion-cause pairs from dialogs. Specifically, our approach combines a fine-tuned GPT-3.5 for emotion classification with a BiLSTM-based neural network for cause detection. We rank 2nd for Subtask 1, with a weighted-average proportional F1 score of 0.264, one of the highest recorded, which demonstrates the effectiveness of our approach.

* 8 pages, 7 figures, 2 tables, to be published in the Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), for associated code, see https://github.com/sachertort/petkaz-semeval-ecac

What Makes Math Word Problems Challenging for LLMs?

Apr 01, 2024

KV Aditya Srivatsa, Ekaterina Kochmar

Abstract: This paper investigates what makes math word problems (MWPs) in English challenging for large language models (LLMs). We conduct an in-depth analysis of the key linguistic and mathematical characteristics of MWPs. In addition, we train feature-based classifiers to better understand the impact of each feature on the overall difficulty of MWPs for prominent LLMs and investigate whether this helps predict how well LLMs fare against specific categories of MWPs.

* Accepted to NAACL Findings 2024
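
Feature-based difficulty classifiers start from a per-problem feature vector. The sketch below extracts a few simple linguistic and mathematical characteristics; these particular features are illustrative assumptions, not the paper's feature set, and the classifier trained on them is left out.

```python
import re
from typing import Dict


def mwp_features(problem: str) -> Dict[str, float]:
    """Hypothetical surface features of a math word problem."""
    words = re.findall(r"[A-Za-z]+", problem)
    numbers = re.findall(r"\d+(?:\.\d+)?", problem)
    # Split on sentence-final punctuation followed by whitespace or end of text,
    # so decimals such as "2.5" are not treated as sentence boundaries.
    sentences = [s for s in re.split(r"[.!?]+(?:\s|$)", problem) if s.strip()]
    return {
        "num_words": len(words),
        "num_sentences": len(sentences),
        "num_quantities": len(numbers),
        "mean_sentence_len": len(words) / max(len(sentences), 1),
    }


if __name__ == "__main__":
    p = "Tom has 3 apples. He buys 2 more. How many apples does he have?"
    print(mwp_features(p))
```

Each problem's feature dictionary can then be paired with a label such as "solved / not solved by a given LLM" to train an interpretable classifier.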

REFeREE: A REference-FREE Model-Based Metric for Text Simplification

Mar 26, 2024

Yichen Huang, Ekaterina Kochmar

Abstract:Text simplification lacks a universal standard of quality, and annotated reference simplifications are scarce and costly. We propose to alleviate such limitations by introducing REFeREE, a reference-free model-based metric with a 3-stage curriculum. REFeREE leverages an arbitrarily scalable pretraining stage and can be applied to any quality standard as long as a small number of human annotations are available. Our experiments show that our metric outperforms existing reference-based metrics in predicting overall ratings and reaches competitive and consistent performance in predicting specific ratings while requiring no reference simplifications at inference time.

* Accepted at LREC-COLING 2024

Are LLMs Good Cryptic Crossword Solvers?

Mar 15, 2024

Abdelrahman "Boda" Sadallah, Daria Kotova, Ekaterina Kochmar

Abstract: Cryptic crosswords are puzzles that rely not only on general knowledge but also on the solver's ability to manipulate language on different levels and deal with various types of wordplay. Previous research suggests that solving such puzzles is a challenge even for modern NLP models. However, the abilities of large language models (LLMs) have not yet been tested on this task. In this paper, we establish benchmark results for three popular LLMs (LLaMA2, Mistral, and ChatGPT), showing that their performance on this task is still far from that of humans.

How Teachers Can Use Large Language Models and Bloom's Taxonomy to Create Educational Quizzes

Jan 11, 2024

Sabina Elkins, Ekaterina Kochmar, Jackie C. K. Cheung, Iulian Serban

Abstract: Question generation (QG) is a natural language processing task with an abundance of potential benefits and use cases in the educational domain. For this potential to be realized, QG systems must be designed and validated with pedagogical needs in mind. However, little research has assessed or designed QG approaches with input from real teachers or students. This paper applies a large language model-based QG approach in which questions are generated with learning goals derived from Bloom's taxonomy. The automatically generated questions are used in multiple experiments designed to assess how teachers use them in practice. The results demonstrate that teachers prefer to write quizzes with automatically generated questions, and that such quizzes show no loss in quality compared to handwritten versions. Further, several metrics indicate that automatically generated questions can even improve the quality of the quizzes created, showing promise for large-scale use of QG in the classroom.

* 8 pages, 8 figures. Accepted to the main track of the EAAI-24: The 14th Symposium on Educational Advances in Artificial Intelligence
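
Generating questions "with learning goals derived from Bloom's taxonomy" can be approached by conditioning the prompt on a target level. The sketch below assumes the six levels of the revised taxonomy and uses a hypothetical prompt template; it builds prompts only and does not call any particular LLM API.

```python
from typing import List, Sequence

# Levels of the revised Bloom's taxonomy, from lower- to higher-order thinking.
BLOOM_LEVELS = ("Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create")


def build_qg_prompts(passage: str, levels: Sequence[str] = BLOOM_LEVELS) -> List[str]:
    """Build one question-generation prompt per Bloom level (hypothetical template)."""
    prompts = []
    for level in levels:
        if level not in BLOOM_LEVELS:
            raise ValueError(f"unknown Bloom level: {level}")
        prompts.append(
            "Read the passage below and write one quiz question that targets "
            f"the '{level}' level of Bloom's taxonomy.\n\nPassage:\n{passage}"
        )
    return prompts


if __name__ == "__main__":
    for prompt in build_qg_prompts("Photosynthesis converts light energy into chemical energy.",
                                   levels=["Remember", "Apply"]):
        print(prompt, end="\n---\n")
```

Keeping the level explicit in each prompt lets a teacher pick questions to match the learning goal of a quiz.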

BasahaCorpus: An Expanded Linguistic Resource for Readability Assessment in Central Philippine Languages

Oct 17, 2023

Joseph Marvin Imperial, Ekaterina Kochmar

Abstract:Current research on automatic readability assessment (ARA) has focused on improving the performance of models in high-resource languages such as English. In this work, we introduce and release BasahaCorpus as part of an initiative aimed at expanding available corpora and baseline models for readability assessment in lower resource languages in the Philippines. We compiled a corpus of short fictional narratives written in Hiligaynon, Minasbate, Karay-a, and Rinconada -- languages belonging to the Central Philippine family tree subgroup -- to train ARA models using surface-level, syllable-pattern, and n-gram overlap features. We also propose a new hierarchical cross-lingual modeling approach that takes advantage of a language's placement in the family tree to increase the amount of available training data. Our study yields encouraging results that support previous work showcasing the efficacy of cross-lingual models in low-resource settings, as well as similarities in highly informative linguistic features for mutually intelligible languages.

* Final camera-ready paper for EMNLP 2023 (Main)
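
Among the ARA features listed are n-gram overlap features. The abstract does not say how they are computed, so the sketch below assumes a character-trigram overlap between a text and sample text from a related language. This is one way such a feature could reflect closeness within a language family. The function names and the choice of n are hypothetical.

```python
from typing import Set


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Set of lowercase character n-grams, with runs of whitespace collapsed."""
    s = " ".join(text.lower().split())
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_overlap(text: str, reference: str, n: int = 3) -> float:
    """Share of the text's character n-grams that also occur in the reference."""
    grams = char_ngrams(text, n)
    if not grams:
        return 0.0
    return len(grams & char_ngrams(reference, n)) / len(grams)


if __name__ == "__main__":
    print(ngram_overlap("abcd", "bcde"))  # 0.5
```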
