Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Neh Majmudar

From Rosetta to Match-Up: A Paired Corpus of Linguistic Puzzles with Human and LLM Benchmarks

May 13, 2026

Neh Majmudar, Anne Huang, Jinfan Frank Hu, Elena Filatova

Abstract:In this paper, we examine linguistic puzzles used in high school linguistics competitions, focusing on two common formats: Rosetta Stone and Match-Up. We propose a systematic procedure for converting existing Rosetta Stone puzzles into corresponding Match-Up counterparts. Because linguistic puzzle creation is complex and time-consuming, our method provides an efficient way to accelerate the generation of new puzzles. We evaluate the resulting Rosetta Stone-Match-Up pairs with both human participants and large language models (LLMs). Our results show that both expert human solvers and LLMs display an all-or-nothing pattern on Match-Up puzzles, either solving them completely or failing entirely. This work contributes a new dataset of paired puzzles and provides a detailed evaluation of puzzle difficulty across formats, offering insights into both human and machine linguistic reasoning.

* Proceedings of the Fifteenth Language Resources and Evaluation Conference

Via

Access Paper or Ask Questions

Handling Uncertainty in Health Data using Generative Algorithms

Mar 05, 2025

Mahdi Arab Loodaricheh, Neh Majmudar, Anita Raja, Ansaf Salleb-Aouissi

Figure 1 for Handling Uncertainty in Health Data using Generative Algorithms

Figure 2 for Handling Uncertainty in Health Data using Generative Algorithms

Figure 3 for Handling Uncertainty in Health Data using Generative Algorithms

Figure 4 for Handling Uncertainty in Health Data using Generative Algorithms

Abstract:Understanding and managing uncertainty is crucial in machine learning, especially in high-stakes domains like healthcare, where class imbalance can impact predictions. This paper introduces RIGA, a novel pipeline that mitigates class imbalance using generative AI. By converting tabular healthcare data into images, RIGA leverages models like cGAN, VQVAE, and VQGAN to generate balanced samples, improving classification performance. These representations are processed by CNNs and later transformed back into tabular format for seamless integration. This approach enhances traditional classifiers like XGBoost, improves Bayesian structure learning, and strengthens ML model robustness by generating realistic synthetic data for underrepresented classes.

Via

Access Paper or Ask Questions