Abdulfattah Safa

Are Non-English Papers Reviewed Fairly? Language-of-Study Bias in NLP Peer Reviews

Apr 08, 2026

Ehsan Barkhordar, Abdulfattah Safa, Verena Blaschke, Erika Lombart, Marie-Catherine de Marneffe, Gözde Gül Şahin

Abstract: Peer review plays a central role in the NLP publication process, but is susceptible to various biases. Here, we study language-of-study (LoS) bias: the tendency for reviewers to evaluate a paper differently based on the language(s) it studies, rather than its scientific merit. Despite being explicitly flagged in reviewing guidelines, such biases are poorly understood. Prior work treats such comments as part of broader categories of weak or unconstructive reviews without defining them as a distinct form of bias. We present the first systematic characterization of LoS bias, distinguishing negative and positive forms, and introduce the human-annotated dataset LOBSTER (Language-Of-study Bias in ScienTific pEer Review) and a method achieving 87.37 macro F1 for detection. We analyze 15,645 reviews to estimate how negative and positive biases differ with respect to the LoS, and find that non-English papers face substantially higher bias rates than English-only ones, with negative bias consistently outweighing positive bias. Finally, we identify four subcategories of negative bias, and find that demanding unjustified cross-lingual generalization is the most dominant form. We publicly release all resources to support work on fairer reviewing practices in NLP and beyond.
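For readers unfamiliar with the metric quoted in the abstract, macro F1 is the unweighted mean of the per-class F1 scores, so rare classes count as much as frequent ones. The sketch below is only an illustration of that definition, not the authors' evaluation code; the function name `macro_f1` and the three labels (`negative`, `positive`, `none`) are assumptions made for the example.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed

    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical labels, chosen only for illustration.
gold = ["negative", "negative", "positive", "none"]
pred = ["negative", "positive", "positive", "none"]
score = macro_f1(gold, pred)
```

In this toy case the `negative` and `positive` classes each score 2/3 and `none` scores 1, so the macro average is 7/9.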

* 21 pages, 10 figures, 9 tables

A Systematic Survey on Instructional Text: From Representation and Downstream NLP Tasks

Oct 24, 2024

Abdulfattah Safa, Tamta Kapanadze, Arda Uzunoğlu, Gözde Gül Şahin

Abstract: Recent advances in large language models have demonstrated promising capabilities in following simple instructions through instruction tuning. However, real-world tasks often involve complex, multi-step instructions that remain challenging for current NLP systems. Despite growing interest in this area, there is no comprehensive survey that systematically analyzes the landscape of complex instruction understanding and processing. Through a systematic review of the literature, we analyze available resources, representation schemes, and downstream tasks related to instructional text. Our study examines 177 papers, identifying trends, challenges, and opportunities in this emerging field. We provide AI/NLP researchers with essential background knowledge and a unified view of various approaches to complex instruction understanding, bridging gaps between different research directions and highlighting future research opportunities.

A Zero-Shot Open-Vocabulary Pipeline for Dialogue Understanding

Sep 24, 2024

Abdulfattah Safa, Gözde Gül Şahin

Abstract: Dialogue State Tracking (DST) is crucial for understanding user needs and executing appropriate system actions in task-oriented dialogues. The majority of existing DST methods are designed to work within predefined ontologies and assume the availability of gold domain labels, and they struggle to adapt to new slot values. While Large Language Model (LLM)-based systems show promising zero-shot DST performance, they either require extensive computational resources or underperform existing fully trained systems, limiting their practicality. To address these limitations, we propose a zero-shot, open-vocabulary system that integrates domain classification and DST in a single pipeline. Our approach includes reformulating DST as a question-answering task for less capable models and employing self-refining prompts for more adaptable ones. Our system does not rely on fixed slot values defined in the ontology, allowing it to adapt dynamically. We compare our approach with the existing SOTA and show that it provides up to 20% better Joint Goal Accuracy (JGA) over previous methods on datasets like Multi-WOZ 2.1, with up to 90% fewer requests to the LLM API.
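Joint Goal Accuracy, the metric reported in the abstract, is commonly computed as the fraction of dialogue turns whose predicted state matches the gold state exactly, with every slot and value correct. The following is a minimal sketch of that common definition, not the paper's evaluation script; the function name `joint_goal_accuracy` and the slot names in the example are hypothetical.

```python
def joint_goal_accuracy(gold_states, pred_states):
    """Fraction of turns whose predicted slot-value dict equals the gold dict."""
    if len(gold_states) != len(pred_states):
        raise ValueError("gold_states and pred_states must have the same length")
    if not gold_states:
        return 0.0
    # A turn counts only if every slot and every value matches exactly.
    correct = sum(1 for g, p in zip(gold_states, pred_states) if g == p)
    return correct / len(gold_states)


# Hypothetical two-turn dialogue; slot names are illustrative only.
gold = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
]
pred = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "3"},  # one wrong value fails the whole turn
]
jga = joint_goal_accuracy(gold, pred)
```

Because a single wrong slot value invalidates the whole turn, JGA is a strict measure: here only the first turn counts, giving 0.5.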
