Iryna Gurevych

LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection

Aug 08, 2024

Mervat Abassy, Kareem Elozeiri, Alexander Aziz, Minh Ngoc Ta, Raj Vardhan Tomar, Bimarsha Adhikari, Saad El Dine Ahmed, Yuxia Wang, Osama Mohammed Afzal, Zhuohan Xie(+14 more)

Abstract:The widespread accessibility of large language models (LLMs) to the general public has significantly amplified the dissemination of machine-generated texts (MGTs). Advancements in prompt manipulation have exacerbated the difficulty in discerning the origin of a text (human-authored vs. machine-generated). This raises concerns regarding the potential misuse of MGTs, particularly within educational and academic domains. In this paper, we present $\textbf{LLM-DetectAIve}$ -- a system designed for fine-grained MGT detection. It is able to classify texts into four categories: human-written, machine-generated, machine-written machine-humanized, and human-written machine-polished. Contrary to previous MGT detectors that perform binary classification, introducing two additional categories in LLM-DetectAIve offers insights into the varying degrees of LLM intervention during the text creation. This might be useful in some domains like education, where any LLM intervention is usually prohibited. Experiments show that LLM-DetectAIve can effectively identify the authorship of textual content, proving its usefulness in enhancing integrity in education, academia, and other domains. LLM-DetectAIve is publicly accessible at https://huggingface.co/spaces/raj-tomar001/MGT-New. The video describing our system is available at https://youtu.be/E8eT_bE7k8c.

A Course Shared Task on Evaluating LLM Output for Clinical Questions

Jul 31, 2024

Yufang Hou, Thy Thy Tran, Doan Nam Long Vu, Yiwen Cao, Kai Li, Lukas Rohde, Iryna Gurevych

Abstract:This paper presents a shared task that we organized at the Foundations of Language Technology (FoLT) course in 2023/2024 at the Technical University of Darmstadt, which focuses on evaluating the output of Large Language Models (LLMs) in generating harmful answers to health-related clinical questions. We describe the task design considerations and report the feedback we received from the students. We expect the task and the findings reported in this paper to be relevant for instructors teaching natural language processing (NLP) and designing course assignments.

* accepted at the sixth Workshop on Teaching NLP (co-located with ACL 2024)

Overview of PerpectiveArg2024: The First Shared Task on Perspective Argument Retrieval

Jul 29, 2024

Neele Falk, Andreas Waldis, Iryna Gurevych

Abstract:Argument retrieval is the task of finding relevant arguments for a given query. While existing approaches rely solely on the semantic alignment of queries and arguments, this first shared task on perspective argument retrieval incorporates perspectives during retrieval, accounting for latent influences in argumentation. We present a novel multilingual dataset covering demographic and socio-cultural (socio) variables, such as age, gender, and political attitude, representing minority and majority groups in society. We distinguish between three scenarios to explore how retrieval systems consider explicitly (in both query and corpus) and implicitly (only in query) formulated perspectives. This paper provides an overview of this shared task and summarizes the results of the six submitted systems. We find substantial challenges in incorporating perspectivism, especially when aiming for personalization based solely on the text of arguments without explicitly providing socio profiles. Moreover, retrieval systems tend to be biased towards the majority group but partially mitigate bias for the female gender. While we bootstrap perspective argument retrieval, further research is essential to optimize retrieval systems to facilitate personalization and reduce polarization.

Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment

Jul 20, 2024

Yongxin Huang, Kexin Wang, Goran Glavaš, Iryna Gurevych

Abstract:Multilingual sentence encoders are commonly obtained by training multilingual language models to map sentences from different languages into a shared semantic space. As such, they are subject to the curse of multilinguality, a loss of monolingual representational accuracy due to parameter sharing. Another limitation of multilingual sentence encoders is the trade-off between monolingual and cross-lingual performance. Training for cross-lingual alignment of sentence embeddings distorts the optimal monolingual structure of semantic spaces of individual languages, harming the utility of sentence embeddings in monolingual tasks. In this work, we address both issues by modular training of sentence encoders, i.e., by separating monolingual specialization from cross-lingual alignment. We first efficiently train language-specific sentence encoders to avoid negative interference between languages (i.e., the curse). We then align all non-English monolingual encoders to the English encoder by training a cross-lingual alignment adapter on top of each, preventing interference with monolingual specialization from the first step. In both steps, we resort to contrastive learning on machine-translated paraphrase data. Monolingual and cross-lingual evaluations on semantic text similarity/relatedness and multiple-choice QA render our modular solution more effective than multilingual sentence encoders, especially benefiting low-resource languages.
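
The abstract mentions contrastive learning on paraphrase pairs but does not name the loss. As a rough illustration only, the sketch below uses an InfoNCE-style objective with in-batch negatives, which is one common choice for sentence encoders; the function names, the temperature value, and the toy vectors are all assumptions and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives (hypothetical helper).

    anchors[i] and positives[i] are embeddings of a paraphrase pair;
    every other positives[j] in the batch acts as a negative for anchors[i].
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over the batch.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy 2-d embeddings: pair i matches in `aligned`, mismatches in `shuffled`.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(anchors, [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce(anchors, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is near zero when each sentence is closest to its own paraphrase and large when it is closer to another pair's paraphrase, which is the pressure that pulls paraphrases together in embedding space.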

$\textit{GeoHard}$: Towards Measuring Class-wise Hardness through Modelling Class Semantics

Jul 17, 2024

Fengyu Cai, Xinran Zhao, Hongming Zhang, Iryna Gurevych, Heinz Koeppl

Abstract:Recent advances in measuring hardness-wise properties of data guide language models in sample selection within low-resource scenarios. However, class-specific properties are overlooked for task setup and learning. How will these properties influence model learning and is it generalizable across datasets? To answer this question, this work formally initiates the concept of $\textit{class-wise hardness}$. Experiments across eight natural language understanding (NLU) datasets demonstrate a consistent hardness distribution across learning paradigms, models, and human judgment. Subsequent experiments unveil a notable challenge in measuring such class-wise hardness with instance-level metrics in previous works. To address this, we propose $\textit{GeoHard}$ for class-wise hardness measurement by modeling class geometry in the semantic embedding space. $\textit{GeoHard}$ surpasses instance-level metrics by over 59 percent on $\textit{Pearson}$'s correlation on measuring class-wise hardness. Our analysis theoretically and empirically underscores the generality of $\textit{GeoHard}$ as a fresh perspective on data diagnosis. Additionally, we showcase how understanding class-wise hardness can practically aid in improving task learning.

* Findings of ACL 2024
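
The abstract above reports its comparison in terms of Pearson's correlation. For readers unfamiliar with the metric, here is a minimal sketch of how the coefficient is computed between a per-class hardness score and a per-class reference signal; the function name and the toy numbers are illustrative assumptions and do not come from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical per-class values: a hardness estimate and an observed error rate.
hardness_scores = [0.1, 0.4, 0.7]
error_rates = [0.05, 0.20, 0.35]
r = pearson(hardness_scores, error_rates)
```

A value near 1 means the hardness estimate ranks and scales classes in line with the reference signal; a value near -1 means it is inverted.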

InferAct: Inferring Safe Actions for LLM-Based Agents Through Preemptive Evaluation and Human Feedback

Jul 16, 2024

Haishuo Fang, Xiaodan Zhu, Iryna Gurevych

Abstract:A crucial requirement for deploying LLM-based agents in real-life applications is robustness against risky or irreversible mistakes. However, existing research lacks a focus on the preemptive evaluation of reasoning trajectories performed by LLM agents, leading to a gap in ensuring safe and reliable operations. To explore better solutions, this paper introduces InferAct, a novel approach that leverages the Theory-of-Mind capability of LLMs to proactively detect potential errors before critical actions are executed (e.g., "buy-now" in automatic online trading or web shopping). InferAct is also capable of integrating human feedback to prevent irreversible risks and enhance the actor agent's decision-making process. Experiments on three widely used tasks demonstrate the effectiveness of InferAct. The proposed solution presents a novel approach and concrete contributions toward developing LLM agents that can be safely deployed in different environments involving critical decision-making.

Robust Utility-Preserving Text Anonymization Based on Large Language Models

Jul 16, 2024

Tianyu Yang, Xiaodan Zhu, Iryna Gurevych

Abstract:Text anonymization is crucial for sharing sensitive data while maintaining privacy. Existing techniques face the emerging challenges of re-identification attack ability of Large Language Models (LLMs), which have shown advanced capability in memorizing detailed information and patterns as well as connecting disparate pieces of information. In defending against LLM-based re-identification attacks, anonymization could jeopardize the utility of the resulting anonymized data in downstream tasks -- the trade-off between privacy and data utility requires deeper understanding within the context of LLMs. This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component, which work collaboratively to perform anonymization. To provide a practical model for large-scale and real-time environments, we distill the anonymization capabilities into a lightweight model using Direct Preference Optimization (DPO). Extensive experiments demonstrate that the proposed models outperform baseline models, showing robustness in reducing the risk of re-identification while preserving greater data utility in downstream tasks. Our code and dataset are available at https://github.com/UKPLab/arxiv2024-rupta.

$\texttt{MixGR}$: Enhancing Retriever Generalization for Scientific Domain through Complementary Granularity

Jul 15, 2024

Fengyu Cai, Xinran Zhao, Tong Chen, Sihao Chen, Hongming Zhang, Iryna Gurevych, Heinz Koeppl

Abstract:Recent studies show the growing significance of document retrieval in the generation of LLMs, i.e., RAG, within the scientific domain by bridging their knowledge gap. However, dense retrievers often struggle with domain-specific retrieval and complex query-document relationships, particularly when query segments correspond to various parts of a document. To alleviate such prevalent challenges, this paper introduces $\texttt{MixGR}$, which improves dense retrievers' awareness of query-document matching across various levels of granularity in queries and documents using a zero-shot approach. $\texttt{MixGR}$ fuses various metrics based on these granularities to a united score that reflects a comprehensive query-document similarity. Our experiments demonstrate that $\texttt{MixGR}$ outperforms previous document retrieval by 24.7% and 9.8% on nDCG@5 with unsupervised and supervised retrievers, respectively, averaged on queries containing multiple subqueries from five scientific retrieval datasets. Moreover, the efficacy of two downstream scientific question-answering tasks highlights the advantage of $\texttt{MixGR}$ to boost the application of LLMs in the scientific domain.
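
The abstract says that several granularity-specific metrics are fused into one score but does not state the fusion rule. The sketch below shows reciprocal rank fusion, a widely used way to merge ranked lists, purely as an assumed stand-in; the function name, the constant `k=60`, and the three example rankings are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    with ranks starting at 1. Ties are broken by document id.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings of the same documents at three granularities,
# e.g. query vs. document, subquery vs. document, subquery vs. proposition.
rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d3"],
    ["d2", "d3", "d1"],
]
fused = reciprocal_rank_fusion(rankings)
```

Here `d2` ends up first because it is ranked at or near the top in every list, even though it is not first in the coarsest ranking.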

Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors

Jul 12, 2024

Nico Daheim, Jakub Macina, Manu Kapur, Iryna Gurevych, Mrinmaya Sachan

Abstract:Large language models (LLMs) present an opportunity to scale high-quality personalized education to all. A promising approach to this end is to build dialog tutoring models that scaffold students' problem-solving. However, even though existing LLMs perform well in solving reasoning questions, they struggle to precisely detect students' errors and tailor their feedback to these errors. Inspired by real-world teaching practice where teachers identify student errors and customize their response based on them, we focus on verifying student solutions and show how grounding to such verification improves the overall quality of tutor response generation. We collect a dataset of 1K stepwise math reasoning chains with the first error step annotated by teachers. We show empirically that finding the mistake in a student solution is challenging for current models. We propose and evaluate several verifiers for detecting these errors. Using both automatic and human evaluation, we show that the student solution verifiers steer the generation model towards highly targeted responses to student errors, which are more often correct with fewer hallucinations compared to existing baselines.

* Preprint. Nico Daheim and Jakub Macina contributed equally. Code and dataset can be found under: https://github.com/eth-lre/verify-then-generate

HDT: Hierarchical Document Transformer

Jul 11, 2024

Haoyu He, Markus Flicke, Jan Buchmann, Iryna Gurevych, Andreas Geiger

Abstract:In this paper, we propose the Hierarchical Document Transformer (HDT), a novel sparse Transformer architecture tailored for structured hierarchical documents. Such documents are extremely important in numerous domains, including science, law or medicine. However, most existing solutions are inefficient and fail to make use of the structure inherent to documents. HDT exploits document structure by introducing auxiliary anchor tokens and redesigning the attention mechanism into a sparse multi-level hierarchy. This approach facilitates information exchange between tokens at different levels while maintaining sparsity, thereby enhancing computational and memory efficiency while exploiting the document structure as an inductive bias. We address the technical challenge of implementing HDT's sample-dependent hierarchical attention pattern by developing a novel sparse attention kernel that considers the hierarchical structure of documents. As demonstrated by our experiments, utilizing structural information present in documents leads to faster convergence, higher sample efficiency and better performance on downstream tasks.
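
To make the idea of anchor tokens and a sparse multi-level attention pattern concrete, here is a deliberately simplified two-level sketch. It is not the paper's attention pattern or kernel: the layout (one `[DOC]` anchor, one `[SENT]` anchor per sentence), the attention rules, and the function name are assumptions chosen only to show how a hierarchy can restrict which positions attend to each other.

```python
def hierarchical_mask(sentence_lengths):
    """Build a boolean attention mask for a toy two-level hierarchy.

    Assumed layout: one [DOC] anchor, then for each sentence a [SENT]
    anchor followed by that sentence's tokens.
    Assumed rules: positions in the same sentence (anchor included)
    attend to each other, and all anchors attend to each other.
    """
    kinds = ["doc"]
    sentence_of = [-1]  # -1 marks the document anchor
    for index, length in enumerate(sentence_lengths):
        kinds.append("sent")
        sentence_of.append(index)
        kinds.extend(["tok"] * length)
        sentence_of.extend([index] * length)

    size = len(kinds)
    mask = [[False] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            same_sentence = sentence_of[i] == sentence_of[j] and sentence_of[i] != -1
            both_anchors = kinds[i] != "tok" and kinds[j] != "tok"
            mask[i][j] = same_sentence or both_anchors
    return kinds, mask


# Two sentences with 2 and 3 tokens: 8 positions in total.
kinds, mask = hierarchical_mask([2, 3])
allowed = sum(sum(row) for row in mask)
```

In this toy case only 32 of the 64 position pairs are allowed. Tokens in different sentences never attend to each other directly; information crosses sentences only through the anchors, which is the general flavor of hierarchy-induced sparsity the abstract describes.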
