Leslie Barrett

Improving ML Training Data with Gold-Standard Quality Metrics

Dec 23, 2025

Leslie Barrett, Michael W. Sherman

Abstract: Hand-tagged training data is essential to many machine learning tasks. However, training data quality control has received little attention in the literature, despite data quality varying considerably with the tagging exercise. We propose methods to evaluate and enhance the quality of hand-tagged training data using statistical approaches to measure tagging consistency and agreement. We show that agreement metrics give more reliable results if recorded over multiple iterations of tagging, where declining variance in such recordings is an indicator of increasing data quality. We also show one way a tagging project can collect high-quality training data without requiring multiple tags for every work item, and that a tagger burn-in period may not be sufficient for minimizing tagger errors.

* In KDD '19: 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 05, 2019, Anchorage, AK
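
The idea in the abstract above, recording an agreement metric over several tagging iterations and watching its variance, can be illustrated with a minimal sketch. The abstract does not name the metric or the variance procedure the authors use, so Cohen's kappa, the sliding window, and the function names `cohen_kappa` and `rolling_variance` are assumptions made here for illustration only; the sample scores are invented.

```python
from collections import Counter
from statistics import pvariance


def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa for two taggers who labelled the same work items.

    Assumption: kappa stands in for whatever agreement metric a project uses.
    """
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two equal-length, non-empty tag lists")
    n = len(tags_a)
    # Fraction of items on which the two taggers agree.
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Agreement expected by chance, from each tagger's label frequencies.
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both taggers used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def rolling_variance(values, window):
    """Population variance of each consecutive window of agreement scores."""
    return [
        pvariance(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Invented per-iteration kappa scores, for illustration only.
    kappas = [0.2, 0.6, 0.7, 0.7]
    print(rolling_variance(kappas, window=2))
```

In this sketch, a sequence of window variances that shrinks toward zero would be read as agreement stabilizing across iterations; the window size is a free parameter chosen by the project.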

Prompting the Market? A Large-Scale Meta-Analysis of GenAI in Finance NLP (2022-2025)

Sep 11, 2025

Paolo Pedinotti, Peter Baumann, Nathan Jessurun, Leslie Barrett, Enrico Santus

Abstract: Large Language Models (LLMs) have rapidly reshaped financial NLP, enabling new tasks and driving a proliferation of datasets and diversification of data sources. Yet, this transformation has outpaced traditional surveys. In this paper, we present MetaGraph, a generalizable methodology for extracting knowledge graphs from scientific literature and analyzing them to obtain a structured, queryable view of research trends. We define an ontology for financial NLP research and apply an LLM-based extraction pipeline to 681 papers (2022-2025), enabling large-scale, data-driven analysis. MetaGraph reveals three key phases: early LLM adoption and task/dataset innovation; critical reflection on LLM limitations; and growing integration of peripheral techniques into modular systems. This structured view offers both practitioners and researchers a clear understanding of how financial NLP has evolved, highlighting emerging trends, shifting priorities, and methodological shifts, while also demonstrating a reusable approach for mapping scientific progress in other domains.

* 7 pages, 6 appendices, EMNLP industry track

LexTime: A Benchmark for Temporal Ordering of Legal Events

Jun 04, 2025

Claire Barale, Leslie Barrett, Vikram Sunil Bajaj, Michael Rovatsos

Abstract: Temporal reasoning in legal texts is important for applications like case law analysis and compliance monitoring. However, existing datasets lack expert language evaluation, leaving a gap in understanding how LLMs manage event ordering in legal contexts. We introduce LexTime, the first dataset designed to evaluate LLMs' event ordering capabilities in legal language, consisting of 512 instances from U.S. Federal Complaints with annotated event pairs and their temporal relations. Our findings show that (1) LLMs are more accurate on legal event ordering than on narrative (up to +10.5%); (2) longer input contexts and implicit events boost accuracy, reaching 80.8% for implicit-explicit event pairs; (3) legal linguistic complexities and nested clauses remain a challenge. We investigate how context length, explicit vs implicit event pairs, and legal language features affect model performance, demonstrating the need for specific modeling strategies to enhance temporal event reasoning.

* Preprint

Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)

Jul 20, 2024

Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, NhatHai Phan

Abstract: Creating secure and resilient applications with large language models (LLMs) requires anticipating, adjusting to, and countering unforeseen threats. Red-teaming has emerged as a critical technique for identifying vulnerabilities in real-world LLM implementations. This paper presents a detailed threat model and provides a systematization of knowledge (SoK) of red-teaming attacks on LLMs. We develop a taxonomy of attacks based on the stages of the LLM development and deployment process and extract various insights from previous research. In addition, we compile methods for defense and practical red-teaming strategies for practitioners. By delineating prominent attack motifs and shedding light on various entry points, this paper provides a framework for improving the security and robustness of LLM-based systems.

* Preprint. Under review
