Olivier Binette

Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework

Jun 14, 2024

Olivier Binette, Jerome P. Reiter

Abstract: Commonly, AI or machine learning (ML) models are evaluated on benchmark datasets. This practice supports innovative methodological research, but benchmark performance can be poorly correlated with performance in real-world applications -- a construct validity issue. To improve the validity and practical usefulness of evaluations, we propose using an estimands framework adapted from international clinical trials guidelines. This framework provides a systematic structure for inference and reporting in evaluations, emphasizing the importance of a well-defined estimation target. We illustrate our proposal on examples of commonly used evaluation methodologies -- involving cross-validation, clustering evaluation, and LLM benchmarking -- that can lead to incorrect rankings of competing models (rank reversals) with high probability, even when performance differences are large. We demonstrate how the estimands framework can help uncover underlying issues, their causes, and potential solutions. Ultimately, we believe this framework can improve the validity of evaluations through better-aligned inference, and help decision-makers and model users interpret reported results more effectively.

* 25 pages, 2 figures, 3 tables

How to Evaluate Entity Resolution Systems: An Entity-Centric Framework with Application to Inventor Name Disambiguation

Apr 08, 2024

Olivier Binette, Youngsoo Baek, Siddharth Engineer, Christina Jones, Abel Dasylva, Jerome P. Reiter

Abstract: Entity resolution (record linkage, microclustering) systems are notoriously difficult to evaluate. Looking for a needle in a haystack, traditional evaluation methods use sophisticated, application-specific sampling schemes to find matching pairs of records among an immense number of non-matches. We propose an alternative that facilitates the creation of representative, reusable benchmark data sets without necessitating complex sampling schemes. These benchmark data sets can then be used for model training and a variety of evaluation tasks. Specifically, we propose an entity-centric data labeling methodology that integrates with a unified framework for monitoring summary statistics, estimating key performance metrics such as cluster and pairwise precision and recall, and analyzing root causes for errors. We validate the framework in an application to inventor name disambiguation and through simulation studies. Software: https://github.com/OlivierBinette/er-evaluation/

* 33 pages, 11 figures
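
To make the "pairwise precision and recall" metrics named in the abstract concrete, here is a minimal sketch of their textbook definitions on a small, fully labeled toy example. It is not the paper's estimator (which works from sampled, entity-centric labels) and it does not use the er-evaluation package; the function names, the toy data, and the convention of returning 1.0 when there are no pairs are all assumptions made for illustration.

```python
from itertools import combinations


def matching_pairs(clustering):
    """Return the set of unordered record pairs that share a cluster.

    `clustering` maps each record ID to a cluster label (hypothetical format).
    """
    clusters = {}
    for record, label in clustering.items():
        clusters.setdefault(label, []).append(record)
    pairs = set()
    for members in clusters.values():
        # Sort so each unordered pair has one canonical representation.
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def pairwise_precision_recall(predicted, reference):
    """Compute pairwise precision and recall of `predicted` against `reference`."""
    pred_pairs = matching_pairs(predicted)
    ref_pairs = matching_pairs(reference)
    true_positives = len(pred_pairs & ref_pairs)
    # Assumed convention: an empty set of pairs yields a score of 1.0.
    precision = true_positives / len(pred_pairs) if pred_pairs else 1.0
    recall = true_positives / len(ref_pairs) if ref_pairs else 1.0
    return precision, recall


# Toy data: five records, a predicted clustering, and a reference clustering.
predicted = {"r1": "A", "r2": "A", "r3": "A", "r4": "B", "r5": "B"}
reference = {"r1": "x", "r2": "x", "r3": "y", "r4": "y", "r5": "z"}

precision, recall = pairwise_precision_recall(predicted, reference)
print(precision, recall)  # 0.25 0.5
```

Here the prediction links four pairs of records, only one of which is a true match, and the reference contains two matching pairs, so precision is 1/4 and recall is 1/2. On real data the full set of true matching pairs is not known, which is why the abstract speaks of estimating these metrics from labeled samples.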

PatentsView-Evaluation: Evaluation Datasets and Tools to Advance Research on Inventor Name Disambiguation

Jan 09, 2023

Olivier Binette, Sarvo Madhavan, Jack Butler, Beth Anne Card, Emily Melluso, Christina Jones

Abstract: We present PatentsView-Evaluation, a Python package that enables researchers to evaluate the performance of inventor name disambiguation systems such as PatentsView.org. The package includes benchmark datasets and evaluation tools, and aims to advance research on inventor name disambiguation by providing access to high-quality evaluation data and improving evaluation standards.

* 3 pages, 2 figures

Estimating the Performance of Entity Resolution Algorithms: Lessons Learned Through PatentsView.org

Oct 03, 2022

Olivier Binette, Sokhna A York, Emma Hickerson, Youngsoo Baek, Sarvo Madhavan, Christina Jones

Abstract: This paper introduces a novel evaluation methodology for entity resolution algorithms. It is motivated by PatentsView.org, a U.S. Patent and Trademark Office patent data exploration tool that disambiguates patent inventors using an entity resolution algorithm. We provide a data collection methodology and tailored performance estimators that account for sampling biases. Our approach is simple, practical, and principled -- key characteristics that allow us to paint the first representative picture of PatentsView's disambiguation performance. This approach is used to inform PatentsView's users of the reliability of the data and to allow the comparison of competing disambiguation algorithms.

* 19 pages, 4 figures

(Almost) All of Entity Resolution

Aug 10, 2020

Olivier Binette, Rebecca C. Steorts

Abstract: Whether the goal is to estimate the number of people that live in a congressional district, to estimate the number of individuals that have died in an armed conflict, or to disambiguate individual authors using bibliographic data, all these applications have a common theme: integrating information from multiple sources. Before such questions can be answered, databases must be cleaned and integrated in a systematic and accurate way, a process commonly known as record linkage, de-duplication, or entity resolution. In this article, we review motivational applications and seminal papers that have led to the growth of this area. Specifically, we review the foundational work that began in the 1940s and 50s and has led to modern probabilistic record linkage. We review clustering approaches to entity resolution, semi- and fully supervised methods, and canonicalization, which are being used throughout industry and academia in applications such as human rights, official statistics, medicine, and citation networks, among others. Finally, we discuss current research topics of practical importance.

* 53 pages, includes supplementary materials
