Title: DataPerf: Benchmarks for Data-Centric AI Development

Jul 20, 2022

Mark Mazumder, Colby Banbury, Xiaozhe Yao, Bojan Karlaš, William Gaviria Rojas, Sudnya Diamos, Greg Diamos, Lynn He, Douwe Kiela, David Jurado (+26 more)

Abstract: Machine learning (ML) research has generally focused on models, while the most prominent datasets have been employed for everyday ML tasks without regard for their breadth, difficulty, and faithfulness to the underlying problem. Neglecting the fundamental importance of datasets has caused major problems, including data cascades in real-world applications and saturation of dataset-driven criteria for model quality, which hinders research growth. To address this, we present DataPerf, a benchmark package for evaluating ML datasets and algorithms that operate on datasets. We intend it to enable the "data ratchet," in which training sets aid in evaluating test sets on the same problems, and vice versa. Such a feedback-driven strategy will generate a virtuous loop that accelerates the development of data-centric AI. The MLCommons Association will maintain DataPerf.
