Anjana Arunkumar

LINGO : Visually Debiasing Natural Language Instructions to Support Task Diversity

Apr 12, 2023

Real-Time Visual Feedback to Guide Benchmark Creation: A Human-and-Metric-in-the-Loop Workflow

Feb 09, 2023

A Survey of Parameters Associated with the Quality of Benchmarks in NLP

Oct 14, 2022

Hardness of Samples Need to be Quantified for a Reliable Evaluation System: Exploring Potential Opportunities with a New Task

Oct 14, 2022

Investigating the Failure Modes of the AUC metric and Exploring Alternatives for Evaluating Systems in Safety Critical Applications

Oct 10, 2022

Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks

Apr 16, 2022

A Proposal to Study "Is High Quality Data All We Need?"

Mar 12, 2022

Front Contribution instead of Back Propagation

Jun 10, 2021

How Robust are Model Rankings: A Leaderboard Customization Approach for Equitable Evaluation

Jun 10, 2021

DQI: A Guide to Benchmark Evaluation

Aug 10, 2020