
Mubashara Akhtar

Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads

Nov 11, 2025

Who Evaluates AI's Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations

Nov 06, 2025

Chimera: Diagnosing Shortcut Learning in Visual-Language Understanding

Sep 26, 2025

LEXam: Benchmarking Legal Reasoning on 340 Law Exams

May 19, 2025

Ev2R: Evaluating Evidence Retrieval in Automated Fact-Checking

Nov 08, 2024

The Automated Verification of Textual Claims (AVeriTeC) Shared Task

Oct 31, 2024

TANQ: An open domain dataset of table answered questions

May 13, 2024

Croissant: A Metadata Format for ML-Ready Datasets

Mar 28, 2024

ChartCheck: An Evidence-Based Fact-Checking Dataset over Real-World Chart Images

Nov 13, 2023

Exploring the Numerical Reasoning Capabilities of Language Models: A Comprehensive Analysis on Tabular Data

Nov 03, 2023