Carlos Georgescu

VADER: A Human-Evaluated Benchmark for Vulnerability Assessment, Detection, Explanation, and Remediation

May 26, 2025
Via arXiv

FinanceQA: A Benchmark for Evaluating Financial Analysis Capabilities of Large Language Models

Jan 30, 2025
Via arXiv