Giovana Kerche Bonás

LLM-Based Persuasion Enables Guardrail Override in Frontier LLMs

May 13, 2026
Via arXiv

Measuring Opinion Bias and Sycophancy via LLM-based Coercion

Apr 23, 2026
Via arXiv

MARCA: A Checklist-Based Benchmark for Multilingual Web Search

Apr 15, 2026
Via arXiv

CAPITU: A Benchmark for Evaluating Instruction-Following in Brazilian Portuguese with Literary Context

Mar 23, 2026
Via arXiv

Sabiá-4 Technical Report

Mar 10, 2026
Via arXiv

Ticket-Bench: A Kickoff for Multilingual and Regionalized Agent Evaluation

Sep 17, 2025
Via arXiv

BRoverbs -- Measuring how much LLMs understand Portuguese proverbs

Sep 10, 2025
Via arXiv

TiEBe: A Benchmark for Assessing the Current Knowledge of Large Language Models

Jan 13, 2025
Via arXiv

Sabiá-3 Technical Report

Oct 15, 2024
Via arXiv