Rada Mihalcea

The Wrong Kind of Right: Quantifying and Localizing Misfired Alignment in LLMs

Jun 17, 2026

It Takes One to Bias Them All: Breaking Bad with One-Shot GRPO

Jun 09, 2026

Whose Norms? Disentangling Cultural and Personal Alignment in Large Language Models

Jun 05, 2026

The Age of Curiosity Meets the Age of AI: Benchmarking Child Safety in Large Language Models

May 26, 2026

How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks

Apr 24, 2026

SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models

Apr 21, 2026

When Do Language Models Endorse Limitations on Human Rights Principles?

Mar 04, 2026

Belief-Sim: Towards Belief-Driven Simulation of Demographic Misinformation Susceptibility

Mar 03, 2026

Copyright Detective: A Forensic System to Evidence LLMs Flickering Copyright Leakage Risks

Feb 05, 2026

SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests

Oct 06, 2025