Karolina Stańczak

Value Drifts: Tracing Value Alignment During LLM Post-Training

Oct 30, 2025
Via arXiv

CulturalFrames: Assessing Cultural Expectation Alignment in Text-to-Image Models and Evaluation Metrics

Jun 10, 2025
Via arXiv

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

Apr 11, 2025
Via arXiv

SafeArena: Evaluating the Safety of Autonomous Web Agents

Mar 06, 2025
Via arXiv

Benchmarking Vision Language Models for Cultural Understanding

Jul 15, 2024
Via arXiv

A Multilingual Perspective on Probing Gender Bias

Mar 15, 2024
Via arXiv

Grammatical Gender's Influence on Distributional Semantics: A Causal Perspective

Nov 30, 2023
Via arXiv

Social Bias Probing: Fairness Benchmarking for Language Models

Nov 15, 2023
Via arXiv

Measuring Intersectional Biases in Historical Documents

May 21, 2023
Via arXiv

Measuring Gender Bias in West Slavic Language Models

Apr 13, 2023
Via arXiv