Hannah Rose Kirk

RealityTest: How People Probe AI Identity and Whether Models Disclose It

May 29, 2026

PRISM-X: Experiments on Personalised Fine-Tuning with Human and Simulated Users

May 13, 2026

Measuring and Mitigating Persona Distortions from AI Writing Assistance

Apr 24, 2026

Reward Models Inherit Value Biases from Pretraining

Jan 28, 2026

Reward Model Interpretability via Optimal and Pessimal Tokens

Jun 08, 2025

Clinical knowledge in LLMs does not translate to human interactions

Apr 26, 2025

Multilingual != Multicultural: Evaluating Gaps Between Multilingual Capabilities and Cultural Alignment in LLMs

Feb 23, 2025

Why human-AI relationships need socioaffective alignment

Feb 04, 2025

Beyond the Binary: Capturing Diverse Preferences With Reward Regularization

Dec 05, 2024

LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages

Jun 11, 2024