Seong Hah Cho

Value Entanglement: Conflation Between Different Kinds of Good In (Some) Large Language Models

Feb 22, 2026
Via arXiv

The Steganographic Potentials of Language Models

May 06, 2025
Via arXiv

Identifying Cooperative Personalities in Multi-agent Contexts through Personality Steering with Representation Engineering

Mar 17, 2025
Via arXiv

Inducing Human-like Biases in Moral Reasoning Language Models

Nov 23, 2024
Via arXiv