Max Kleiman-Weiner

The Lock-in Hypothesis: Stagnation by Algorithm

Jun 06, 2025

Are Language Models Consequentialist or Deontological Moral Reasoners?

May 27, 2025

Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination

Apr 20, 2025

SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation

Oct 22, 2024

Value Internalization: Learning and Generalizing from Social Reward

Jul 19, 2024

Multilingual Trolley Problems for Language Models

Jul 02, 2024

Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents

Apr 25, 2024

CLadder: A Benchmark to Assess Causal Reasoning Capabilities of Language Models

Dec 07, 2023

Learning to Coordinate with Humans using Action Features

Jan 29, 2022

When Is It Acceptable to Break the Rules? Knowledge Representation of Moral Judgement Based on Empirical Data

Jan 19, 2022