Leonard Dung

Probing the Preferences of a Language Model: Integrating Verbal and Behavioral Tests of AI Welfare

Sep 09, 2025
Via arXiv

Against racing to AGI: Cooperation, deterrence, and catastrophic risks

Jul 29, 2025
Via arXiv

Misalignment or misuse? The AGI alignment tradeoff

Jun 04, 2025
Via arXiv