Liwei Jiang

ODESteer: A Unified ODE-Based Steering Framework for LLM Alignment

Feb 19, 2026

Pluralistic Behavior Suite: Stress-Testing Multi-Turn Adherence to Custom Behavioral Policies

Nov 07, 2025

Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)

Oct 27, 2025

Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models

Jun 09, 2025

X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents

Apr 15, 2025

PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages

Apr 06, 2025

Online Covariance Estimation in Nonsmooth Stochastic Approximation

Feb 07, 2025

SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation

Oct 22, 2024

To Err is AI: A Case Study Informing LLM Flaw Reporting Practices

Oct 15, 2024

AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text

Oct 05, 2024