Picture for Yan Teng

Yan Teng

Mechanistic Origin of Moral Indifference in Language Models

Add code
Mar 16, 2026
Viaarxiv icon

SafeSci: Safety Evaluation of Large Language Models in Science Domains and Beyond

Add code
Mar 02, 2026
Viaarxiv icon

From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation

Add code
Jan 28, 2026
Viaarxiv icon

OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs

Add code
Jan 04, 2026
Viaarxiv icon

UniMark: Artificial Intelligence Generated Content Identification Toolkit

Add code
Dec 13, 2025
Viaarxiv icon

Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs

Add code
Nov 16, 2025
Viaarxiv icon

A Rigorous Benchmark with Multidimensional Evaluation for Deep Research Agents: From Answers to Reports

Add code
Oct 02, 2025
Viaarxiv icon

SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law

Add code
Jul 24, 2025
Figure 1 for SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law
Figure 2 for SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law
Figure 3 for SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law
Figure 4 for SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law
Viaarxiv icon

JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models

Add code
May 26, 2025
Viaarxiv icon

SafeVid: Toward Safety Aligned Video Large Multimodal Models

Add code
May 17, 2025
Viaarxiv icon