Yinpeng Dong

Mitigating Overthinking in Large Reasoning Models via Manifold Steering

May 28, 2025
Via arXiv

Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space

May 28, 2025
Via arXiv

Red-Teaming Text-to-Image Systems by Rule-based Preference Modeling

May 27, 2025
Via arXiv

Understanding Pre-training and Fine-tuning from Loss Landscape Perspectives

May 23, 2025
Via arXiv

Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries

May 21, 2025
Via arXiv

RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability

Apr 14, 2025
Via arXiv

Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement

Feb 26, 2025
Via arXiv

STAIR: Improving Safety Alignment with Introspective Reasoning

Feb 04, 2025
Via arXiv

Towards the Worst-case Robustness of Large Language Models

Jan 31, 2025
Via arXiv

Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability

Dec 24, 2024
Via arXiv