Yinpeng Dong

Understanding and Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding

Jun 14, 2025

Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space

May 28, 2025

Mitigating Overthinking in Large Reasoning Models via Manifold Steering

May 28, 2025

Red-Teaming Text-to-Image Systems by Rule-based Preference Modeling

May 27, 2025

Understanding Pre-training and Fine-tuning from Loss Landscape Perspectives

May 23, 2025

Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries

May 21, 2025

RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability

Apr 14, 2025

Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement

Feb 26, 2025

STAIR: Improving Safety Alignment with Introspective Reasoning

Feb 04, 2025

Towards the Worst-case Robustness of Large Language Models

Jan 31, 2025