Picture for Yingshui Tan

Yingshui Tan

Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library

Add code
Jun 06, 2025
Viaarxiv icon

USB: A Comprehensive and Unified Safety Evaluation Benchmark for Multimodal Large Language Models

Add code
May 26, 2025
Viaarxiv icon

Beyond Safe Answers: A Benchmark for Evaluating True Risk Awareness in Large Reasoning Models

Add code
May 26, 2025
Viaarxiv icon

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

Add code
May 21, 2025
Viaarxiv icon

DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models

Add code
Apr 25, 2025
Viaarxiv icon

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models

Add code
Feb 23, 2025
Viaarxiv icon

HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States

Add code
Feb 21, 2025
Viaarxiv icon

ChineseSimpleVQA -- "See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models

Add code
Feb 19, 2025
Viaarxiv icon

Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models

Add code
Feb 17, 2025
Viaarxiv icon

RapGuard: Safeguarding Multimodal Large Language Models via Rationale-aware Defensive Prompting

Add code
Dec 25, 2024
Viaarxiv icon