Min Yang

IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property

Apr 22, 2025
Via arXiv

OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation

Apr 18, 2025
Via arXiv

From Prompting to Alignment: A Generative Framework for Query Recommendation

Apr 14, 2025
Via arXiv

Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization

Mar 19, 2025
Via arXiv

Revisiting Backdoor Attacks on Time Series Classification in the Frequency Domain

Mar 12, 2025
Via arXiv

PEToolLLM: Towards Personalized Tool Learning in Large Language Models

Feb 26, 2025
Via arXiv

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Feb 20, 2025
Via arXiv

Exploring the Impact of Personality Traits on LLM Bias and Toxicity

Feb 18, 2025
Via arXiv

xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking

Jan 30, 2025
Via arXiv

Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink

Jan 25, 2025
Via arXiv