Junfeng Fang

Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs

Jun 16, 2025
Via arXiv

We Should Identify and Mitigate Third-Party Safety Risks in MCP-Powered Agent Systems

Jun 16, 2025
Via arXiv

AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint

Jun 08, 2025
Via arXiv

Are Reasoning Models More Prone to Hallucination?

May 29, 2025
Via arXiv

Advanced long-term earth system forecasting by learning the small-scale nature

May 26, 2025
Via arXiv

Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation

May 22, 2025
Via arXiv

LIFEBench: Evaluating Length Instruction Following in Large Language Models

May 22, 2025
Via arXiv

AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models

May 22, 2025
Via arXiv

UniErase: Unlearning Token as a Universal Erasure Primitive for Language Models

May 21, 2025
Via arXiv

Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs

May 16, 2025
Via arXiv