Junfeng Fang

Contrastive Weak-to-strong Generalization

Oct 09, 2025 (via arXiv)

On Predictability of Reinforcement Learning Dynamics for Large Language Models

Oct 02, 2025 (via arXiv)

Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs

Jun 16, 2025 (via arXiv)

We Should Identify and Mitigate Third-Party Safety Risks in MCP-Powered Agent Systems

Jun 16, 2025 (via arXiv)

AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint

Jun 08, 2025 (via arXiv)

Are Reasoning Models More Prone to Hallucination?

May 29, 2025 (via arXiv)

Advanced long-term earth system forecasting by learning the small-scale nature

May 26, 2025 (via arXiv)

LIFEBench: Evaluating Length Instruction Following in Large Language Models

May 22, 2025 (via arXiv)

Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation

May 22, 2025 (via arXiv)

AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models

May 22, 2025 (via arXiv)