
Zeming Wei

Identifying and Understanding Cross-Class Features in Adversarial Training

Jun 05, 2025

Understanding Pre-training and Fine-tuning from Loss Landscape Perspectives

May 23, 2025

Mitigating Fine-tuning Risks in LLMs via Safety-Aware Probing Optimization

May 22, 2025

Advancing LLM Safe Alignment with Safety Representation Ranking

May 21, 2025

Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval

May 21, 2025

3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians

Apr 16, 2025

Towards the Worst-case Robustness of Large Language Models

Jan 31, 2025

MILE: A Mutation Testing Framework of In-Context Learning Systems

Sep 07, 2024

Automata Extraction from Transformers

Jun 08, 2024

A Theoretical Understanding of Self-Correction through In-context Alignment

May 28, 2024