Zeming Wei

Autoregressive Models Rival Diffusion Models at Any-Order Generation

Jan 19, 2026

Calibrated Adversarial Sampling: Multi-Armed Bandit-Guided Generalization Against Unforeseen Attacks

Nov 15, 2025

Automata-Based Steering of Large Language Models for Diverse Structured Generation

Nov 14, 2025

False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize

Sep 04, 2025

Identifying and Understanding Cross-Class Features in Adversarial Training

Jun 05, 2025

Understanding Pre-training and Fine-tuning from Loss Landscape Perspectives

May 23, 2025

Mitigating Fine-tuning Risks in LLMs via Safety-Aware Probing Optimization

May 22, 2025

Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval

May 21, 2025

Advancing LLM Safe Alignment with Safety Representation Ranking

May 21, 2025

3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians

Apr 16, 2025