Kevin Zhu

George Mason University

A Few Bad Neurons: Isolating and Surgically Correcting Sycophancy

Jan 26, 2026
Via arXiv

AMVICC: A Novel Benchmark for Cross-Modal Failure Mode Profiling for VLMs and IGMs

Jan 20, 2026
Via arXiv

Zero-Shot Embedding Drift Detection: A Lightweight Defense Against Prompt Injections in LLMs

Jan 18, 2026
Via arXiv

Interpretable Perturbation Modeling Through Biomedical Knowledge Graphs

Dec 31, 2025
Via arXiv

When Less is More: 8-bit Quantization Improves Continual Learning in Large Language Models

Dec 22, 2025
Via arXiv

Emergent Persuasion: Will LLMs Persuade Without Being Prompted?

Dec 20, 2025
Via arXiv

Emergent World Beliefs: Exploring Transformers in Stochastic Games

Dec 18, 2025
Via arXiv

TorchTraceAP: A New Benchmark Dataset for Detecting Performance Anti-Patterns in Computer Vision Models

Dec 16, 2025
Via arXiv

Reasoning Relay: Evaluating Stability and Interchangeability of Large Language Models in Mathematical Reasoning

Dec 16, 2025
Via arXiv

Factor(U,T): Controlling Untrusted AI by Monitoring their Plans

Dec 12, 2025
Via arXiv