Ashwinee Panda

From Refusal Tokens to Refusal Control: Discovering and Steering Category-Specific Refusal Directions

Mar 09, 2026

Multi-Token Prediction via Self-Distillation

Feb 05, 2026

A Few Bad Neurons: Isolating and Surgically Correcting Sycophancy

Jan 26, 2026

Modeling and Predicting Multi-Turn Answer Instability in Large Language Models

Nov 12, 2025

SALT: Steering Activations towards Leakage-free Thinking in Chain of Thought

Nov 11, 2025

Alignment-Constrained Dynamic Pruning for LLMs: Identifying and Preserving Alignment-Critical Circuits

Nov 09, 2025

Dense Backpropagation Improves Training for Sparse Mixture-of-Experts

Apr 18, 2025

Analysis of Attention in Video Diffusion Transformers

Apr 14, 2025

LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation

Apr 10, 2025

Using Attention Sinks to Identify and Evaluate Dormant Heads in Pretrained LLMs

Apr 04, 2025