
Tsui-Wei Weng

Resting Neurons, Active Insights: Improving Input Sparsification for Large Language Models

Dec 14, 2025

Interpretable and Steerable Concept Bottleneck Sparse Autoencoders

Dec 11, 2025

Graph Concept Bottleneck Models

Aug 19, 2025

Statistical Inference for Responsiveness Verification

Jul 02, 2025

Rethinking Crowd-Sourced Evaluation of Neuron Explanations

Jun 09, 2025

Evaluating Neuron Explanations: A Unified Framework with Sanity Checks

Jun 06, 2025

ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models

Mar 27, 2025

Effective Skill Unlearning through Intervention and Abstention

Mar 27, 2025

Interpretable Generative Models through Post-hoc Concept Bottlenecks

Mar 25, 2025

RAT: Boosting Misclassification Detection Ability without Extra Data

Mar 18, 2025