Duen Horng Chau

Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety

Jun 05, 2025

Shape it Up! Restoring LLM Safety during Finetuning

May 22, 2025

ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features

Feb 06, 2025

Adversarial Attacks Using Differentiable Rendering: A Survey

Nov 14, 2024

Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image detectors

Nov 12, 2024

Dense Associative Memory Through the Lens of Random Features

Oct 31, 2024

Transformer Explainer: Interactive Learning of Text-Generative Models

Aug 08, 2024

MeMemo: On-device Retrieval Augmentation for Private and Personalized Text Generation

Jul 02, 2024

Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models

May 28, 2024

LLM Attributor: Interactive Visual Attribution for LLM Generation

Apr 01, 2024