Daniel Tan

Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time

Oct 05, 2025

Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models

Feb 27, 2025

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Feb 25, 2025

Complete Implementation of WXF Chinese Chess Rules

Dec 23, 2024

Study of the Proper NNUE Dataset

Dec 23, 2024

Analyzing the Generalization and Reliability of Steering Vectors -- ICML 2024

Jul 17, 2024

Low-Cost Generation and Evaluation of Dictionary Example Sentences

Apr 09, 2024

cDVGAN: One Flexible Model for Multi-class Gravitational Wave Signal and Glitch Generation

Feb 07, 2024

Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

Feb 05, 2024