Jacob Hilton

Shammie

Estimating the expected output of wide random MLPs more efficiently than sampling

May 06, 2026

Obfuscated Activations Bypass LLM Latent-Space Defenses

Dec 12, 2024

Estimating the Probabilities of Rare Outputs in Language Models

Oct 17, 2024

Towards a Law of Iterated Expectations for Heuristic Estimators

Oct 02, 2024

Backdoor defense, learnability and obfuscation

Sep 04, 2024

Scaling laws for single-agent reinforcement learning

Jan 31, 2023

Scaling Laws for Reward Model Overoptimization

Oct 19, 2022

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Jun 10, 2022

Teaching Models to Express Their Uncertainty in Words

May 28, 2022

Training language models to follow instructions with human feedback

Mar 04, 2022