Lawrence Chan

Mathematical Models of Computation in Superposition

Aug 10, 2024
Via arXiv

Compact Proofs of Model Performance via Mechanistic Interpretability

Jun 24, 2024
Via arXiv

Provable Guarantees for Model Performance via Mechanistic Interpretability

Jun 18, 2024
Via arXiv

Evaluating Language-Model Agents on Realistic Autonomous Tasks

Jan 04, 2024
Via arXiv

A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations

Feb 06, 2023
Via arXiv

Progress measures for grokking via mechanistic interpretability

Jan 13, 2023
Via arXiv

Language models are better than humans at next-token prediction

Dec 21, 2022
Via arXiv

Adversarial Training for High-Stakes Reliability

May 04, 2022
Via arXiv

Human irrationality: both bad and good for reward inference

Nov 12, 2021
Via arXiv

Optimal Cost Design for Model Predictive Control

Apr 23, 2021
Via arXiv