Aryaman Arora

Improved Representation Steering for Language Models

May 27, 2025

Mechanistic evaluation of Transformers and state space models

May 21, 2025

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

Jan 29, 2025

Bayesian scaling laws for in-context learning

Oct 21, 2024

ReFT: Representation Finetuning for Language Models

Apr 08, 2024

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

Mar 12, 2024

CausalGym: Benchmarking causal interpretability methods on linguistic tasks

Feb 19, 2024

Predicting positive transfer for improved low-resource speech recognition using acoustic pseudo-tokens

Feb 03, 2024

A Reply to Makelov et al.'s "Interpretability Illusion" Arguments

Jan 23, 2024

IruMozhi: Automatically classifying diglossia in Tamil

Nov 13, 2023