A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments

Jan 23, 2024
Zhengxuan Wu, Atticus Geiger, Jing Huang, Aryaman Arora, Thomas Icard, Christopher Potts, Noah D. Goodman
