J. Zico Kolter

An Axiomatic Approach to Model-Agnostic Concept Explanations

Jan 12, 2024
Zhili Feng, Michal Moshkovitz, Dotan Di Castro, J. Zico Kolter

TOFU: A Task of Fictitious Unlearning for LLMs

Jan 11, 2024
Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C. Lipton, J. Zico Kolter

Deep Equilibrium Based Neural Operators for Steady-State PDEs

Nov 30, 2023
Tanya Marwah, Ashwini Pokle, J. Zico Kolter, Zachary C. Lipton, Jianfeng Lu, Andrej Risteski

Manifold Preserving Guided Diffusion

Nov 28, 2023
Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, J. Zico Kolter, Ruslan Salakhutdinov, Stefano Ermon

Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning

Nov 25, 2023
Melrose Roderick, Gaurav Manek, Felix Berkenkamp, J. Zico Kolter

TorchDEQ: A Library for Deep Equilibrium Models

Oct 28, 2023
Zhengyang Geng, J. Zico Kolter

On the Neural Tangent Kernel of Equilibrium Models

Oct 21, 2023
Zhili Feng, J. Zico Kolter

Representation Engineering: A Top-Down Approach to AI Transparency

Oct 10, 2023
Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks

Understanding prompt engineering may not require rethinking generalization

Oct 06, 2023
Victor Akinwande, Yiding Jiang, Dylan Sam, J. Zico Kolter

Universal and Transferable Adversarial Attacks on Aligned Language Models

Jul 27, 2023
Andy Zou, Zifan Wang, J. Zico Kolter, Matt Fredrikson
