Dennis Wei

Multi-Level Explanations for Generative Language Models

Mar 21, 2024
Lucas Monteiro Paes, Dennis Wei, Hyo Jin Do, Hendrik Strobelt, Ronny Luss, Amit Dhurandhar, Manish Nagireddy, Karthikeyan Natesan Ramamurthy, Prasanna Sattigeri, Werner Geyer, Soumya Ghosh

Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations

Mar 09, 2024
Swapnaja Achintalwar, Adriana Alvarado Garcia, Ateret Anaby-Tavor, Ioana Baldini, Sara E. Berger, Bishwaranjan Bhattacharjee, Djallel Bouneffouf, Subhajit Chaudhury, Pin-Yu Chen, Lamogha Chiazor, Elizabeth M. Daly, Rogério Abreu de Paula, Pierre Dognin, Eitan Farchi, Soumya Ghosh, Michael Hind, Raya Horesh, George Kour, Ja Young Lee, Erik Miehling, Keerthiram Murugesan, Manish Nagireddy, Inkit Padhi, David Piorkowski, Ambrish Rawat, Orna Raz, Prasanna Sattigeri, Hendrik Strobelt, Sarathkrishna Swaminathan, Christoph Tillmann, Aashka Trivedi, Kush R. Varshney, Dennis Wei, Shalisha Witherspooon, Marcel Zalmanovici

Causal Bandits with General Causal Models and Interventions

Mar 01, 2024
Zirui Yan, Dennis Wei, Dmitriy Katz-Rogozhnikov, Prasanna Sattigeri, Ali Tajer

Trust Regions for Explanations via Black-Box Probabilistic Certification

Feb 21, 2024
Amit Dhurandhar, Swagatam Haldar, Dennis Wei, Karthikeyan Natesan Ramamurthy

Effective Human-AI Teams via Learned Natural Language Rules and Onboarding

Nov 07, 2023
Hussein Mozannar, Jimin J Lee, Dennis Wei, Prasanna Sattigeri, Subhro Das, David Sontag

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

Oct 19, 2023
Chongyu Fan, Jiancheng Liu, Yihua Zhang, Dennis Wei, Eric Wong, Sijia Liu

Interpretable Differencing of Machine Learning Models

Jun 13, 2023
Swagatam Haldar, Diptikalyan Saha, Dennis Wei, Rahul Nair, Elizabeth M. Daly

Convex Bounds on the Softmax Function with Applications to Robustness Verification

Mar 03, 2023
Dennis Wei, Haoze Wu, Min Wu, Pin-Yu Chen, Clark Barrett, Eitan Farchi

Who Should Predict? Exact Algorithms For Learning to Defer to Humans

Jan 15, 2023
Hussein Mozannar, Hunter Lang, Dennis Wei, Prasanna Sattigeri, Subhro Das, David Sontag