Ruoxi Jia

RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content
Mar 19, 2024
Zhuowen Yuan, Zidi Xiong, Yi Zeng, Ning Yu, Ruoxi Jia, Dawn Song, Bo Li

Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study
Mar 15, 2024
Chenguang Wang, Ruoxi Jia, Xin Liu, Dawn Song

A Safe Harbor for AI Evaluation and Red Teaming
Mar 07, 2024
Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson

The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes
Feb 14, 2024
Myeongseob Ko, Feiyang Kang, Weiyan Shi, Ming Jin, Zhou Yu, Ruoxi Jia

How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
Jan 23, 2024
Yi Zeng, Hongpeng Lin, Jingwen Zhang, Diyi Yang, Ruoxi Jia, Weiyan Shi

Efficient Data Shapley for Weighted Nearest Neighbor Algorithms
Jan 20, 2024
Jiachen T. Wang, Prateek Mittal, Ruoxi Jia

Data Acquisition: A New Frontier in Data-centric AI
Nov 22, 2023
Lingjiao Chen, Bilge Acun, Newsha Ardalani, Yifan Sun, Feiyang Kang, Hanrui Lyu, Yongchan Kwon, Ruoxi Jia, Carole-Jean Wu, Matei Zaharia, James Zou

Learning to Rank for Active Learning via Multi-Task Bilevel Optimization
Oct 25, 2023
Zixin Ding, Si Chen, Ruoxi Jia, Yuxin Chen

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
Oct 05, 2023
Xiangyu Qi, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, Peter Henderson
