Mengzhou Xia

What's in Your "Safe" Data?: Identifying Benign Data that Breaks Safety

Apr 01, 2024
Luxi He, Mengzhou Xia, Peter Henderson

LESS: Selecting Influential Data for Targeted Instruction Tuning

Feb 20, 2024
Mengzhou Xia, Sadhika Malladi, Suchin Gururangan, Sanjeev Arora, Danqi Chen

Language Models as Science Tutors

Feb 16, 2024
Alexis Chevalier, Jiayi Geng, Alexander Wettig, Howard Chen, Sebastian Mizera, Toni Annala, Max Jameson Aragon, Arturo Rodríguez Fanlo, Simon Frieder, Simon Machado, Akshara Prabhakar, Ellie Thieu, Jiachen T. Wang, Zirui Wang, Xindi Wu, Mengzhou Xia, Wenhan Jia, Jiatong Yu, Jun-Jie Zhu, Zhiyong Jason Ren, Sanjeev Arora, Danqi Chen

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Feb 07, 2024
Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson

Detecting Pretraining Data from Large Language Models

Nov 03, 2023
Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, Luke Zettlemoyer

Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation

Oct 10, 2023
Yangsibo Huang, Samyak Gupta, Mengzhou Xia, Kai Li, Danqi Chen

Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning

Oct 10, 2023
Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng, Danqi Chen

InstructEval: Systematic Evaluation of Instruction Selection Methods

Jul 16, 2023
Anirudh Ajith, Chris Pan, Mengzhou Xia, Ameet Deshpande, Karthik Narasimhan
