
Xuanjing Huang


Subspace Defense: Discarding Adversarial Perturbations by Learning a Subspace for Clean Signals

Mar 24, 2024
Rui Zheng, Yuhao Zhou, Zhiheng Xi, Tao Gui, Qi Zhang, Xuanjing Huang

Via arXiv

Decoding Continuous Character-based Language from Non-invasive Brain Recordings

Mar 19, 2024
Cenyuan Zhang, Xiaoqing Zheng, Ruicheng Yin, Shujie Geng, Jianhan Xu, Xuan Gao, Changze Lv, Zixuan Ling, Xuanjing Huang, Miao Cao, Jianfeng Feng

Via arXiv

EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models

Mar 18, 2024
Weikang Zhou, Xiao Wang, Limao Xiong, Han Xia, Yingshuang Gu, Mingxu Chai, Fukang Zhu, Caishuang Huang, Shihan Dou, Zhiheng Xi, Rui Zheng, Songyang Gao, Yicheng Zou, Hang Yan, Yifan Le, Ruohui Wang, Lijun Li, Jing Shao, Tao Gui, Qi Zhang, Xuanjing Huang

Via arXiv

ALaRM: Align Language Models via Hierarchical Rewards Modeling

Mar 16, 2024
Yuhang Lai, Siyuan Wang, Shujun Liu, Xuanjing Huang, Zhongyu Wei

Via arXiv

Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM

Mar 12, 2024
Jingcong Liang, Rong Ye, Meng Han, Ruofei Lai, Xinyu Zhang, Xuanjing Huang, Zhongyu Wei

Via arXiv

Advancing Parameter Efficiency in Fine-tuning via Representation Editing

Feb 28, 2024
Muling Wu, Wenhao Liu, Xiaohua Wang, Tianlong Li, Changze Lv, Zixuan Ling, Jianhao Zhu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

Via arXiv

Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution

Feb 27, 2024
Nuo Xu, Jun Zhao, Can Zu, Sixian Li, Lu Chen, Zhihao Zhang, Rui Zheng, Shihan Dou, Wenjuan Qin, Tao Gui, Qi Zhang, Xuanjing Huang

Via arXiv

CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models

Feb 26, 2024
Huijie Lv, Xiao Wang, Yuansen Zhang, Caishuang Huang, Shihan Dou, Junjie Ye, Tao Gui, Qi Zhang, Xuanjing Huang

Via arXiv

On the Tip of the Tongue: Analyzing Conceptual Representation in Large Language Models with Reverse-Dictionary Probe

Feb 26, 2024
Ningyu Xu, Qi Zhang, Menghan Zhang, Peng Qian, Xuanjing Huang

Via arXiv