Zhexin Zhang

ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors
Feb 26, 2024
Zhexin Zhang, Yida Lu, Jingyuan Ma, Di Zhang, Rui Li, Pei Ke, Hao Sun, Lei Sha, Zhifang Sui, Hongning Wang, Minlie Huang

Unveiling the Implicit Toxicity in Large Language Models
Nov 29, 2023
Jiaxin Wen, Pei Ke, Hao Sun, Zhexin Zhang, Chengfei Li, Jinfeng Bai, Minlie Huang

Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
Nov 15, 2023
Zhexin Zhang, Junxiao Yang, Pei Ke, Minlie Huang

SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions
Sep 13, 2023
Zhexin Zhang, Leqi Lei, Lindong Wu, Rui Sun, Yongkang Huang, Chong Long, Xiao Liu, Xuanyu Lei, Jie Tang, Minlie Huang

Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation
Jul 10, 2023
Zhexin Zhang, Jiaxin Wen, Minlie Huang

Safety Assessment of Chinese Large Language Models
Apr 20, 2023
Hao Sun, Zhexin Zhang, Jiawen Deng, Jiale Cheng, Minlie Huang

Recent Advances towards Safe, Responsible, and Moral Dialogue Systems: A Survey
Feb 18, 2023
Jiawen Deng, Hao Sun, Zhexin Zhang, Jiale Cheng, Minlie Huang

MoralDial: A Framework to Train and Evaluate Moral Dialogue Systems via Constructing Moral Discussions
Dec 21, 2022
Hao Sun, Zhexin Zhang, Fei Mi, Yasheng Wang, Wei Liu, Jianwei Cui, Bin Wang, Qun Liu, Minlie Huang

Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation
Dec 04, 2022
Zhexin Zhang, Jiale Cheng, Hao Sun, Jiawen Deng, Fei Mi, Yasheng Wang, Lifeng Shang, Minlie Huang

Selecting Stickers in Open-Domain Dialogue through Multitask Learning
Sep 16, 2022
Zhexin Zhang, Yeshuang Zhu, Zhengcong Fei, Jinchao Zhang, Jie Zhou
