Dialogue systems are increasingly integrated into mental health support to help clients facilitate exploration, gain insight, take action, and ultimately heal themselves. For a dialogue system to be practical and user-friendly, it should be client-centric, focusing on the client's behaviors. However, existing dialogue systems publicly available for mental health support often concentrate solely on the counselor's strategies rather than the behaviors expressed by clients. This can lead to the implementation of unreasonable or inappropriate counseling strategies and corresponding responses from the dialogue system. To address this issue, we propose PsyChat, a client-centric dialogue system that provides psychological support through online chat. The client-centric dialogue system comprises five modules: client behavior recognition, counselor strategy selection, input packer, response generator intentionally fine-tuned to produce responses, and response selection. Both automatic and human evaluations demonstrate the effectiveness and practicality of our proposed dialogue system for real-life mental health support. Furthermore, we employ our proposed dialogue system to simulate a real-world client-virtual-counselor interaction scenario. The system is capable of predicting the client's behaviors, selecting appropriate counselor strategies, and generating accurate and suitable responses, as demonstrated in the scenario.
As Large Language Models (LLMs) are becoming prevalent in various fields, there is an urgent need for improved NLP benchmarks that encompass all the necessary knowledge of individual discipline. Many contemporary benchmarks for foundational models emphasize a broad range of subjects but often fall short in presenting all the critical subjects and encompassing necessary professional knowledge of them. This shortfall has led to skewed results, given that LLMs exhibit varying performance across different subjects and knowledge areas. To address this issue, we present psybench, the first comprehensive Chinese evaluation suite that covers all the necessary knowledge required for graduate entrance exams. psybench offers a deep evaluation of a model's strengths and weaknesses in psychology through multiple-choice questions. Our findings show significant differences in performance across different sections of a subject, highlighting the risk of skewed results when the knowledge in test sets is not balanced. Notably, only the ChatGPT model reaches an average accuracy above $70\%$, indicating that there is still plenty of room for improvement. We expect that psybench will help to conduct thorough evaluations of base models' strengths and weaknesses and assist in practical application in the field of psychology.
NSFW (Not Safe for Work) content, in the context of a dialogue, can have severe side effects on users in open-domain dialogue systems. However, research on detecting NSFW language, especially sexually explicit content, within a dialogue context has significantly lagged behind. To address this issue, we introduce CensorChat, a dialogue monitoring dataset aimed at NSFW dialogue detection. Leveraging knowledge distillation techniques involving GPT-4 and ChatGPT, this dataset offers a cost-effective means of constructing NSFW content detectors. The process entails collecting real-life human-machine interaction data and breaking it down into single utterances and single-turn dialogues, with the chatbot delivering the final utterance. ChatGPT is employed to annotate unlabeled data, serving as a training set. Rationale validation and test sets are constructed using ChatGPT and GPT-4 as annotators, with a self-criticism strategy for resolving discrepancies in labeling. A BERT model is fine-tuned as a text classifier on pseudo-labeled data, and its performance is assessed. The study emphasizes the importance of AI systems prioritizing user safety and well-being in digital conversations while respecting freedom of expression. The proposed approach not only advances NSFW content detection but also aligns with evolving user protection needs in AI-driven dialogues.
Dialogue safety remains a pervasive challenge in open-domain human-machine interaction. Existing approaches propose distinctive dialogue safety taxonomies and datasets for detecting explicitly harmful responses. However, these taxonomies may not be suitable for analyzing response safety in mental health support. In real-world interactions, a model response deemed acceptable in casual conversations might have a negligible positive impact on users seeking mental health support. To address these limitations, this paper aims to develop a theoretically and factually grounded taxonomy that prioritizes the positive impact on help-seekers. Additionally, we create a benchmark corpus with fine-grained labels for each dialogue session to facilitate further research. We analyze the dataset using popular language models, including BERT-base, RoBERTa-large, and ChatGPT, to detect and understand unsafe responses within the context of mental health support. Our study reveals that ChatGPT struggles to detect safety categories with detailed safety definitions in a zero- and few-shot paradigm, whereas the fine-tuned model proves to be more suitable. The developed dataset and findings serve as valuable benchmarks for advancing research on dialogue safety in mental health support, with significant implications for improving the design and deployment of conversation agents in real-world applications. We release our code and data here: https://github.com/qiuhuachuan/DialogueSafety.
* accepted to The 12th CCF International Conference on Natural Language
Processing and Chinese Computing (NLPCC2023)
Researchers have invested considerable effort into ensuring that large language models (LLMs) align with human values, using various training techniques, such as instruction tuning and Reinforcement Learning from Human or AI Feedback (RLHF/RLAIF), to guard against text unsafety. However, these defenses remain incredibly vulnerable to some jailbreak attacks, which can cause the model to become overly defensive to sensitive topics or still generate harmful content, leaving the model performance particularly fragile. Therefore, to comprehensively study text safety and output robustness, we propose a latent jailbreak prompt dataset, each involving malicious instruction embedding. Specifically, we instruct the model to complete a regular task, such as translation, where the text to be translated contains malicious instructions. To further analyze the safety and robustness, we design a hierarchical annotation framework. We present a systematic analysis of the safety and robustness of LLMs concerning the position of explicit normal instructions, word replacement (verbs in explicit normal instructions, target groups in malicious instructions, cue words in malicious instructions), and instruction replacement (different explicit normal instructions). Our results show that current LLMs not only have a preference for certain instruction verbs, but also exhibit different jailbreak rates for different instruction verbs in explicit normal instructions. In other words, the probability of generating unsafe content by the model will be reinforced to varying degrees depending on the instruction verb in explicit normal instructions. Code and data are available at https://github.com/qiuhuachuan/latent-jailbreak.
Communication success relies heavily on reading participants' reactions. Such feedback is especially important for mental health counselors, who must carefully consider the client's progress and adjust their approach accordingly. However, previous NLP research on counseling has mainly focused on studying counselors' intervention strategies rather than their clients' reactions to the intervention. This work aims to fill this gap by developing a theoretically grounded annotation framework that encompasses counselors' strategies and client reaction behaviors. The framework has been tested against a large-scale, high-quality text-based counseling dataset we collected over the past two years from an online welfare counseling platform. Our study shows how clients react to counselors' strategies, how such reactions affect the final counseling outcomes, and how counselors can adjust their strategies in response to these reactions. We also demonstrate that this study can help counselors automatically predict their clients' states.
There has been an increasing research interest in developing specialized dialogue systems that can offer mental health support. However, gathering large-scale and real-life multi-turn conversations for mental health support poses challenges due to the sensitivity of personal information, as well as the time and cost involved. To address these issues, we introduce the SMILE approach, an inclusive language expansion technique that employs ChatGPT to extend public single-turn dialogues into multi-turn ones. Our research first presents a preliminary exploratory study that validates the effectiveness of the SMILE approach. Furthermore, we conduct a comprehensive and systematic contrastive analysis of datasets generated with and without the SMILE approach, demonstrating that the SMILE method results in a large-scale, diverse, and close-to-real-life multi-turn mental health support conversation corpus, including dialog topics, lexical and semantic features. Finally, we use the collected corpus (SMILECHAT) to develop a more effective dialogue system that offers emotional support and constructive suggestions in multi-turn conversations for mental health support.
As an essential component of dialogue systems, multi-turn response selection aims to pick out the optimal response among a set of candidates to improve the dialogue fluency. In this paper, we investigate three problems of current response selection approaches, especially for generation-based conversational agents: (i) Existing approaches are often formulated as a sentence scoring problem, which does not consider relationships between responses. (ii) Existing models tend to select undesirable candidates that have large overlaps with the dialogue history. (iii) Negative instances in training are mainly constructed by random sampling from the corpus, whereas generated candidates in practice typically have a closer distribution. To address the above problems, we create a new dataset called ConvAI2+ and propose a new response selector called Global-Selector. Experimental results show that Global-Selector trained on ConvAI2+ have noticeable improvements in both accuracy and inference speed.