Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Qing Zhao

School of Physics, Beijing Institute of Technology, China, Beijing Academy of Quantum Information Sciences, China

MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media

Oct 14, 2024

Wei Zhai, Nan Bai, Qing Zhao, Jianqiang Li, Fan Wang, Hongzhi Qi, Meng Jiang, Xiaoqin Wang, Bing Xiang Yang, Guanghui Fu

Figure 1 for MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media

Figure 2 for MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media

Figure 3 for MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media

Figure 4 for MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media

Abstract:As the prevalence of mental health challenges, social media has emerged as a key platform for individuals to express their emotions.Deep learning tends to be a promising solution for analyzing mental health on social media. However, black box models are often inflexible when switching between tasks, and their results typically lack explanations. With the rise of large language models (LLMs), their flexibility has introduced new approaches to the field. Also due to the generative nature, they can be prompted to explain decision-making processes. However, their performance on complex psychological analysis still lags behind deep learning. In this paper, we introduce the first multi-task Chinese Social Media Interpretable Mental Health Instructions (C-IMHI) dataset, consisting of 9K samples, which has been quality-controlled and manually validated. We also propose MentalGLM series models, the first open-source LLMs designed for explainable mental health analysis targeting Chinese social media, trained on a corpus of 50K instructions. The proposed models were evaluated on three downstream tasks and achieved better or comparable performance compared to deep learning models, generalized LLMs, and task fine-tuned LLMs. We validated a portion of the generated decision explanations with experts, showing promising results. We also evaluated the proposed models on a clinical dataset, where they outperformed other LLMs, indicating their potential applicability in the clinical field. Our models show strong performance, validated across tasks and perspectives. The decision explanations enhance usability and facilitate better understanding and practical application of the models. Both the constructed dataset and the models are publicly available via: https://github.com/zwzzzQAQ/MentalGLM.

Via

Access Paper or Ask Questions

Deep Learning and Large Language Models for Audio and Text Analysis in Predicting Suicidal Acts in Chinese Psychological Support Hotlines

Sep 10, 2024

Yining Chen, Jianqiang Li, Changwei Song, Qing Zhao, Yongsheng Tong, Guanghui Fu

Figure 1 for Deep Learning and Large Language Models for Audio and Text Analysis in Predicting Suicidal Acts in Chinese Psychological Support Hotlines

Figure 2 for Deep Learning and Large Language Models for Audio and Text Analysis in Predicting Suicidal Acts in Chinese Psychological Support Hotlines

Figure 3 for Deep Learning and Large Language Models for Audio and Text Analysis in Predicting Suicidal Acts in Chinese Psychological Support Hotlines

Figure 4 for Deep Learning and Large Language Models for Audio and Text Analysis in Predicting Suicidal Acts in Chinese Psychological Support Hotlines

Abstract:Suicide is a pressing global issue, demanding urgent and effective preventive interventions. Among the various strategies in place, psychological support hotlines had proved as a potent intervention method. Approximately two million people in China attempt suicide annually, with many individuals making multiple attempts. Prompt identification and intervention for high-risk individuals are crucial to preventing tragedies. With the rapid advancement of artificial intelligence (AI), especially the development of large-scale language models (LLMs), new technological tools have been introduced to the field of mental health. This study included 1284 subjects, and was designed to validate whether deep learning models and LLMs, using audio and transcribed text from support hotlines, can effectively predict suicide risk. We proposed a simple LLM-based pipeline that first summarizes transcribed text from approximately one hour of speech to extract key features, and then predict suicidial bahaviours in the future. We compared our LLM-based method with the traditional manual scale approach in a clinical setting and with five advanced deep learning models. Surprisingly, the proposed simple LLM pipeline achieved strong performance on a test set of 46 subjects, with an F1 score of 76\% when combined with manual scale rating. This is 7\% higher than the best speech-based deep learning models and represents a 27.82\% point improvement in F1 score compared to using the manual scale apporach alone. Our study explores new applications of LLMs and demonstrates their potential for future use in suicide prevention efforts.

Via

Access Paper or Ask Questions

An Exploratory Deep Learning Approach for Predicting Subsequent Suicidal Acts in Chinese Psychological Support Hotlines

Aug 29, 2024

Changwei Song, Qing Zhao, Jianqiang Li, Yining Chen, Yongsheng Tong, Guanghui Fu

Figure 1 for An Exploratory Deep Learning Approach for Predicting Subsequent Suicidal Acts in Chinese Psychological Support Hotlines

Figure 2 for An Exploratory Deep Learning Approach for Predicting Subsequent Suicidal Acts in Chinese Psychological Support Hotlines

Figure 3 for An Exploratory Deep Learning Approach for Predicting Subsequent Suicidal Acts in Chinese Psychological Support Hotlines

Figure 4 for An Exploratory Deep Learning Approach for Predicting Subsequent Suicidal Acts in Chinese Psychological Support Hotlines

Abstract:Psychological support hotlines are an effective suicide prevention measure that typically relies on professionals using suicide risk assessment scales to predict individual risk scores. However, the accuracy of scale-based predictive methods for suicide risk assessment can vary widely depending on the expertise of the operator. This limitation underscores the need for more reliable methods, prompting this research's innovative exploration of the use of artificial intelligence to improve the accuracy and efficiency of suicide risk prediction within the context of psychological support hotlines. The study included data from 1,549 subjects from 2015-2017 in China who contacted a psychological support hotline. Each participant was followed for 12 months to identify instances of suicidal behavior. We proposed a novel multi-task learning method that uses the large-scale pre-trained model Whisper for feature extraction and fits psychological scales while predicting the risk of suicide. The proposed method yields a 2.4\% points improvement in F1-score compared to the traditional manual approach based on the psychological scales. Our model demonstrated superior performance compared to the other eight popular models. To our knowledge, this study is the first to apply deep learning to long-term speech data to predict suicide risk in China, indicating grate potential for clinical applications. The source code is publicly available at: \url{https://github.com/songchangwei/Suicide-Risk-Prediction}.

Via

Access Paper or Ask Questions

A Generic Review of Integrating Artificial Intelligence in Cognitive Behavioral Therapy

Jul 28, 2024

Meng Jiang, Qing Zhao, Jianqiang Li, Fan Wang, Tianyu He, Xinyan Cheng, Bing Xiang Yang, Grace W. K. Ho, Guanghui Fu

Abstract:Cognitive Behavioral Therapy (CBT) is a well-established intervention for mitigating psychological issues by modifying maladaptive cognitive and behavioral patterns. However, delivery of CBT is often constrained by resource limitations and barriers to access. Advancements in artificial intelligence (AI) have provided technical support for the digital transformation of CBT. Particularly, the emergence of pre-training models (PTMs) and large language models (LLMs) holds immense potential to support, augment, optimize and automate CBT delivery. This paper reviews the literature on integrating AI into CBT interventions. We begin with an overview of CBT. Then, we introduce the integration of AI into CBT across various stages: pre-treatment, therapeutic process, and post-treatment. Next, we summarized the datasets relevant to some CBT-related tasks. Finally, we discuss the benefits and current limitations of applying AI to CBT. We suggest key areas for future research, highlighting the need for further exploration and validation of the long-term efficacy and clinical utility of AI-enhanced CBT. The transformative potential of AI in reshaping the practice of CBT heralds a new era of more accessible, efficient, and personalized mental health interventions.

Via

Access Paper or Ask Questions

An Error Discovery and Correction for the Family of V-Shaped BPSO Algorithms

Jul 25, 2024

Qing Zhao, Chengkui Zhang, Hao Li, Ting Ke

Abstract:BPSO algorithm is a swarm intelligence optimization algorithm, which has the characteristics of good optimization effect, high efficiency and easy to implement. In recent years, it has been used to optimize a variety of machine learning and deep learning models, such as CNN, LSTM, SVM, etc. But it is easy to fall into local optimum for the lack of exploitation ability. It is found that in the article, which is different from previous studies, The reason for the poor performance is an error existing in their velocity update function, which leads to abnormal and chaotic behavior of particles. This not only makes the algorithm difficult to converge, but also often searches the repeated space. So, traditionally, it has to rely on a low w value in the later stage to force these algorithms to converge, but also makes them quickly lose their search ability and prone to getting trapped in local optima. This article proposes a velocity legacy term correction method for all V-shaped BPSOs. Experimentals based on 0/1 knapsack problems show that it has a significant effect on accuracy and efficiency for all of the 4 commonly used V-Shaped BPSOs. Therefore it is an significant breakthrough in the field of swarm intelligence.

* 25 pages, 11 figures

Via

Access Paper or Ask Questions

Hi-EF: Benchmarking Emotion Forecasting in Human-interaction

Jul 23, 2024

Haoran Wang, Xinji Mai, Zeng Tao, Yan Wang, Jiawen Yu, Ziheng Zhou, Xuan Tong, Shaoqi Yan, Qing Zhao, Shuyong Gao(+1 more)

Figure 1 for Hi-EF: Benchmarking Emotion Forecasting in Human-interaction

Figure 2 for Hi-EF: Benchmarking Emotion Forecasting in Human-interaction

Figure 3 for Hi-EF: Benchmarking Emotion Forecasting in Human-interaction

Figure 4 for Hi-EF: Benchmarking Emotion Forecasting in Human-interaction

Abstract:Affective Forecasting, a research direction in psychology that predicts individuals future emotions, is often constrained by numerous external factors like social influence and temporal distance. To address this, we transform Affective Forecasting into a Deep Learning problem by designing an Emotion Forecasting paradigm based on two-party interactions. We propose a novel Emotion Forecasting (EF) task grounded in the theory that an individuals emotions are easily influenced by the emotions or other information conveyed during interactions with another person. To tackle this task, we have developed a specialized dataset, Human-interaction-based Emotion Forecasting (Hi-EF), which contains 3069 two-party Multilayered-Contextual Interaction Samples (MCIS) with abundant affective-relevant labels and three modalities. Hi-EF not only demonstrates the feasibility of the EF task but also highlights its potential. Additionally, we propose a methodology that establishes a foundational and referential baseline model for the EF task and extensive experiments are provided. The dataset and code is available at https://github.com/Anonymize-Author/Hi-EF.

Via

Access Paper or Ask Questions

All rivers run into the sea: Unified Modality Brain-like Emotional Central Mechanism

Jul 22, 2024

Xinji Mai, Junxiong Lin, Haoran Wang, Zeng Tao, Yan Wang, Shaoqi Yan, Xuan Tong, Jiawen Yu, Boyang Wang, Ziheng Zhou(+3 more)

Figure 1 for All rivers run into the sea: Unified Modality Brain-like Emotional Central Mechanism

Figure 2 for All rivers run into the sea: Unified Modality Brain-like Emotional Central Mechanism

Figure 3 for All rivers run into the sea: Unified Modality Brain-like Emotional Central Mechanism

Figure 4 for All rivers run into the sea: Unified Modality Brain-like Emotional Central Mechanism

Abstract:In the field of affective computing, fully leveraging information from a variety of sensory modalities is essential for the comprehensive understanding and processing of human emotions. Inspired by the process through which the human brain handles emotions and the theory of cross-modal plasticity, we propose UMBEnet, a brain-like unified modal affective processing network. The primary design of UMBEnet includes a Dual-Stream (DS) structure that fuses inherent prompts with a Prompt Pool and a Sparse Feature Fusion (SFF) module. The design of the Prompt Pool is aimed at integrating information from different modalities, while inherent prompts are intended to enhance the system's predictive guidance capabilities and effectively manage knowledge related to emotion classification. Moreover, considering the sparsity of effective information across different modalities, the SSF module aims to make full use of all available sensory data through the sparse integration of modality fusion prompts and inherent prompts, maintaining high adaptability and sensitivity to complex emotional states. Extensive experiments on the largest benchmark datasets in the Dynamic Facial Expression Recognition (DFER) field, including DFEW, FERV39k, and MAFW, have proven that UMBEnet consistently outperforms the current state-of-the-art methods. Notably, in scenarios of Modality Missingness and multimodal contexts, UMBEnet significantly surpasses the leading current methods, demonstrating outstanding performance and adaptability in tasks that involve complex emotional understanding with rich multimodal information.

Via

Access Paper or Ask Questions

Embedded Prompt Tuning: Towards Enhanced Calibration of Pretrained Models for Medical Images

Jul 02, 2024

Wenqiang Zu, Shenghao Xie, Qing Zhao, Guoqi Li, Lei Ma

Figure 1 for Embedded Prompt Tuning: Towards Enhanced Calibration of Pretrained Models for Medical Images

Figure 2 for Embedded Prompt Tuning: Towards Enhanced Calibration of Pretrained Models for Medical Images

Figure 3 for Embedded Prompt Tuning: Towards Enhanced Calibration of Pretrained Models for Medical Images

Figure 4 for Embedded Prompt Tuning: Towards Enhanced Calibration of Pretrained Models for Medical Images

Abstract:Foundation models pre-trained on large-scale data have been widely witnessed to achieve success in various natural imaging downstream tasks. Parameter-efficient fine-tuning (PEFT) methods aim to adapt foundation models to new domains by updating only a small portion of parameters in order to reduce computational overhead. However, the effectiveness of these PEFT methods, especially in cross-domain few-shot scenarios, e.g., medical image analysis, has not been fully explored. In this work, we facilitate the study of the performance of PEFT when adapting foundation models to medical image classification tasks. Furthermore, to alleviate the limitations of prompt introducing ways and approximation capabilities on Transformer architectures of mainstream prompt tuning methods, we propose the Embedded Prompt Tuning (EPT) method by embedding prompt tokens into the expanded channels. We also find that there are anomalies in the feature space distribution of foundation models during pre-training process, and prompt tuning can help mitigate this negative impact. To explain this phenomenon, we also introduce a novel perspective to understand prompt tuning: Prompt tuning is a distribution calibrator. And we support it by analyzing patch-wise scaling and feature separation operations contained in EPT. Our experiments show that EPT outperforms several state-of-the-art fine-tuning methods by a significant margin on few-shot medical image classification tasks, and completes the fine-tuning process within highly competitive time, indicating EPT is an effective PEFT method. The source code is available at github.com/zuwenqiang/EPT.

* 16 pages, 7 figures. arXiv admin note: text overlap with arXiv:2306.09579, arXiv:2203.12119 by other authors

Via

Access Paper or Ask Questions

Suppressing Uncertainties in Degradation Estimation for Blind Super-Resolution

Jun 24, 2024

Junxiong Lin, Zeng Tao, Xuan Tong, Xinji Mai, Haoran Wang, Boyang Wang, Yan Wang, Qing Zhao, Jiawen Yu, Yuxuan Lin(+3 more)

Figure 1 for Suppressing Uncertainties in Degradation Estimation for Blind Super-Resolution

Figure 2 for Suppressing Uncertainties in Degradation Estimation for Blind Super-Resolution

Figure 3 for Suppressing Uncertainties in Degradation Estimation for Blind Super-Resolution

Figure 4 for Suppressing Uncertainties in Degradation Estimation for Blind Super-Resolution

Abstract:The problem of blind image super-resolution aims to recover high-resolution (HR) images from low-resolution (LR) images with unknown degradation modes. Most existing methods model the image degradation process using blur kernels. However, this explicit modeling approach struggles to cover the complex and varied degradation processes encountered in the real world, such as high-order combinations of JPEG compression, blur, and noise. Implicit modeling for the degradation process can effectively overcome this issue, but a key challenge of implicit modeling is the lack of accurate ground truth labels for the degradation process to conduct supervised training. To overcome this limitations inherent in implicit modeling, we propose an \textbf{U}ncertainty-based degradation representation for blind \textbf{S}uper-\textbf{R}esolution framework (\textbf{USR}). By suppressing the uncertainty of local degradation representations in images, USR facilitated self-supervised learning of degradation representations. The USR consists of two components: Adaptive Uncertainty-Aware Degradation Extraction (AUDE) and a feature extraction network composed of Variable Depth Dynamic Convolution (VDDC) blocks. To extract Uncertainty-based Degradation Representation from LR images, the AUDE utilizes the Self-supervised Uncertainty Contrast module with Uncertainty Suppression Loss to suppress the inherent model uncertainty of the Degradation Extractor. Furthermore, VDDC block integrates degradation information through dynamic convolution. Rhe VDDC also employs an Adaptive Intensity Scaling operation that adaptively adjusts the degradation representation according to the network hierarchy, thereby facilitating the effective integration of degradation information. Quantitative and qualitative experiments affirm the superiority of our approach.

Via

Access Paper or Ask Questions

Seeking Certainty In Uncertainty: Dual-Stage Unified Framework Solving Uncertainty in Dynamic Facial Expression Recognition

Jun 24, 2024

Haoran Wang, Xinji Mai, Zeng Tao, Xuan Tong, Junxiong Lin, Yan Wang, Jiawen Yu, Boyang Wang, Shaoqi Yan, Qing Zhao(+3 more)

Figure 1 for Seeking Certainty In Uncertainty: Dual-Stage Unified Framework Solving Uncertainty in Dynamic Facial Expression Recognition

Figure 2 for Seeking Certainty In Uncertainty: Dual-Stage Unified Framework Solving Uncertainty in Dynamic Facial Expression Recognition

Figure 3 for Seeking Certainty In Uncertainty: Dual-Stage Unified Framework Solving Uncertainty in Dynamic Facial Expression Recognition

Figure 4 for Seeking Certainty In Uncertainty: Dual-Stage Unified Framework Solving Uncertainty in Dynamic Facial Expression Recognition

Abstract:The contemporary state-of-the-art of Dynamic Facial Expression Recognition (DFER) technology facilitates remarkable progress by deriving emotional mappings of facial expressions from video content, underpinned by training on voluminous datasets. Yet, the DFER datasets encompass a substantial volume of noise data. Noise arises from low-quality captures that defy logical labeling, and instances that suffer from mislabeling due to annotation bias, engendering two principal types of uncertainty: the uncertainty regarding data usability and the uncertainty concerning label reliability. Addressing the two types of uncertainty, we have meticulously crafted a two-stage framework aiming at \textbf{S}eeking \textbf{C}ertain data \textbf{I}n extensive \textbf{U}ncertain data (SCIU). This initiative aims to purge the DFER datasets of these uncertainties, thereby ensuring that only clean, verified data is employed in training processes. To mitigate the issue of low-quality samples, we introduce the Coarse-Grained Pruning (CGP) stage, which assesses sample weights and prunes those deemed unusable due to their low weight. For samples with incorrect annotations, the Fine-Grained Correction (FGC) stage evaluates prediction stability to rectify mislabeled data. Moreover, SCIU is conceived as a universally compatible, plug-and-play framework, tailored to integrate seamlessly with prevailing DFER methodologies. Rigorous experiments across prevalent DFER datasets and against numerous benchmark methods substantiates SCIU's capacity to markedly elevate performance metrics.

Via

Access Paper or Ask Questions