Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yang Lu

Toward Corpus Size Requirements for Training and Evaluating Depression Risk Models Using Spoken Language

Dec 31, 2024

Tomek Rutowski, Amir Harati, Elizabeth Shriberg, Yang Lu, Piotr Chlebek, Ricardo Oliveira

Figure 1 for Toward Corpus Size Requirements for Training and Evaluating Depression Risk Models Using Spoken Language

Figure 2 for Toward Corpus Size Requirements for Training and Evaluating Depression Risk Models Using Spoken Language

Figure 3 for Toward Corpus Size Requirements for Training and Evaluating Depression Risk Models Using Spoken Language

Abstract:Mental health risk prediction is a growing field in the speech community, but many studies are based on small corpora. This study illustrates how variations in test and train set sizes impact performance in a controlled study. Using a corpus of over 65K labeled data points, results from a fully crossed design of different train/test size combinations are provided. Two model types are included: one based on language and the other on speech acoustics. Both use methods current in this domain. An age-mismatched test set was also included. Results show that (1) test sizes below 1K samples gave noisy results, even for larger training set sizes; (2) training set sizes of at least 2K were needed for stable results; (3) NLP and acoustic models behaved similarly with train/test size variations, and (4) the mismatched test set showed the same patterns as the matched test set. Additional factors are discussed, including label priors, model strength and pre-training, unique speakers, and data lengths. While no single study can specify exact size requirements, results demonstrate the need for appropriately sized train and test sets for future studies of mental health risk prediction from speech and language.

* Proceedings Interspeech, 2022

Via

Access Paper or Ask Questions

Depression and Anxiety Prediction Using Deep Language Models and Transfer Learning

Dec 30, 2024

Tomasz Rutowski, Elizabeth Shriberg, Amir Harati, Yang Lu, Piotr Chlebek, Ricardo Oliveira

Abstract:Digital screening and monitoring applications can aid providers in the management of behavioral health conditions. We explore deep language models for detecting depression, anxiety, and their co-occurrence from conversational speech collected during 16k user interactions with an application. Labels come from PHQ-8 and GAD-7 results also collected by the application. We find that results for binary classification range from 0.86 to 0.79 AUC, depending on condition and co-occurrence. Best performance is achieved when a user has either both or neither condition, and we show that this result is not attributable to data skew. Finally, we find evidence suggesting that underlying word sequence cues may be more salient for depression than for anxiety.

* Proceedings IEEE International Conference on Behavioural and Social Computing (BESC), 2020

Via

Access Paper or Ask Questions

Asynchronous Federated Clustering with Unknown Number of Clusters

Dec 29, 2024

Yunfan Zhang, Yiqun Zhang, Yang Lu, Mengke Li, Xi Chen, Yiu-ming Cheung

Abstract:Federated Clustering (FC) is crucial to mining knowledge from unlabeled non-Independent Identically Distributed (non-IID) data provided by multiple clients while preserving their privacy. Most existing attempts learn cluster distributions at local clients, and then securely pass the desensitized information to the server for aggregation. However, some tricky but common FC problems are still relatively unexplored, including the heterogeneity in terms of clients' communication capacity and the unknown number of proper clusters $k^*$. To further bridge the gap between FC and real application scenarios, this paper first shows that the clients' communication asynchrony and unknown $k^*$ are complex coupling problems, and then proposes an Asynchronous Federated Cluster Learning (AFCL) method accordingly. It spreads the excessive number of seed points to the clients as a learning medium and coordinates them across the clients to form a consensus. To alleviate the distribution imbalance cumulated due to the unforeseen asynchronous uploading from the heterogeneous clients, we also design a balancing mechanism for seeds updating. As a result, the seeds gradually adapt to each other to reveal a proper number of clusters. Extensive experiments demonstrate the efficacy of AFCL.

* Accepted by AAAI 2025

Via

Access Paper or Ask Questions

Cross-Demographic Portability of Deep NLP-Based Depression Models

Dec 26, 2024

Tomek Rutowski, Elizabeth Shriberg, Amir Harati, Yang Lu, Ricardo Oliveira, Piotr Chlebek

Figure 1 for Cross-Demographic Portability of Deep NLP-Based Depression Models

Figure 2 for Cross-Demographic Portability of Deep NLP-Based Depression Models

Figure 3 for Cross-Demographic Portability of Deep NLP-Based Depression Models

Figure 4 for Cross-Demographic Portability of Deep NLP-Based Depression Models

Abstract:Deep learning models are rapidly gaining interest for real-world applications in behavioral health. An important gap in current literature is how well such models generalize over different populations. We study Natural Language Processing (NLP) based models to explore portability over two different corpora highly mismatched in age. The first and larger corpus contains younger speakers. It is used to train an NLP model to predict depression. When testing on unseen speakers from the same age distribution, this model performs at AUC=0.82. We then test this model on the second corpus, which comprises seniors from a retirement community. Despite the large demographic differences in the two corpora, we saw only modest degradation in performance for the senior-corpus data, achieving AUC=0.76. Interestingly, in the senior population, we find AUC=0.81 for the subset of patients whose health state is consistent over time. Implications for demographic portability of speech-based applications are discussed.

* Proceedings IEEE Spoken Language Technology Workshop (SLT), 2021

Via

Access Paper or Ask Questions

Speech-Based Depression Prediction Using Encoder-Weight-Only Transfer Learning and a Large Corpus

Dec 22, 2024

Amir Harati, Elizabeth Shriberg, Tomasz Rutowski, Piotr Chlebek, Yang Lu, Ricardo Oliveira

Figure 1 for Speech-Based Depression Prediction Using Encoder-Weight-Only Transfer Learning and a Large Corpus

Figure 2 for Speech-Based Depression Prediction Using Encoder-Weight-Only Transfer Learning and a Large Corpus

Figure 3 for Speech-Based Depression Prediction Using Encoder-Weight-Only Transfer Learning and a Large Corpus

Figure 4 for Speech-Based Depression Prediction Using Encoder-Weight-Only Transfer Learning and a Large Corpus

Abstract:Speech-based algorithms have gained interest for the management of behavioral health conditions such as depression. We explore a speech-based transfer learning approach that uses a lightweight encoder and that transfers only the encoder weights, enabling a simplified run-time model. Our study uses a large data set containing roughly two orders of magnitude more speakers and sessions than used in prior work. The large data set enables reliable estimation of improvement from transfer learning. Results for the prediction of PHQ-8 labels show up to 27% relative performance gains for binary classification; these gains are statistically significant with a p-value close to zero. Improvements were also found for regression. Additionally, the gain from transfer learning does not appear to require strong source task performance. Results suggest that this approach is flexible and offers promise for efficient implementation.

* Proceedings IEEE ICASSP 2021

Via

Access Paper or Ask Questions

LogBabylon: A Unified Framework for Cross-Log File Integration and Analysis

Dec 16, 2024

Rabimba Karanjai, Yang Lu, Dana Alsagheer, Keshav Kasichainula, Lei Xu, Weidong Shi, Shou-Hsuan Stephen Huang

Abstract:Logs are critical resources that record events, activities, or messages produced by software applications, operating systems, servers, and network devices. However, consolidating the heterogeneous logs and cross-referencing them is challenging and complicated. Manually analyzing the log data is time-consuming and prone to errors. LogBabylon is a centralized log data consolidating solution that leverages Large Language Models (LLMs) integrated with Retrieval-Augmented Generation (RAG) technology. LogBabylon interprets the log data in a human-readable way and adds insight analysis of the system performance and anomaly alerts. It provides a paramount view of the system landscape, enabling proactive management and rapid incident response. LogBabylon consolidates diverse log sources and enhances the extracted information's accuracy and relevancy. This facilitates a deeper understanding of log data, supporting more effective decision-making and operational efficiency. Furthermore, LogBabylon streamlines the log analysis process, significantly reducing the time and effort required to interpret complex datasets. Its capabilities extend to generating context-aware insights, offering an invaluable tool for continuous monitoring, performance optimization, and security assurance in dynamic computing environments.

Via

Access Paper or Ask Questions

Attributed Graph Clustering via Generalized Quaternion Representation Learning

Nov 22, 2024

Junyang Chen, Yiqun Zhang, Mengke Li, Yang Lu, Yiu-ming Cheung

Figure 1 for Attributed Graph Clustering via Generalized Quaternion Representation Learning

Figure 2 for Attributed Graph Clustering via Generalized Quaternion Representation Learning

Figure 3 for Attributed Graph Clustering via Generalized Quaternion Representation Learning

Figure 4 for Attributed Graph Clustering via Generalized Quaternion Representation Learning

Abstract:Clustering complex data in the form of attributed graphs has attracted increasing attention, where appropriate graph representation is a critical prerequisite for accurate cluster analysis. However, the Graph Convolutional Network will homogenize the representation of graph nodes due to the well-known over-smoothing effect. This limits the network architecture to a shallow one, losing the ability to capture the critical global distribution information for clustering. Therefore, we propose a generalized graph auto-encoder network, which introduces quaternion operations to the encoders to achieve efficient structured feature representation learning without incurring deeper network and larger-scale parameters. The generalization of our method lies in the following two aspects: 1) connecting the quaternion operation naturally suitable for four feature components with graph data of arbitrary attribute dimensions, and 2) introducing a generalized graph clustering objective as a loss term to obtain clustering-friendly representations without requiring a pre-specified number of clusters $k$. It turns out that the representations of nodes learned by the proposed Graph Clustering based on Generalized Quaternion representation learning (GCGQ) are more discriminative, containing global distribution information, and are more general, suiting downstream clustering under different $k$s. Extensive experiments including significance tests, ablation studies, and qualitative results, illustrate the superiority of GCGQ. The source code is temporarily opened at \url{https://anonymous.4open.science/r/ICLR-25-No7181-codes}.

Via

Access Paper or Ask Questions

Improving Visual Prompt Tuning by Gaussian Neighborhood Minimization for Long-Tailed Visual Recognition

Oct 28, 2024

Mengke Li, Ye Liu, Yang Lu, Yiqun Zhang, Yiu-ming Cheung, Hui Huang

Figure 1 for Improving Visual Prompt Tuning by Gaussian Neighborhood Minimization for Long-Tailed Visual Recognition

Figure 2 for Improving Visual Prompt Tuning by Gaussian Neighborhood Minimization for Long-Tailed Visual Recognition

Figure 3 for Improving Visual Prompt Tuning by Gaussian Neighborhood Minimization for Long-Tailed Visual Recognition

Figure 4 for Improving Visual Prompt Tuning by Gaussian Neighborhood Minimization for Long-Tailed Visual Recognition

Abstract:Long-tail learning has garnered widespread attention and achieved significant progress in recent times. However, even with pre-trained prior knowledge, models still exhibit weaker generalization performance on tail classes. The promising Sharpness-Aware Minimization (SAM) can effectively improve the generalization capability of models by seeking out flat minima in the loss landscape, which, however, comes at the cost of doubling the computational time. Since the update rule of SAM necessitates two consecutive (non-parallelizable) forward and backpropagation at each step. To address this issue, we propose a novel method called Random SAM prompt tuning (RSAM-PT) to improve the model generalization, requiring only one-step gradient computation at each step. Specifically, we search for the gradient descent direction within a random neighborhood of the parameters during each gradient update. To amplify the impact of tail-class samples and avoid overfitting, we employ the deferred re-weight scheme to increase the significance of tail-class samples. The classification accuracy of long-tailed data can be significantly improved by the proposed RSAM-PT, particularly for tail classes. RSAM-PT achieves the state-of-the-art performance of 90.3\%, 76.5\%, and 50.1\% on benchmark datasets CIFAR100-LT (IF 100), iNaturalist 2018, and Places-LT, respectively. The source code is temporarily available at https://github.com/Keke921/GNM-PT.

* NeurIPS 2024

Via

Access Paper or Ask Questions

KANsformer for Scalable Beamforming

Oct 28, 2024

Xinke Xie, Yang Lu, Chong-Yung Chi, Wei Chen, Bo Ai, Dusit Niyato

Figure 1 for KANsformer for Scalable Beamforming

Figure 2 for KANsformer for Scalable Beamforming

Figure 3 for KANsformer for Scalable Beamforming

Figure 4 for KANsformer for Scalable Beamforming

Abstract:This paper proposes an unsupervised deep-learning (DL) approach by integrating transformer and Kolmogorov-Arnold networks (KAN) termed KANsformer to realize scalable beamforming for mobile communication systems. Specifically, we consider a classic multi-input-single-output energy efficiency maximization problem subject to the total power budget. The proposed KANsformer first extracts hidden features via a multi-head self-attention mechanism and then reads out the desired beamforming design via KAN. Numerical results are provided to evaluate the KANsformer in terms of generalization performance, transfer learning and ablation experiment. Overall, the KANsformer outperforms existing benchmark DL approaches, and is adaptable to the change in the number of mobile users with real-time and near-optimal inference.

Via

Access Paper or Ask Questions

GNN-Enabled Optimization of Placement and Transmission Design for UAV Communications

Oct 03, 2024

Qinyu Wang, Yang Lu, Wei Chen, Bo Ai, Zhangdui Zhong, Dusit Niyato

Figure 1 for GNN-Enabled Optimization of Placement and Transmission Design for UAV Communications

Figure 2 for GNN-Enabled Optimization of Placement and Transmission Design for UAV Communications

Figure 3 for GNN-Enabled Optimization of Placement and Transmission Design for UAV Communications

Figure 4 for GNN-Enabled Optimization of Placement and Transmission Design for UAV Communications

Abstract:This paper applies graph neural networks (GNN) in UAV communications to optimize the placement and transmission design. We consider a multiple-user multiple-input-single-output UAV communication system where a UAV intends to find a placement to hover and serve users with maximum energy efficiency (EE). To facilitate the GNN-based learning, we adopt the hybrid maximum ratio transmission and zero forcing scheme to design the beamforming vectors and a feature augment is implemented by manually setting edge features. Furthermore, we propose a two-stage GNN-based model where the first stage and the second stage yield the placement and the transmission design, respectively. The two stages are connected via a residual and their learnable weights are jointly optimized by via unsupervised learning. Numerical results illustrate the effectiveness and validate the scalability to both UAV antennas and users of the proposed model.

Via

Access Paper or Ask Questions