Abstract:Engagement between client and therapist is a critical determinant of therapeutic success. We propose a multi-dimensional natural language processing (NLP) framework that objectively classifies engagement quality in counseling sessions based on textual transcripts. Using 253 motivational interviewing transcripts (150 high-quality, 103 low-quality), we extracted 42 features across four domains: conversational dynamics, semantic similarity as topic alignment, sentiment classification, and question detection. Classifiers, including Random Forest (RF), Cat-Boost, and Support Vector Machines (SVM), were hyperparameter tuned and trained using a stratified 5-fold cross-validation and evaluated on a holdout test set. On balanced (non-augmented) data, RF achieved the highest classification accuracy (76.7%), and SVM achieved the highest AUC (85.4%). After SMOTE-Tomek augmentation, performance improved significantly: RF achieved up to 88.9% accuracy, 90.0% F1-score, and 94.6% AUC, while SVM reached 81.1% accuracy, 83.1% F1-score, and 93.6% AUC. The augmented data results reflect the potential of the framework in future larger-scale applications. Feature contribution revealed conversational dynamics and semantic similarity between clients and therapists were among the top contributors, led by words uttered by the client (mean and standard deviation). The framework was robust across the original and augmented datasets and demonstrated consistent improvements in F1 scores and recall. While currently text-based, the framework supports future multimodal extensions (e.g., vocal tone, facial affect) for more holistic assessments. This work introduces a scalable, data-driven method for evaluating engagement quality of the therapy session, offering clinicians real-time feedback to enhance the quality of both virtual and in-person therapeutic interactions.
Abstract:This paper presents a proof of concept for the usefulness of second-order texture features for the qualitative analysis and classification of chromogenic in-situ hybridization whole slide images in high-throughput imaging experiments. The challenge is that currently, the gold standard for gene expression grading in such images is expert assessment. The idea of the research team is to use different approaches in the analysis of these images that will be used for structural segmentation and functional analysis in gene expression. The article presents such perspective idea to select a number of textural features that are going to be used for classification. In our experiment, natural grouping of image samples (tiles) depending on their local texture properties was explored in an unsupervised classification procedure. The features are reduced to two dimensions with fuzzy c-means clustering. The overall conclusion of this experiment is that Haralick features are a viable choice for classification and analysis of chromogenic in-situ hybridization image data. The principal component analysis approach produced slightly more "understandable" from an annotator's point of view classes.