Soo-Hyung Kim

DCTM: Dilated Convolutional Transformer Model for Multimodal Engagement Estimation in Conversation

Jul 31, 2023
Vu Ngoc Tu, Van Thong Huynh, Hyung-Jeong Yang, M. Zaigham Zaheer, Shah Nawaz, Karthik Nandakumar, Soo-Hyung Kim

Conversational engagement estimation is posed as a regression problem that entails identifying the degree of attention and involvement of the participants in a conversation. This task is a crucial step toward understanding human interaction dynamics and behavior patterns within a conversation. In this research, we introduce a dilated convolutional Transformer for modeling and estimating human engagement in the MULTIMEDIATE 2023 competition. Our proposed system surpasses the baseline models, with a noteworthy improvement of $7$\% on the test set and $4$\% on the validation set. Moreover, we compare different modality fusion mechanisms and show that, for this type of data, simple concatenation with self-attention fusion achieves the best performance.
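
As a rough illustration of the architecture described above, here is a minimal PyTorch sketch of a dilated-convolution-plus-self-attention block and of the concatenation-based fusion; the layer sizes, dilation rate, and module layout are illustrative assumptions, not the authors' exact model.

```python
import torch
import torch.nn as nn

class DilatedConvTransformerBlock(nn.Module):
    def __init__(self, dim=256, heads=4, dilation=2):
        super().__init__()
        # A dilated temporal convolution enlarges the receptive field cheaply.
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=dilation, dilation=dilation)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):                                   # x: (batch, time, dim)
        x = x + self.conv(x.transpose(1, 2)).transpose(1, 2)
        h = self.norm1(x)
        a, _ = self.attn(h, h, h)                           # self-attention over time
        x = x + a
        return x + self.ff(self.norm2(x))

# Fusion by simple concatenation of per-modality features, then self-attention.
video = torch.randn(2, 100, 128)   # hypothetical per-frame video features
audio = torch.randn(2, 100, 128)   # hypothetical per-frame audio features
fused = torch.cat([video, audio], dim=-1)                   # (2, 100, 256)
block = DilatedConvTransformerBlock(dim=256)
engagement = nn.Linear(256, 1)(block(fused).mean(dim=1))    # pooled regression head
print(engagement.shape)            # torch.Size([2, 1])
```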

* Accepted in the ACM MM Grand Challenge 

Mental Workload Estimation with Electroencephalogram Signals by Combining Multi-Space Deep Models

Jul 23, 2023
Hong-Hai Nguyen, Ngumimi Karen Iyortsuun, Hyung-Jeong Yang, Guee-Sang Lee, Soo-Hyung Kim

The human brain is continuously active during both work and rest. Mental activity is a daily process, and when the brain is overworked it can have negative effects on human health. In recent years, early detection of mental health problems has received great attention because it can help prevent serious health issues and improve quality of life. Several signals are used to assess mental state, but the electroencephalogram (EEG) is widely used by researchers because of the large amount of information it provides about the brain. This paper aims to classify mental workload into three states and to estimate continuous workload levels. Our method combines multiple representation spaces to achieve the best results for mental workload estimation. In the time domain, we use Temporal Convolutional Networks, and in the frequency domain, we propose a new architecture called the Multi-Dimensional Residual Block, which combines residual blocks.
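
A minimal sketch of what the time-domain branch could look like: a small Temporal Convolutional Network over raw EEG channels with a 3-state workload head. Channel counts, kernel sizes, and tensor shapes are assumptions, and the frequency-domain Multi-Dimensional Residual Block is omitted here.

```python
import torch
import torch.nn as nn

class TemporalBlock(nn.Module):
    def __init__(self, c_in, c_out, dilation):
        super().__init__()
        pad = 2 * dilation  # enough padding for a causal conv with kernel_size=3
        self.net = nn.Sequential(
            nn.Conv1d(c_in, c_out, 3, padding=pad, dilation=dilation),
            nn.ReLU(),
        )
        self.skip = nn.Conv1d(c_in, c_out, 1) if c_in != c_out else nn.Identity()

    def forward(self, x):
        y = self.net(x)[..., :x.size(-1)]   # crop to input length -> causal output
        return y + self.skip(x)             # residual connection

# Stack blocks with exponentially growing dilation, then classify 3 workload states.
eeg = torch.randn(4, 32, 512)               # (batch, EEG channels, samples) - assumed shapes
tcn = nn.Sequential(TemporalBlock(32, 64, 1), TemporalBlock(64, 64, 2), TemporalBlock(64, 64, 4))
logits = nn.Linear(64, 3)(tcn(eeg).mean(dim=-1))
print(logits.shape)                         # torch.Size([4, 3])
```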

* 9 pages, 3 figures 

Multi-scale Transformer-based Network for Emotion Recognition from Multi Physiological Signals

May 08, 2023
Tu Vu, Van Thong Huynh, Soo-Hyung Kim

This paper presents an efficient multi-scale Transformer-based approach for emotion recognition from physiological data, a task that has gained widespread attention in the research community due to the vast amount of information that modern sensors and machine learning techniques can extract from these signals. Our approach applies a multi-modal technique combined with data scaling to establish the relationship between internal body signals and human emotions. Additionally, we utilize Transformer and Gaussian transformation techniques to improve signal encoding and overall performance. Our model achieves decent results on the CASE dataset of the EPiC competition, with an RMSE of 1.45.
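
Two pieces of this pipeline lend themselves to a short sketch: a Gaussian (rank) transformation that maps a raw physiological signal to an approximately normal distribution, and multi-scale windowing of the result. The specific transform and the window sizes are assumptions for illustration.

```python
import numpy as np
from scipy.stats import rankdata
from scipy.special import erfinv

def gaussian_rank_transform(x):
    """Map a 1-D signal to an approximately standard-normal distribution via ranks."""
    ranks = rankdata(x) / (len(x) + 1)            # ranks rescaled into (0, 1)
    return np.sqrt(2.0) * erfinv(2.0 * ranks - 1.0)

def multi_scale_windows(x, scales=(64, 128, 256)):
    """Cut the signal into non-overlapping windows at several temporal scales."""
    return {s: x[: len(x) // s * s].reshape(-1, s) for s in scales}

signal = np.random.randn(1024) * 5 + 2            # hypothetical physiological trace
normed = gaussian_rank_transform(signal)
views = multi_scale_windows(normed)
print({s: v.shape for s, v in views.items()})     # one view per scale
```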

CoNIC Challenge: Pushing the Frontiers of Nuclear Detection, Segmentation, Classification and Counting

Mar 14, 2023
Simon Graham, Quoc Dang Vu, Mostafa Jahanifar, Martin Weigert, Uwe Schmidt, Wenhua Zhang, Jun Zhang, Sen Yang, Jinxi Xiang, Xiyue Wang, Josef Lorenz Rumberger, Elias Baumann, Peter Hirsch, Lihao Liu, Chenyang Hong, Angelica I. Aviles-Rivero, Ayushi Jain, Heeyoung Ahn, Yiyu Hong, Hussam Azzuni, Min Xu, Mohammad Yaqub, Marie-Claire Blache, Benoît Piégu, Bertrand Vernay, Tim Scherr, Moritz Böhland, Katharina Löffler, Jiachen Li, Weiqin Ying, Chixin Wang, Dagmar Kainmueller, Carola-Bibiane Schönlieb, Shuolin Liu, Dhairya Talsania, Yughender Meda, Prakash Mishra, Muhammad Ridzuan, Oliver Neumann, Marcel P. Schilling, Markus Reischl, Ralf Mikut, Banban Huang, Hsiang-Chin Chien, Ching-Ping Wang, Chia-Yen Lee, Hong-Kun Lin, Zaiyi Liu, Xipeng Pan, Chu Han, Jijun Cheng, Muhammad Dawood, Srijay Deshpande, Raja Muhammad Saad Bashir, Adam Shephard, Pedro Costa, João D. Nunes, Aurélio Campilho, Jaime S. Cardoso, Hrishikesh P S, Densen Puthussery, Devika R G, Jiji C V, Ye Zhang, Zijie Fang, Zhifan Lin, Yongbing Zhang, Chunhui Lin, Liukun Zhang, Lijian Mao, Min Wu, Vi Thi-Tuong Vo, Soo-Hyung Kim, Taebum Lee, Satoshi Kondo, Satoshi Kasai, Pranay Dumbhare, Vedant Phuse, Yash Dubey, Ankush Jamthikar, Trinh Thi Le Vuong, Jin Tae Kwak, Dorsa Ziaei, Hyun Jung, Tianyi Miao, David Snead, Shan E Ahmed Raza, Fayyaz Minhas, Nasir M. Rajpoot

Nuclear detection, segmentation and morphometric profiling are essential in helping us further understand the relationship between histology and patient outcome. To drive innovation in this area, we set up a community-wide challenge using the largest available dataset of its kind to assess nuclear segmentation and cellular composition. Our challenge, named CoNIC, stimulated the development of reproducible algorithms for cellular recognition with real-time result inspection on public leaderboards. We conducted an extensive post-challenge analysis of the top-performing models using 1,658 whole-slide images of colon tissue. With around 700 million detected nuclei per model, the associated features were used for dysplasia grading and survival analysis, where we demonstrated that the challenge's improvement over the previous state of the art led to significant boosts in downstream performance. Our findings also suggest that eosinophils and neutrophils play an important role in the tumour microenvironment. We release the challenge models and WSI-level results to foster the development of further methods for biomarker discovery.

Generic Event Boundary Detection in Video with Pyramid Features

Jan 11, 2023
Van Thong Huynh, Hyung-Jeong Yang, Guee-Sang Lee, Soo-Hyung Kim

Generic event boundary detection (GEBD) aims to split a video into chunks at a broad and diverse set of actions, much as humans naturally perceive event boundaries. In this study, we present an approach that considers the correlation between neighboring frames with pyramid feature maps in both spatial and temporal dimensions to construct a framework for localizing generic events in video. Features at multiple spatial dimensions of a pre-trained ResNet-50 are exploited with different views in the temporal dimension to form a temporal pyramid feature map. Based on these, the similarity between neighboring frames is calculated and projected to build a temporal pyramid similarity feature vector. A decoder with 1D convolution operations decodes these similarities into a new representation that incorporates their temporal relationships for later boundary score estimation. Extensive experiments on the GEBD benchmark dataset show the effectiveness of our system and its variations, in which we outperform the state-of-the-art approaches. Additional experiments on the TAPOS dataset, which contains long-form videos of Olympic sport actions, demonstrate the effectiveness of our approach compared to others.
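
A minimal sketch of the core similarity idea, assuming per-frame ResNet-50 features are already extracted: cosine similarity between each frame and its neighbors at several temporal offsets (the "views"), decoded by 1D convolutions into per-frame boundary scores. Offsets, decoder depth, and feature shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def neighbor_similarity(feats, offsets=(1, 2, 4)):
    """feats: (T, D) per-frame features -> (T, len(offsets)) similarity vector."""
    sims = []
    for k in offsets:
        shifted = torch.roll(feats, shifts=-k, dims=0)   # frame t vs frame t+k
        sims.append(F.cosine_similarity(feats, shifted, dim=1))
    return torch.stack(sims, dim=1)  # wrap-around at the clip end is ignored here

feats = torch.randn(300, 2048)        # 300 frames of (assumed) ResNet-50 features
sim = neighbor_similarity(feats)      # (300, 3) temporal-pyramid similarities

decoder = nn.Sequential(              # 1D conv decoder over the time axis
    nn.Conv1d(3, 32, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv1d(32, 1, kernel_size=5, padding=2),
)
scores = torch.sigmoid(decoder(sim.t().unsqueeze(0)))    # (1, 1, 300) boundary scores
print(scores.shape)
```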

An Ensemble Approach for Multiple Emotion Descriptors Estimation Using Multi-task Learning

Jul 22, 2022
Irfan Haider, Minh-Trieu Tran, Soo-Hyung Kim, Hyung-Jeong Yang, Guee-Sang Lee

This paper describes our submission to the Multi-Task Learning Challenge of the fourth Affective Behavior Analysis in-the-Wild (ABAW) Competition. Instead of using only face information, we employ the full information in the provided dataset, which contains the face and the context around it. We utilize the InceptionNet V3 model to extract deep features, then apply an attention mechanism to refine them. After that, we feed those features into a Transformer block and multi-layer perceptron networks to obtain the final predictions for multiple kinds of emotion. Our model simultaneously predicts arousal and valence, classifies the emotional expression, and estimates the action units. The proposed system achieves a performance of 0.917 on the MTL Challenge validation dataset.
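
A minimal sketch of the multi-task head layout the abstract describes: attention-refined backbone features pass through a Transformer block and split into three heads (valence-arousal, expression, action units). The dimensions, head sizes, and the random stand-in features are assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    def __init__(self, dim=768, n_expr=8, n_aus=12):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.va = nn.Linear(dim, 2)          # valence + arousal regression
        self.expr = nn.Linear(dim, n_expr)   # expression classification logits
        self.au = nn.Linear(dim, n_aus)      # per-action-unit detection logits

    def forward(self, feats):                # feats: (batch, tokens, dim)
        pooled = self.encoder(feats).mean(dim=1)
        return torch.tanh(self.va(pooled)), self.expr(pooled), self.au(pooled)

feats = torch.randn(4, 49, 768)              # stand-in for attention-refined backbone features
va, expr, au = MultiTaskHead()(feats)
print(va.shape, expr.shape, au.shape)        # (4, 2) (4, 8) (4, 12)
```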

Facial Expression Classification using Fusion of Deep Neural Network in Video for the 3rd ABAW3 Competition

Apr 08, 2022
Kim Ngan Phan, Hong-Hai Nguyen, Van-Thong Huynh, Soo-Hyung Kim

Expression classification is an important problem in human-computer interaction, as it enables computers to recognize human emotions. In the 3rd Affective Behavior Analysis In-The-Wild competition, the expression classification task covers eight classes, including the six basic human facial expressions, from videos. In this paper, we employ a transformer mechanism to encode robust representations from the backbone. Fusion of these robust representations plays an important role in the expression classification task. Our approach achieves $F_1$ scores of 30.35\% on the validation set and 28.60\% on the test set. These results show the effectiveness of the proposed architecture on the Aff-Wild2 dataset.
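
A hedged sketch of one way such a fusion could be organized: representations from two backbones are concatenated along the token axis and encoded by stacked Transformer layers before an 8-class classifier. The backbones are stand-ins (random tensors) and all dimensions are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

dim, n_classes = 512, 8
fuse = nn.Sequential(                         # stacked Transformer encoder layers
    nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
    nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
)
classifier = nn.Linear(dim, n_classes)

feat_a = torch.randn(2, 16, dim)   # hypothetical features from backbone A (16 frames)
feat_b = torch.randn(2, 16, dim)   # hypothetical features from backbone B
tokens = torch.cat([feat_a, feat_b], dim=1)        # fuse along the token axis
logits = classifier(fuse(tokens).mean(dim=1))      # (2, 8) expression logits
print(logits.shape)
```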

An Ensemble Approach for Facial Expression Analysis in Video

Mar 24, 2022
Hong-Hai Nguyen, Van-Thong Huynh, Soo-Hyung Kim

Human emotion recognition contributes to the development of human-computer interaction, and machines that understand human emotions in the real world will contribute significantly to life in the future. This paper addresses the Affective Behavior Analysis in-the-wild (ABAW3) 2022 challenge, focusing on valence-arousal estimation and action unit detection. For valence-arousal estimation, we proceed in two stages: creating new features from multiple models, then temporal learning to predict valence and arousal. First, we create new features by combining a Gated Recurrent Unit (GRU) and a Transformer on top of RegNet (Regular Networks) features extracted from the images. Then, a GRU combined with local attention predicts valence and arousal. The Concordance Correlation Coefficient (CCC) is used to evaluate the model.
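
The Concordance Correlation Coefficient used for evaluation has a standard closed form, $\mathrm{CCC} = 2\,\mathrm{cov}(x, y) / (\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2)$. A small NumPy implementation for reference (the toy predictions are illustrative only):

```python
import numpy as np

def ccc(x, y):
    """Concordance Correlation Coefficient between predictions x and labels y."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()                 # population variances
    cov = ((x - mx) * (y - my)).mean()
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)

preds = np.array([0.1, 0.4, 0.35, 0.8])      # hypothetical valence predictions
labels = np.array([0.0, 0.5, 0.3, 0.9])
print(round(ccc(preds, labels), 4))          # approaches 1.0 for well-calibrated predictions
```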

MF-Hovernet: An Extension of Hovernet for Colon Nuclei Identification and Counting (CoNiC) Challenge

Mar 04, 2022
Vi Thi-Tuong Vo, Soo-Hyung Kim, Taebum Lee

Nuclei identification and counting yields the most important morphological features of cancers, especially in the colon, and many deep learning-based methods have been proposed to address this problem. In this work, we construct MF-Hovernet, an extension of Hovernet for nuclei identification and counting that adds a multiple filter block to the Hovernet architecture. Our current results show the efficiency of the multiple filter block in improving the performance of the original Hovernet model.
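
A minimal sketch of a "multiple filter block" in the Inception spirit: parallel convolutions with different kernel sizes whose outputs are concatenated along the channel axis. The exact kernel sizes and where the block sits inside Hovernet are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MultipleFilterBlock(nn.Module):
    def __init__(self, c_in, c_branch, kernel_sizes=(1, 3, 5)):
        super().__init__()
        # One branch per kernel size; odd kernels with padding k // 2 keep H and W.
        self.branches = nn.ModuleList(
            nn.Conv2d(c_in, c_branch, k, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):
        # Each branch sees a different receptive field; concatenate along channels.
        return torch.cat([b(x) for b in self.branches], dim=1)

patch = torch.randn(1, 3, 256, 256)        # an H&E image patch (assumed size)
block = MultipleFilterBlock(c_in=3, c_branch=16)
print(block(patch).shape)                  # torch.Size([1, 48, 256, 256])
```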
