Abstract: The physical load of jumps plays a critical role in injury prevention for volleyball players. However, manual video analysis of jump activities is time-intensive and costly, requiring significant effort and expensive hardware setups. The advent of inertial measurement units (IMUs) and machine learning algorithms offers a convenient and efficient alternative. Despite this, previous research has largely focused on either jump classification or physical load estimation, leaving a gap in integrated solutions. This study presents a pipeline that automatically detects jumps and predicts their heights from data collected by a waist-worn IMU. The pipeline leverages a Multi-Stage Temporal Convolutional Network (MS-TCN) to detect jump segments in the time-series data and to classify the specific jump category. Jump heights are then estimated from the identified segments by three downstream regression machine learning models. Our method is validated on a dataset comprising 10 players and 337 jumps. Compared with VERT, a commercial device commonly used in jump-landing tasks, which yields an R-squared of -1.53 for height estimation, our method not only accurately identifies jump activities and their specific types (F1-score=0.90) but also achieves superior height prediction (R-squared=0.50). This integrated solution offers a promising tool for monitoring physical load and mitigating injury risk in volleyball players.
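As a rough illustration of this kind of pipeline (a sketch only, not the paper's implementation), the snippet below shows a single dilated-residual stage of the type stacked in an MS-TCN, applied to a 6-axis IMU window to produce frame-wise jump-class logits, followed by a placeholder linear regressor standing in for the downstream height models. The channel widths, window length, class count, and segment boundaries are assumptions.

```python
# Illustrative sketch (not the authors' exact model): one MS-TCN-style stage over
# an IMU sequence, plus a stand-in regressor applied to a detected jump segment.
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    def __init__(self, channels, dilation):
        super().__init__()
        self.conv_dilated = nn.Conv1d(channels, channels, kernel_size=3,
                                      padding=dilation, dilation=dilation)
        self.conv_1x1 = nn.Conv1d(channels, channels, kernel_size=1)
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        out = torch.relu(self.conv_dilated(x))
        out = self.dropout(self.conv_1x1(out))
        return x + out  # residual connection keeps the temporal resolution

class SingleStageTCN(nn.Module):
    """One stage emitting frame-wise class logits for an IMU window."""
    def __init__(self, in_channels=6, channels=64, num_classes=4, num_layers=10):
        super().__init__()
        self.conv_in = nn.Conv1d(in_channels, channels, kernel_size=1)
        self.blocks = nn.ModuleList(
            [DilatedResidualBlock(channels, dilation=2 ** i) for i in range(num_layers)]
        )
        self.conv_out = nn.Conv1d(channels, num_classes, kernel_size=1)

    def forward(self, x):            # x: (batch, in_channels, time)
        out = self.conv_in(x)
        for block in self.blocks:
            out = block(out)
        return self.conv_out(out)    # (batch, num_classes, time)

# Toy usage: a 6-axis IMU window of 500 samples; the segment indices and the
# linear "height regressor" are hypothetical placeholders.
imu = torch.randn(1, 6, 500)
logits = SingleStageTCN()(imu)
segment_features = imu[:, :, 100:160].mean(dim=2)   # hypothetical detected segment
height_regressor = nn.Linear(6, 1)                   # stand-in for the downstream models
print(logits.shape, height_regressor(segment_features).shape)
```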
Abstract: Overweight and obesity have emerged as widespread societal challenges, frequently linked to unhealthy eating patterns. A promising approach to enhance dietary monitoring in everyday life is the automated detection of food intake gestures. This study introduces a skeleton-based approach using a model that combines a dilated spatial-temporal graph convolutional network (ST-GCN) with a bidirectional long short-term memory (BiLSTM) network, referred to as ST-GCN-BiLSTM, to detect intake gestures. The skeleton-based method provides key benefits, including environmental robustness, reduced data dependency, and enhanced privacy preservation. Two datasets were employed for model validation. On the OREBA dataset, which consists of laboratory-recorded videos, the model achieved segmental F1-scores of 86.18% and 74.84% for identifying eating and drinking gestures, respectively. Additionally, a self-collected dataset recorded with smartphones under more flexible experimental conditions was evaluated with the model trained on OREBA, yielding F1-scores of 85.40% and 67.80% for detecting eating and drinking gestures, respectively. These results not only confirm the feasibility of using skeleton data for intake gesture detection but also highlight the robustness of the proposed approach in cross-dataset validation.
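A minimal sketch of the general idea follows (assumptions throughout, not the paper's architecture): a single spatial graph convolution over upper-body joints, a dilated temporal convolution, and a BiLSTM that emits frame-wise intake-gesture logits. The joint count, adjacency matrix, and layer sizes are illustrative only.

```python
# Minimal sketch of an ST-GCN + BiLSTM style detector; sizes and graph are assumed.
import torch
import torch.nn as nn

class SpatialGraphConv(nn.Module):
    def __init__(self, in_feats, out_feats, adjacency):
        super().__init__()
        # Row-normalized adjacency (A + I), shared across all frames.
        a_hat = adjacency + torch.eye(adjacency.size(0))
        self.register_buffer("a_norm", a_hat / a_hat.sum(dim=1, keepdim=True))
        self.linear = nn.Linear(in_feats, out_feats)

    def forward(self, x):                       # x: (batch, time, joints, feats)
        x = torch.einsum("vw,btwf->btvf", self.a_norm, x)
        return torch.relu(self.linear(x))

class STGCNBiLSTM(nn.Module):
    def __init__(self, adjacency, in_feats=2, gcn_feats=32, hidden=64, num_classes=3):
        super().__init__()
        self.gcn = SpatialGraphConv(in_feats, gcn_feats, adjacency)
        num_joints = adjacency.size(0)
        self.tcn = nn.Conv1d(gcn_feats * num_joints, gcn_feats * num_joints,
                             kernel_size=3, padding=2, dilation=2)
        self.bilstm = nn.LSTM(gcn_feats * num_joints, hidden,
                              batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)   # e.g. none / eat / drink

    def forward(self, x):                        # x: (batch, time, joints, feats)
        b, t, v, _ = x.shape
        out = self.gcn(x).reshape(b, t, -1)
        out = torch.relu(self.tcn(out.transpose(1, 2))).transpose(1, 2)
        out, _ = self.bilstm(out)
        return self.head(out)                    # (batch, time, num_classes)

# Toy usage: 8 upper-body joints with 2D coordinates over 300 frames.
adj = torch.zeros(8, 8)
adj[0, 1] = adj[1, 0] = 1                        # hypothetical bone, e.g. wrist-elbow
model = STGCNBiLSTM(adj)
print(model(torch.randn(2, 300, 8, 2)).shape)    # -> (2, 300, 3)
```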
Abstract: Skeleton-based Temporal Action Segmentation involves the dense action classification of variable-length skeleton sequences. Current approaches primarily apply graph-based networks to extract framewise, whole-body-level motion representations and use one-hot encoded labels for model optimization. However, whole-body motion representations fail to capture fine-grained part-level motion, and one-hot encoded labels neglect the intrinsic semantic relationships within the language-based action definitions. To address these limitations, we propose a novel method named Language-assisted Human Part Motion Representation Learning (LPL), which contains a Disentangled Part Motion Encoder (DPE) to extract dual-level (i.e., part and whole-body) motion representations and a Language-assisted Distribution Alignment (LDA) strategy to optimize spatial relations within those representations. Specifically, after part-aware skeleton encoding via DPE, LDA generates dual-level action descriptions to construct a textual embedding space with the help of a large-scale language model. LDA then encourages alignment between the text-description and motion embedding spaces. This alignment allows LDA not only to enhance intra-class compactness but also to transfer the language-encoded semantic correlations among actions to skeleton-based motion learning. Moreover, we propose a simple yet efficient Semantic Offset Adapter to smooth the cross-domain misalignment. Our experiments indicate that LPL achieves state-of-the-art performance across various datasets (e.g., +4.4% Accuracy, +5.6% F1 on the PKU-MMD dataset). Moreover, LDA is compatible with existing methods and improves their performance (e.g., +4.8% Accuracy, +4.3% F1 on the LARa dataset) without additional inference cost.
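The alignment idea can be pictured with a small sketch (not the paper's exact LDA strategy): motion features are pulled toward the text embedding of their action description with a symmetric-similarity classification loss. The embedding dimensions are arbitrary, and the text embeddings here are random placeholders standing in for a language-model encoder of the generated action descriptions.

```python
# Sketch of language-assisted alignment: motion embeddings are classified against
# per-class text embeddings, which tightens intra-class structure around the text
# anchors. Placeholder tensors stand in for DPE motion features and LLM text features.
import torch
import torch.nn.functional as F

def alignment_loss(motion_emb, text_emb, labels, temperature=0.07):
    """motion_emb: (N, D) motion features; text_emb: (C, D) per-class description
    embeddings; labels: (N,) class indices."""
    motion_emb = F.normalize(motion_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = motion_emb @ text_emb.t() / temperature   # (N, C) cosine similarities
    return F.cross_entropy(logits, labels)

# Toy usage with assumed sizes: 16 motion embeddings, 5 action classes, 128-d space.
motion = torch.randn(16, 128, requires_grad=True)
text = torch.randn(5, 128)            # placeholder for encoded action descriptions
labels = torch.randint(0, 5, (16,))
loss = alignment_loss(motion, text, labels)
loss.backward()
print(loss.item())
```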
Abstract: Unhealthy dietary habits are considered the primary cause of multiple chronic diseases such as obesity and diabetes. Automatic food intake monitoring systems have the potential to improve the quality of life (QoL) of people with diet-related diseases through dietary assessment. In this work, we propose a novel contactless radar-based food intake monitoring approach. Specifically, a Frequency Modulated Continuous Wave (FMCW) radar sensor is employed to recognize fine-grained eating and drinking gestures. A fine-grained eating or drinking gesture comprises a series of movements, from raising the hand to the mouth until moving the hand away from the mouth. A 3D temporal convolutional network (3D-TCN) is developed to detect and segment eating and drinking gestures in meal sessions by processing the Range-Doppler Cube (RD Cube). Unlike previous radar-based research, this work collects data in continuous meal sessions. We create a public dataset that contains 48 meal sessions (3121 eating gestures and 608 drinking gestures) from 48 participants, with a total duration of 783 minutes. Four eating styles (fork & knife, chopsticks, spoon, hand) are included in this dataset. To validate the performance of the proposed approach, 8-fold cross-validation is applied. Experimental results show that the proposed 3D-TCN outperforms both a model combining a convolutional neural network with a long short-term memory network (CNN-LSTM) and its bidirectional variant (CNN-BiLSTM) in eating and drinking gesture detection. The 3D-TCN model achieves segmental F1-scores of 0.887 and 0.844 for eating and drinking gestures, respectively. These results indicate the feasibility of using radar for fine-grained eating and drinking gesture detection and segmentation in meal sessions.
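For illustration only (not the paper's 3D-TCN), the sketch below shows how 3D convolutions can extract features from a Range-Doppler Cube while preserving the time axis, after which pooling collapses the range and Doppler dimensions and dilated 1D convolutions produce frame-wise gesture logits. The cube dimensions, channel widths, and class count are assumptions.

```python
# Sketch of a 3D-conv + temporal-conv detector over a Range-Doppler Cube.
import torch
import torch.nn as nn

class RadarTCN(nn.Module):
    def __init__(self, num_classes=3, channels=32):
        super().__init__()
        # Input cube: (batch, 1, time, range_bins, doppler_bins)
        self.encoder = nn.Sequential(
            nn.Conv3d(1, channels, kernel_size=(3, 3, 3), padding=(1, 1, 1)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),     # downsample range/Doppler only
            nn.Conv3d(channels, channels, kernel_size=(3, 3, 3), padding=(1, 1, 1)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),      # keep time, collapse range/Doppler
        )
        self.temporal = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(),
            nn.Conv1d(channels, num_classes, kernel_size=1),
        )

    def forward(self, cube):                          # (batch, 1, T, R, D)
        feats = self.encoder(cube).flatten(2)         # (batch, channels, T)
        return self.temporal(feats)                   # (batch, num_classes, T)

# Toy usage: 200 radar frames with 32 range bins and 16 Doppler bins (assumed sizes).
model = RadarTCN()
print(model(torch.randn(1, 1, 200, 32, 16)).shape)    # -> (1, 3, 200)
```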