Human body reconstruction with Millimeter Wave (mmWave) radar point clouds has gained significant interest due to its ability to work in adverse environments and its capacity to mitigate privacy concerns associated with traditional camera-based solutions. Despite pioneering efforts in this field, two challenges persist. Firstly, raw point clouds contain massive noise points, usually caused by the ambient objects and multi-path effects of Radio Frequency (RF) signals. Recent approaches typically rely on prior knowledge or elaborate preprocessing methods, limiting their applicability. Secondly, even after noise removal, the sparse and inconsistent body-related points pose an obstacle to accurate human body reconstruction. To address these challenges, we introduce mmBaT, a novel multi-task deep learning framework that concurrently estimates the human body and predicts body translations in subsequent frames to extract body-related point clouds. Our method is evaluated on two public datasets that are collected with different radar devices and noise levels. A comprehensive comparison against other state-of-the-art methods demonstrates our method has a superior reconstruction performance and generalization ability from noisy raw data, even when compared to methods provided with body-related point clouds.
This paper introduces a novel human pose estimation approach using sparse inertial sensors, addressing the shortcomings of previous methods reliant on synthetic data. It leverages a diverse array of real inertial motion capture data from different skeleton formats to improve motion diversity and model generalization. This method features two innovative components: a pseudo-velocity regression model for dynamic motion capture with inertial sensors, and a part-based model dividing the body and sensor data into three regions, each focusing on their unique characteristics. The approach demonstrates superior performance over state-of-the-art models across five public datasets, notably reducing pose error by 19\% on the DIP-IMU dataset, thus representing a significant improvement in inertial sensor-based human pose estimation. We will make the implementation of our model available for public use.
Human activity recognition (HAR) with wearables is one of the serviceable technologies in ubiquitous and mobile computing applications. The sliding-window scheme is widely adopted while suffering from the multi-class windows problem. As a result, there is a growing focus on joint segmentation and recognition with deep-learning methods, aiming at simultaneously dealing with HAR and time-series segmentation issues. However, obtaining the full activity annotations of wearable data sequences is resource-intensive or time-consuming, while unsupervised methods yield poor performance. To address these challenges, we propose a novel method for joint activity segmentation and recognition with timestamp supervision, in which only a single annotated sample is needed in each activity segment. However, the limited information of sparse annotations exacerbates the gap between recognition and segmentation tasks, leading to sub-optimal model performance. Therefore, the prototypes are estimated by class-activation maps to form a sample-to-prototype contrast module for well-structured embeddings. Moreover, with the optimal transport theory, our approach generates the sample-level pseudo-labels that take advantage of unlabeled data between timestamp annotations for further performance improvement. Comprehensive experiments on four public HAR datasets demonstrate that our model trained with timestamp supervision is superior to the state-of-the-art weakly-supervised methods and achieves comparable performance to the fully-supervised approaches.
We propose a novel denoising framework for task functional Magnetic Resonance Imaging (tfMRI) data to delineate the high-resolution spatial pattern of the brain functional connectivity via dictionary learning and sparse coding (DLSC). In order to address the limitations of the unsupervised DLSC-based fMRI studies, we utilize the prior knowledge of task paradigm in the learning step to train a data-driven dictionary and to model the sparse representation. We apply the proposed DLSC-based method to Human Connectome Project (HCP) motor tfMRI dataset. Studies on the functional connectivity of cerebrocerebellar circuits in somatomotor networks show that the DLSC-based denoising framework can significantly improve the prominent connectivity patterns, in comparison to the temporal non-local means (tNLM)-based denoising method as well as the case without denoising, which is consistent and neuroscientifically meaningful within motor area. The promising results show that the proposed method can provide an important foundation for the high-resolution functional connectivity analysis, and provide a better approach for fMRI preprocessing.