We present a novel locality-based learning method for cleaning and solving optical motion capture data. Given noisy marker data, we propose a new heterogeneous graph neural network that treats markers and joints as different types of nodes and uses graph convolution operations to extract the local features of markers and joints and transform them into clean motions. To handle anomalous markers (e.g., occluded or with large tracking errors), the key insight is that a marker's motion shows strong correlations with the motions of its immediate neighboring markers but less so with other markers, i.e., locality, which enables us to efficiently fill in missing markers (e.g., due to occlusion). Additionally, we identify marker outliers caused by tracking errors by examining their acceleration profiles. We further propose a training regime based on representation learning and data augmentation, training the model on data with masking; the masking schemes mimic the occluded and noisy markers often observed in real data. Finally, we show that our method achieves high accuracy on multiple metrics across various datasets. Extensive comparisons show that our method outperforms state-of-the-art methods, reducing the prediction error of occluded marker positions by approximately 20%, which leads to a further 30% error reduction on the reconstructed joint rotations and positions. The code and data for this paper are available at https://github.com/non-void/LocalMoCap.
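As a minimal illustration of the masking-based data augmentation described above (the masking parameters, marker count, and gap lengths here are assumptions, not the paper's settings), one might corrupt clean marker sequences with contiguous occlusion gaps and small tracking jitter before training:

    import numpy as np

    def augment_markers(markers, occlude_prob=0.1, max_gap=30, noise_std=0.01, rng=None):
        """Mimic occlusions and tracking noise on clean marker data.

        markers: (T, M, 3) array of marker positions over T frames.
        Returns the corrupted markers and a binary visibility mask (T, M).
        Hyperparameters are illustrative only.
        """
        rng = np.random.default_rng() if rng is None else rng
        T, M, _ = markers.shape
        corrupted = markers + rng.normal(0.0, noise_std, markers.shape)  # small tracking jitter
        mask = np.ones((T, M), dtype=bool)
        for m in range(M):
            if rng.random() < occlude_prob:             # occasionally occlude a marker
                start = rng.integers(0, T)
                gap = rng.integers(1, max_gap + 1)
                mask[start:start + gap, m] = False      # contiguous gap, as in real occlusions
        corrupted[~mask] = 0.0                          # occluded markers zeroed for the network
        return corrupted, mask

    # Example: corrupt a synthetic clean sequence of 240 frames with 56 markers
    clean = np.zeros((240, 56, 3))
    noisy, vis = augment_markers(clean)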
Three-dimensional (3D) reconstruction of head Computed Tomography (CT) images elucidates the intricate spatial relationships of tissue structures, thereby assisting in accurate diagnosis. Nonetheless, securing an optimal head CT scan without deviation is challenging in clinical settings, owing to poor positioning by technicians, patients' physical constraints, or CT scanner tilt-angle restrictions. Manual reformatting and reconstruction not only introduce subjectivity but also strain time and labor resources. To address these issues, we propose an efficient automatic 3D reconstruction method for head CT images that improves accuracy and repeatability while diminishing manual intervention. Our approach employs a deep learning-based object detection algorithm to identify and evaluate orbitomeatal-line landmarks and automatically reformat the images prior to reconstruction. Given the dearth of existing evaluations of object detection algorithms in the context of head CT images, we compared ten methods from both theoretical and experimental perspectives. By examining their precision, efficiency, and robustness, we singled out the lightweight YOLOv8 as the most suitable algorithm for our task, with an mAP of 92.91% and strong robustness against class imbalance. Our qualitative evaluation of standardized reconstruction results demonstrates the clinical practicability and validity of our method.
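The following is only a sketch of the landmark-based reformatting step; the abstract states just that orbitomeatal-line landmarks are detected, so the specific landmark pair and the convention of rotating each slice until that line is horizontal are assumptions for illustration:

    import numpy as np
    import cv2

    def reformat_to_om_line(image, orbit_pt, meatus_pt):
        """Rotate a head CT slice so the detected orbitomeatal line becomes horizontal.

        image: 2D array (a single CT slice).
        orbit_pt, meatus_pt: (x, y) landmark coordinates from the detector (hypothetical names).
        """
        dx = meatus_pt[0] - orbit_pt[0]
        dy = meatus_pt[1] - orbit_pt[1]
        angle_deg = np.degrees(np.arctan2(dy, dx))       # tilt of the OM line in image coordinates
        center = (image.shape[1] / 2.0, image.shape[0] / 2.0)
        rot = cv2.getRotationMatrix2D(center, angle_deg, 1.0)
        return cv2.warpAffine(image, rot, (image.shape[1], image.shape[0]))

    # Usage with hypothetical detector output
    slice_img = np.zeros((512, 512), dtype=np.float32)
    aligned = reformat_to_om_line(slice_img, orbit_pt=(200.0, 260.0), meatus_pt=(330.0, 245.0))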
Recently, significant progress has been made in vehicle prediction and planning algorithms for autonomous driving. However, it remains quite challenging for an autonomous vehicle to plan its trajectory in complex scenarios where the behaviors and trajectories of its surrounding vehicles are difficult to predict accurately. In this work, to maximize performance while ensuring safety, we propose a novel speculative planning framework based on a prediction-planning interface that quantifies both the behavior-level and trajectory-level uncertainties of surrounding vehicles. Our framework leverages recent prediction algorithms that can provide one or more possible behaviors and trajectories of the surrounding vehicles with probability estimates. It adapts those predictions based on the latest system state and traffic environment, and conducts planning to maximize the expected reward of the ego vehicle by considering the probabilistic predictions of all scenarios, while ensuring system safety by ruling out actions that may be unsafe in the worst case. We demonstrate the effectiveness of our approach in improving system performance and ensuring system safety over other baseline methods, via extensive simulations in SUMO on a challenging multi-lane highway lane-changing case study.
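A toy sketch of the expected-reward selection with worst-case safety filtering described above; the Prediction container, reward model, and safety check are placeholders rather than the paper's actual interface:

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Prediction:
        probability: float      # behavior-level likelihood from the predictor
        trajectory: list        # predicted trajectory of a surrounding vehicle

    def select_action(candidate_actions: List[str],
                      predictions: List[Prediction],
                      reward: Callable[[str, Prediction], float],
                      is_safe: Callable[[str, Prediction], bool]) -> str:
        """Pick the ego action maximizing expected reward over all predicted
        scenarios, after ruling out actions unsafe under any prediction."""
        best_action, best_value = None, float("-inf")
        for action in candidate_actions:
            # Worst-case check: discard the action if any predicted scenario is unsafe.
            if not all(is_safe(action, p) for p in predictions):
                continue
            # Expected reward over the probabilistic predictions.
            value = sum(p.probability * reward(action, p) for p in predictions)
            if value > best_value:
                best_action, best_value = action, value
        return best_action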
Reciprocal recommender systems (RRSs), which consider two-way matching between two parties, have been widely applied in online platforms such as online dating and recruitment. Existing RRS models mainly capture static user preferences, neglecting the evolving user tastes and the dynamic matching relation between the two parties. Although dynamic user modeling has been well studied in sequential recommender systems, existing solutions are developed in a user-oriented manner, so it is non-trivial to adapt sequential recommendation algorithms to reciprocal recommendation. In this paper, we formulate RRS as a distinctive sequence matching task and propose a new approach, ReSeq (short for Reciprocal Sequential recommendation). To capture dual-perspective matching, we learn fine-grained sequence similarities with a co-attention mechanism across different time steps. Further, to improve inference efficiency, we introduce a self-distillation technique that distills knowledge from the fine-grained matching module into a more efficient student module. In the deployment stage, only the efficient student module is used, greatly speeding up the similarity computation. Extensive experiments on five real-world datasets from two scenarios demonstrate the effectiveness and efficiency of the proposed method. Our code is available at https://github.com/RUCAIBox/ReSeq/.
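A minimal sketch of fine-grained, dual-perspective sequence matching with co-attention across time steps; the embedding dimensions and the way step-wise similarities are aggregated into a score are assumptions, and the teacher-student self-distillation is omitted:

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def co_attention_match(seq_a, seq_b):
        """Fine-grained matching between two behavior sequences.

        seq_a: (Ta, d) embedded interaction sequence of one side (e.g., job seeker).
        seq_b: (Tb, d) embedded sequence of the other side (e.g., recruiter).
        Returns a scalar matching score aggregated from step-wise similarities.
        """
        sim = seq_a @ seq_b.T / np.sqrt(seq_a.shape[1])   # (Ta, Tb) step-wise similarities
        attn_a = softmax(sim, axis=1)                     # attend from a's steps over b's steps
        attn_b = softmax(sim, axis=0)                     # attend from b's steps over a's steps
        # Dual-perspective aggregation: average attended similarities from both sides.
        score_a = (attn_a * sim).sum(axis=1).mean()
        score_b = (attn_b * sim).sum(axis=0).mean()
        return 0.5 * (score_a + score_b)

    score = co_attention_match(np.random.randn(8, 64), np.random.randn(5, 64))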
While off-policy reinforcement learning (RL) algorithms are sample efficient due to gradient-based updates and data reuse in the replay buffer, they often converge to local optima due to limited exploration. In contrast, population-based algorithms offer a natural exploration strategy, but their heuristic black-box operators are inefficient. Recent algorithms have integrated the two approaches, connecting them through a shared replay buffer. However, the effect of using diverse data from population optimization iterations on off-policy RL algorithms has not been thoroughly investigated. In this paper, we first analyze the use of off-policy RL algorithms in combination with population-based algorithms, showing that the use of population data can introduce an overlooked error and harm performance. To test this, we propose a uniform and scalable training design and conduct experiments within our tailored framework on robot locomotion tasks from OpenAI Gym. Our results substantiate that using population data in off-policy RL can cause instability during training and even degrade performance. To remedy this issue, we further propose a double replay buffer design that provides more on-policy data and show its effectiveness through experiments. Our results offer practical insights for training these hybrid methods.
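A schematic of the double replay buffer idea (buffer capacity and the sampling ratio below are illustrative choices, not the paper's settings): population-generated transitions are stored separately from the RL agent's own rollouts, and mini-batches are biased toward the latter, more on-policy data.

    import random
    from collections import deque

    class DoubleReplayBuffer:
        """Separate storage for RL-agent transitions and population-generated
        transitions, sampled with a configurable mixing ratio."""

        def __init__(self, capacity=100_000, rl_fraction=0.75):
            self.rl_buffer = deque(maxlen=capacity)     # agent's own (more on-policy) data
            self.pop_buffer = deque(maxlen=capacity)    # data from population individuals
            self.rl_fraction = rl_fraction

        def add(self, transition, from_population=False):
            (self.pop_buffer if from_population else self.rl_buffer).append(transition)

        def sample(self, batch_size):
            n_rl = min(int(batch_size * self.rl_fraction), len(self.rl_buffer))
            n_pop = min(batch_size - n_rl, len(self.pop_buffer))
            return (random.sample(list(self.rl_buffer), n_rl)
                    + random.sample(list(self.pop_buffer), n_pop))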
Learning new classes without forgetting is crucial for classification models in real-world applications. Vision Transformers (ViTs) have recently achieved remarkable performance in Class Incremental Learning (CIL). Previous works mainly focus on block design and model expansion for ViTs. In this paper, however, we find that when a ViT is incrementally trained, its attention layers gradually lose concentration on local features. We refer to this phenomenon as \emph{Locality Degradation} in ViTs for CIL. Since low-level local information is crucial to the transferability of the representation, it is beneficial to preserve the locality in attention layers. We therefore encourage the model to preserve more local information as training proceeds and devise a Locality-Preserved Attention (LPA) layer to emphasize the importance of local features. Specifically, we incorporate the local information directly into the vanilla attention and control the initial gradients of the vanilla attention by weighting it with a small initial value. Extensive experiments show that the representations learned with LPA capture more low-level general information that is easier to transfer to follow-up tasks. The improved model achieves consistently better performance on CIFAR100 and ImageNet100.
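A rough single-head sketch of blending a local term into vanilla attention with a small initial weight on the global scores; the exact form of LPA and how locality is injected are simplified assumptions here, and in practice the weight would be a learnable scalar rather than a fixed constant:

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def locality_preserved_attention(q, k, v, window=2, alpha=0.1):
        """Single-head attention whose scores mix a local-window term with
        the vanilla global term.

        q, k, v: (N, d) token embeddings (N tokens, dimension d).
        window:  half-width of the local neighborhood on the token sequence.
        alpha:   small initial weight on the vanilla attention, so local
                 information dominates early and the gradients flowing through
                 the global term start small.
        """
        n, d = q.shape
        global_scores = q @ k.T / np.sqrt(d)
        idx = np.arange(n)
        local_mask = np.abs(idx[:, None] - idx[None, :]) <= window
        local_scores = np.where(local_mask, global_scores, -np.inf)
        attn = alpha * softmax(global_scores) + (1.0 - alpha) * softmax(local_scores)
        return attn @ v

    x = np.random.randn(16, 32)
    out = locality_preserved_attention(x, x, x)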
Connectivity technology has shown great potential in improving the safety and efficiency of transportation systems by providing information beyond the perception and prediction capabilities of individual vehicles. However, human-driven and autonomous vehicles, as well as connected and non-connected vehicles, are expected to share the transportation network during the transition period to fully connected and automated transportation systems. Such mixed traffic scenarios significantly increase the complexity of analyzing system behavior and quantifying uncertainty in highly interactive scenarios, e.g., lane changing. It is even harder to ensure system safety when neural-network-based planners are leveraged to further improve efficiency. In this work, we propose a connectivity-enhanced neural-network-based lane changing planner. By cooperating with surrounding connected vehicles in dynamic environments, our planner adapts its planned trajectory according to the analysis of a safe evasion trajectory. We demonstrate the strength of our planner design in improving efficiency and ensuring safety in various mixed traffic scenarios through extensive simulations. We also analyze the system robustness when communication or coordination is not perfect.
RecBole has recently attracted increasing attention from the research community. As the number of users has grown, we have received many suggestions and update requests, which motivated us to make significant improvements to our library so as to meet user requirements and contribute to the research community. This technical report introduces our latest improvements to RecBole. In general, we have focused on the flexibility and efficiency of RecBole over the past few months, with four development targets: (1) more flexible data processing, (2) more efficient model training, (3) more reproducible configurations, and (4) more comprehensive user documentation. Readers can download the above updates at https://github.com/RUCAIBox/RecBole.
Trajectory generation and prediction are two interwoven tasks that play important roles in planner evaluation and decision making for intelligent vehicles. Most existing methods focus on one of the two and are optimized to directly output the final generated/predicted trajectories, which contain only limited information for critical scenario augmentation and safe planning. In this work, we propose a novel behavior-aware Trajectory Autoencoder (TAE) that explicitly models drivers' behavior, such as aggressiveness and intention, in the latent space, using a semi-supervised adversarial autoencoder and domain knowledge from transportation. Our model addresses trajectory generation and prediction in a unified architecture and benefits both tasks: it can generate diverse, controllable, and realistic trajectories to enhance planner optimization in safety-critical and long-tailed scenarios, and it can provide predictions of critical behavior in addition to the final trajectories for decision making. Experimental results demonstrate that our method achieves promising performance on both trajectory generation and prediction.
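The following toy sketch only illustrates how an explicit behavior code in the latent space enables controllable generation; the hand-crafted decoder, latent layout, and behavior parameterization are placeholders, and the adversarial training and semi-supervision are omitted entirely:

    import numpy as np

    rng = np.random.default_rng(0)

    def decode_trajectory(z_style, behavior, horizon=20, dt=0.1):
        """Toy decoder: maps a style latent and an explicit behavior code
        (e.g., aggressiveness in [0, 1]) to a longitudinal trajectory.
        A learned decoder would replace this hand-crafted mapping."""
        accel = 1.0 + 3.0 * behavior + 0.5 * z_style[0]   # more aggressive -> harder acceleration
        speed0 = 10.0 + 2.0 * z_style[1]
        t = np.arange(horizon) * dt
        positions = speed0 * t + 0.5 * accel * t ** 2
        return np.stack([t, positions], axis=1)           # (horizon, 2): time, longitudinal position

    # Controllable generation: sweep the behavior code while fixing the style latent.
    z = rng.normal(size=8)
    mild = decode_trajectory(z, behavior=0.1)
    aggressive = decode_trajectory(z, behavior=0.9)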