Siheng Chen

Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction

Sep 02, 2023
Chenxin Xu, Robby T. Tan, Yuhong Tan, Siheng Chen, Xinchao Wang, Yanfeng Wang

Exploring spatial-temporal dependencies from observed motions is one of the core challenges of human motion prediction. Previous methods mainly focus on dedicated network structures to model the spatial and temporal dependencies. This paper considers a new direction by introducing a model learning framework with auxiliary tasks. In our auxiliary tasks, the coordinates of some body joints are corrupted by either masking or adding noise, and the goal is to recover the corrupted coordinates from the remaining ones. To work with the auxiliary tasks, we propose a novel auxiliary-adapted transformer, which can handle incomplete, corrupted motion data and achieve coordinate recovery by capturing spatial-temporal dependencies. Through the auxiliary tasks, the auxiliary-adapted transformer is encouraged to capture more comprehensive spatial-temporal dependencies among body joint coordinates, leading to better feature learning. Extensive experimental results show that our method outperforms state-of-the-art methods by remarkable margins of 7.2%, 3.7%, and 9.4% in terms of 3D mean per joint position error (MPJPE) on the Human3.6M, CMU Mocap, and 3DPW datasets, respectively. We also demonstrate that our method is more robust to missing and noisy data. Code is available at https://github.com/MediaBrain-SJTU/AuxFormer.
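As a rough illustration of the corruption operations described above, the sketch below masks some joint coordinates and perturbs others with Gaussian noise. The function name, tensor layout (frames x joints x 3), and the ratio and noise parameters are assumptions for this example, not the paper's implementation.

```python
import numpy as np

def corrupt_motion(motion, mask_ratio=0.2, noise_ratio=0.2, noise_std=0.05, rng=None):
    """Corrupt part of the joint coordinates by masking or additive noise.

    motion: array of shape (T, J, 3) -- T frames, J joints, 3D coordinates.
    Returns the corrupted motion and a boolean map of corrupted joint-frames.
    """
    rng = rng or np.random.default_rng()
    T, J, _ = motion.shape
    corrupted = motion.copy()

    # Randomly choose joint-frame positions to mask (set to zero).
    mask = rng.random((T, J)) < mask_ratio
    corrupted[mask] = 0.0

    # Randomly choose other positions to perturb with Gaussian noise.
    noisy = (rng.random((T, J)) < noise_ratio) & ~mask
    corrupted[noisy] += rng.normal(0.0, noise_std, size=(noisy.sum(), 3))

    return corrupted, mask | noisy
```

A recovery network would then be trained to reconstruct the original coordinates at the corrupted positions from the untouched ones.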

* Accepted to ICCV 2023 

Hypergraph Structure Inference From Data Under Smoothness Prior

Aug 31, 2023
Bohan Tang, Siheng Chen, Xiaowen Dong

Hypergraphs are important for processing data with higher-order relationships involving more than two entities. In scenarios where explicit hypergraphs are not readily available, it is desirable to infer a meaningful hypergraph structure from the node features to capture the intrinsic relations within the data. However, existing methods either adopt simple pre-defined rules that fail to precisely capture the distribution of the potential hypergraph structure, or learn a mapping between hypergraph structures and node features but require a large amount of labelled data, i.e., pre-existing hypergraph structures, for training. Both restrict their applications in practical scenarios. To fill this gap, we propose a novel smoothness prior that enables us to design a method to infer the probability of each potential hyperedge without labelled data as supervision. The proposed prior states that the features of nodes within a hyperedge are highly correlated through the features of the hyperedge containing them. We use this prior to derive the relation between the hypergraph structure and the node features via probabilistic modelling. This allows us to develop an unsupervised inference method that estimates the probability of each potential hyperedge by solving an optimisation problem with an analytical solution. Experiments on both synthetic and real-world data demonstrate that our method learns meaningful hypergraph structures from data more efficiently than existing hypergraph structure inference methods.
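The smoothness prior can be illustrated with a toy scoring function: a candidate hyperedge is more plausible when its nodes' features deviate little from a shared hyperedge feature. The sketch below uses the mean of the member nodes as a surrogate hyperedge feature; the names and the exact score are illustrative, not the paper's formulation.

```python
import numpy as np

def hyperedge_smoothness(X, hyperedge):
    """Score a candidate hyperedge by feature smoothness.

    X: (N, d) node feature matrix; hyperedge: iterable of node indices.
    A lower score means the member node features are better explained by a
    shared hyperedge feature (here their mean), i.e. the hyperedge is more
    plausible under a smoothness prior.
    """
    nodes = X[list(hyperedge)]        # (k, d) features of the member nodes
    edge_feat = nodes.mean(axis=0)    # surrogate hyperedge feature
    return np.sum((nodes - edge_feat) ** 2)

# Example: nodes with similar features form a smoother (more plausible) hyperedge.
X = np.array([[1.0, 0.0], [1.1, 0.1], [0.9, -0.1], [5.0, 5.0]])
print(hyperedge_smoothness(X, [0, 1, 2]))  # small score
print(hyperedge_smoothness(X, [0, 1, 3]))  # large score
```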

Joint-Relation Transformer for Multi-Person Motion Prediction

Aug 09, 2023
Qingyao Xu, Weibo Mao, Jingze Gong, Chenxin Xu, Siheng Chen, Weidi Xie, Ya Zhang, Yanfeng Wang

Multi-person motion prediction is a challenging problem because motion depends on both individual past movements and interactions with other people. Transformer-based methods have shown promising results on this task, but they lack an explicit representation of relations between joints, such as skeleton structure and pairwise distance, which is crucial for accurate interaction modeling. In this paper, we propose the Joint-Relation Transformer, which utilizes relation information to enhance interaction modeling and improve future motion prediction. Our relation information contains the relative distance and the intra-/inter-person physical constraints. To fuse relation and joint information, we design a novel joint-relation fusion layer with relation-aware attention to update both features. Additionally, we supervise the relation information by forecasting future distances. Experiments show that our method achieves a 13.4% improvement in 900ms VIM on 3DPW-SoMoF/RC and 17.8%/12.0% improvements in 3s MPJPE on the CMU-Mocap/MuPoTS-3D datasets.
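A minimal sketch of how the relation information described above could be assembled is given below, assuming the joints of all people are concatenated and each pairwise relation feature consists of relative distance plus same-person and bone-connectivity flags. The function and feature layout are illustrative, not the paper's code.

```python
import torch

def build_relation_features(joints, joints_per_person, skeleton_edges):
    """Assemble pairwise relation features for all joints in a scene.

    joints: (N, 3) tensor of 3D joint positions, people concatenated.
    joints_per_person: number of joints per person (people are contiguous).
    skeleton_edges: set of (i, j) index pairs connected by a bone within a person.
    Returns an (N, N, 3) tensor: [relative distance, same-person flag, bone flag].
    """
    N = joints.shape[0]
    dist = torch.cdist(joints, joints)                              # (N, N) distances
    person_id = torch.arange(N) // joints_per_person
    same_person = (person_id[:, None] == person_id[None, :]).float()
    bone = torch.zeros(N, N)
    for i, j in skeleton_edges:
        bone[i, j] = bone[j, i] = 1.0                               # intra-person constraint
    return torch.stack([dist, same_person, bone], dim=-1)
```

A relation-aware attention layer could then consume this tensor alongside the per-joint features.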

FedDisco: Federated Learning with Discrepancy-Aware Collaboration

May 30, 2023
Rui Ye, Mingkai Xu, Jianyu Wang, Chenxin Xu, Siheng Chen, Yanfeng Wang

This work considers category distribution heterogeneity in federated learning. This issue arises from biased labeling preferences at multiple clients and is a typical setting of data heterogeneity. To alleviate it, most previous works either regularize local models or fine-tune the global model, while ignoring the adjustment of aggregation weights and simply assigning weights based on dataset size. However, based on our empirical observations and theoretical analysis, we find that dataset size alone is not optimal, and that the discrepancy between local and global category distributions can be a beneficial and complementary indicator for determining aggregation weights. We thus propose a novel aggregation method, Federated Learning with Discrepancy-aware Collaboration (FedDisco), whose aggregation weights not only involve both the dataset size and the discrepancy value, but also contribute to a tighter theoretical upper bound on the optimization error. FedDisco also promotes privacy preservation, communication and computation efficiency, and modularity. Extensive experiments show that FedDisco outperforms several state-of-the-art methods and can be easily incorporated into many existing methods to further enhance performance. Our code will be available at https://github.com/MediaBrain-SJTU/FedDisco.
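The sketch below illustrates one way such discrepancy-aware weights could be computed: each client's weight grows with its dataset size and shrinks with the distance between its local category distribution and the global one. The specific form, the L2 discrepancy measure, and the hyperparameters a and b are assumptions for illustration, not necessarily the paper's exact formula.

```python
import numpy as np

def disco_weights(sizes, local_dists, global_dist, a=0.5, b=0.1):
    """Compute discrepancy-aware aggregation weights (illustrative sketch).

    sizes: per-client dataset sizes.
    local_dists: (K, C) per-client category distributions.
    global_dist: (C,) global category distribution.
    A client's weight grows with its relative data size and shrinks with the
    discrepancy between its local category distribution and the global one.
    """
    sizes = np.asarray(sizes, dtype=float)
    sizes = sizes / sizes.sum()
    # Discrepancy: L2 distance between local and global category distributions.
    d = np.linalg.norm(np.asarray(local_dists) - np.asarray(global_dist), axis=1)
    w = np.maximum(sizes - a * d + b, 0.0)   # down-weight highly skewed clients
    return w / w.sum()
```

The server would use these weights instead of plain size-proportional weights when averaging client updates.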

* Accepted by the International Conference on Machine Learning (ICML 2023) 

Interruption-Aware Cooperative Perception for V2X Communication-Aided Autonomous Driving

Apr 24, 2023
Shunli Ren, Zixing Lei, Zi Wang, Mehrdad Dianati, Yafei Wang, Siheng Chen, Wenjun Zhang

Cooperative perception enabled by V2X communication technologies can significantly improve the perception performance of autonomous vehicles beyond the limited perception ability of individual vehicles, thereby improving the safety and efficiency of autonomous driving in intelligent transportation systems. However, to fully reap the benefits of cooperative perception in practice, the impacts of imperfect V2X communication, i.e., communication errors and disruptions, need to be understood, and effective remedies need to be developed to alleviate their adverse effects. Motivated by this need, we propose a novel INterruption-aware robust COoperative Perception (V2X-INCOP) solution for V2X communication-aided autonomous driving, which leverages historical information to recover information missing due to interruption. To achieve comprehensive recovery, we design a communication-adaptive multi-scale spatial-temporal prediction model that extracts multi-scale spatial-temporal features based on V2X communication conditions and captures the most significant information for predicting the missing information. To further improve recovery performance, we adopt a knowledge distillation framework to give direct supervision to the prediction model and a curriculum learning strategy to stabilize its training. Experiments on three public cooperative perception datasets demonstrate that the proposed method effectively alleviates the impact of communication interruption on cooperative perception.
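As a toy illustration of recovering an interrupted collaborator message from history, the sketch below keeps a short buffer of past feature maps and predicts the missing one with a single convolution before fusion. This is a drastically simplified stand-in for the paper's multi-scale spatial-temporal prediction model, and all names are hypothetical.

```python
import torch
import torch.nn as nn

class InterruptionAwareFusion(nn.Module):
    """Toy stand-in for interruption-aware cooperative feature fusion.

    Keeps a short history of a collaborator's feature maps; when the current
    message is missing (interrupted), a learned predictor estimates it from
    the history before fusion with the ego features.
    """
    def __init__(self, channels, history_len=3):
        super().__init__()
        self.history_len = history_len
        self.predictor = nn.Conv2d(channels * history_len, channels, kernel_size=3, padding=1)

    def forward(self, ego_feat, collab_feat, history):
        # history: list of past collaborator features, oldest first, each (B, C, H, W)
        if collab_feat is None and len(history) >= self.history_len:
            stacked = torch.cat(history[-self.history_len:], dim=1)
            collab_feat = self.predictor(stacked)      # recover the missing message
        if collab_feat is None:
            return ego_feat                            # nothing to recover from
        return ego_feat + collab_feat                  # simple additive fusion
```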

Among Us: Adversarially Robust Collaborative Perception by Consensus

Mar 27, 2023
Yiming Li, Qi Fang, Jiamu Bai, Siheng Chen, Felix Juefei-Xu, Chen Feng

Multiple robots can perceive a scene (e.g., detect objects) collaboratively better than individual robots, but they are vulnerable to adversarial attacks when deep learning is used. This could be addressed by adversarial defense, but its training requires knowledge of the attacking mechanism, which is often unknown. Instead, we propose ROBOSAC, a novel sampling-based defense strategy that generalizes to unseen attackers. Our key idea is that collaborative perception should lead to consensus rather than dissensus in results compared to individual perception. This leads to our hypothesize-and-verify framework: perception results with and without collaboration from a random subset of teammates are compared until a consensus is reached. In such a framework, more teammates in the sampled subset often yield better perception performance but require a longer sampling time to reject potential attackers. We therefore derive how many sampling trials are needed to ensure the desired size of an attacker-free subset, or equivalently, the maximum size of such a subset that we can successfully sample within a given number of trials. We validate our method on the task of collaborative 3D object detection in autonomous driving scenarios.
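The trial-count derivation mentioned above is in the spirit of RANSAC: given how many teammates may be malicious, one can bound how many random subsets must be verified before an attacker-free subset is sampled with high probability. The sketch below computes such a bound; it is an illustrative calculation under standard uniform-sampling assumptions, not the paper's exact derivation.

```python
from math import ceil, comb, log

def required_trials(num_teammates, num_attackers, subset_size, success_prob=0.99):
    """RANSAC-style bound on the number of sampling trials (illustrative).

    p_clean is the probability that a uniformly sampled subset of teammates
    contains no attacker; the return value is the number of independent
    trials needed so that at least one attacker-free subset is drawn with
    probability at least `success_prob`.
    """
    p_clean = comb(num_teammates - num_attackers, subset_size) / comb(num_teammates, subset_size)
    if p_clean == 0.0:
        return float("inf")   # subset larger than the number of clean teammates
    if p_clean == 1.0:
        return 1
    return ceil(log(1.0 - success_prob) / log(1.0 - p_clean))

# Example: 6 teammates, 1 attacker, verify subsets of size 4.
print(required_trials(6, 1, 4))
```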

EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning

Mar 27, 2023
Chenxin Xu, Robby T. Tan, Yuhong Tan, Siheng Chen, Yu Guang Wang, Xinchao Wang, Yanfeng Wang

Learning to predict agent motions with relationship reasoning is important for many applications. In motion prediction tasks, maintaining motion equivariance under Euclidean geometric transformations and invariance of agent interactions is a critical and fundamental principle. However, these equivariance and invariance properties are overlooked by most existing methods. To fill this gap, we propose EqMotion, an efficient equivariant motion prediction model with invariant interaction reasoning. To achieve motion equivariance, we propose an equivariant geometric feature learning module that learns a Euclidean-transformable feature through dedicated designs of equivariant operations. To reason about agents' interactions, we propose an invariant interaction reasoning module to achieve more stable interaction modeling. To further promote more comprehensive motion features, we propose an invariant pattern feature learning module that learns an invariant pattern feature, which cooperates with the equivariant geometric feature to enhance network expressiveness. We conduct experiments on four distinct scenarios: particle dynamics, molecule dynamics, human skeleton motion prediction, and pedestrian trajectory prediction. Experimental results show that our method is not only generally applicable, but also achieves state-of-the-art prediction performance on all four tasks, improving by 24.0%/30.1%/8.6%/9.2%. Code is available at https://github.com/MediaBrain-SJTU/EqMotion.
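Motion equivariance as described above can be illustrated with a toy update rule: mixing centered coordinates with learned weights and adding the centroid back commutes with rotations and translations of the input. The sketch below checks this numerically; it is a minimal example of an equivariant operation, not EqMotion's actual module.

```python
import torch

def equivariant_update(X, W):
    """A toy Euclidean-equivariant update of agent coordinates.

    X: (N, 3) coordinates; W: (N, N) learned mixing weights.
    Mixing centered coordinates and adding the centroid back keeps the
    output equivariant to rotations and translations of the input.
    """
    center = X.mean(dim=0, keepdim=True)
    return W @ (X - center) + center

# Sanity check: transforming the input transforms the output the same way.
torch.manual_seed(0)
X = torch.randn(5, 3)
W = torch.randn(5, 5)
R = torch.linalg.qr(torch.randn(3, 3)).Q        # random orthogonal matrix
t = torch.randn(1, 3)
out_transformed_input = equivariant_update(X @ R.T + t, W)
out_transformed_after = equivariant_update(X, W) @ R.T + t
print(torch.allclose(out_transformed_input, out_transformed_after, atol=1e-5))  # True
```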

* Accepted to CVPR 2023 

Collaboration Helps Camera Overtake LiDAR in 3D Detection

Mar 23, 2023
Yue Hu, Yifan Lu, Runsheng Xu, Weidi Xie, Siheng Chen, Yanfeng Wang

Camera-only 3D detection provides an economical solution with a simple configuration for localizing objects in 3D space compared to LiDAR-based detection systems. However, a major challenge lies in precise depth estimation due to the lack of direct 3D measurements in the input. Many previous methods attempt to improve depth estimation through network designs, e.g., deformable layers and larger receptive fields. This work proposes an orthogonal direction, improving camera-only 3D detection by introducing multi-agent collaboration. Our proposed collaborative camera-only 3D detection framework (CoCa3D) enables agents to share complementary information with each other through communication. Meanwhile, we optimize communication efficiency by selecting the most informative cues. The shared messages from multiple viewpoints disambiguate the single-agent depth estimates and complement the occluded and long-range regions in the single-agent view. We evaluate CoCa3D on one real-world dataset and two new simulation datasets. Results show that CoCa3D improves previous SOTA performance by 44.21% on DAIR-V2X, 30.60% on OPV2V+, and 12.59% on CoPerception-UAVs+ for AP@70. Our preliminary results suggest that, with sufficient collaboration, the camera might overtake LiDAR in some practical scenarios. We released the dataset and code at https://siheng-chen.github.io/dataset/CoPerception+ and https://github.com/MediaBrain-SJTU/CoCa3D.
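One simple way to realize the "most informative cues" selection mentioned above is to keep only the top-scoring spatial locations of a feature map under a bandwidth budget before transmission. The sketch below does exactly that; the function, the confidence-based scoring, and the budget parameter are illustrative assumptions, not CoCa3D's actual message selection.

```python
import torch

def select_informative_cues(feature_map, confidence_map, budget_ratio=0.1):
    """Keep only the most informative spatial locations of a BEV feature map.

    feature_map: (C, H, W); confidence_map: (H, W) per-location informativeness
    (e.g., detection confidence or depth uncertainty). Locations outside the
    top `budget_ratio` fraction are zeroed out before transmission, trading
    accuracy for communication bandwidth.
    """
    C, H, W = feature_map.shape
    k = max(1, int(budget_ratio * H * W))
    flat = confidence_map.flatten()
    keep = torch.zeros_like(flat, dtype=torch.bool)
    keep[flat.topk(k).indices] = True                 # mark the top-k locations
    mask = keep.view(H, W)
    return feature_map * mask.to(feature_map.dtype), mask
```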

* Accepted by CVPR 2023 