Ziyang Song

The RoboDepth Challenge: Methods and Advancements Towards Robust Depth Estimation

Jul 27, 2023
Lingdong Kong, Yaru Niu, Shaoyuan Xie, Hanjiang Hu, Lai Xing Ng, Benoit R. Cottereau, Ding Zhao, Liangjun Zhang, Hesheng Wang, Wei Tsang Ooi, Ruijie Zhu, Ziyang Song, Li Liu, Tianzhu Zhang, Jun Yu, Mohan Jing, Pengwei Li, Xiaohua Qi, Cheng Jin, Yingfeng Chen, Jie Hou, Jie Zhang, Zhen Kan, Qiang Ling, Liang Peng, Minglei Li, Di Xu, Changpeng Yang, Yuanqi Yao, Gang Wu, Jian Kuai, Xianming Liu, Junjun Jiang, Jiamian Huang, Baojun Li, Jiale Chen, Shuang Zhang, Sun Ao, Zhenyu Li, Runze Chen, Haiyong Luo, Fang Zhao, Jingze Yu

Accurate depth estimation under out-of-distribution (OoD) scenarios, such as adverse weather conditions, sensor failure, and noise contamination, is desirable for safety-critical applications. Existing depth estimation systems, however, inevitably suffer from real-world corruptions and perturbations and struggle to provide reliable depth predictions in such cases. In this paper, we summarize the winning solutions from the RoboDepth Challenge -- an academic competition designed to facilitate and advance robust OoD depth estimation. This challenge was developed based on the newly established KITTI-C and NYUDepth2-C benchmarks. We hosted two stand-alone tracks, with an emphasis on robust self-supervised and robust fully-supervised depth estimation, respectively. Out of more than two hundred participants, nine unique and top-performing solutions emerged, with novel designs spanning the following aspects: spatial- and frequency-domain augmentations, masked image modeling, image restoration and super-resolution, adversarial training, diffusion-based noise suppression, vision-language pre-training, learned model ensembling, and hierarchical feature enhancement. Extensive experimental analyses along with insightful observations are presented to better understand the rationale behind each design. We hope this challenge lays a solid foundation for future research on robust and reliable depth estimation and beyond. The datasets, competition toolkit, workshop recordings, and source code from the winning teams are publicly available on the challenge website.

* Technical Report; 65 pages, 34 figures, 24 tables; Code at https://github.com/ldkong1205/RoboDepth 

Performance Analysis and Optimal Design of HARQ-IR-Aided Terahertz Communications

Apr 22, 2023
Ziyang Song, Zheng Shi, Jiaji Su, Qingping Dou, Guanghua Yang, Haichuan Ding, Shaodan Ma

Terahertz (THz) communications are envisioned as a promising technology for 6G thanks to their broad bandwidth. However, the large path loss, antenna misalignment, and atmospheric attenuation of THz links severely deteriorate their reliability. To address this, hybrid automatic repeat request (HARQ) is recognized as an effective technique to ensure reliable THz communications. This paper delves into the performance analysis of HARQ with incremental redundancy (HARQ-IR)-aided THz communications in the presence and absence of blockage. More specifically, an analytical expression for the outage probability of HARQ-IR-aided THz communications is derived, which enables an asymptotic outage analysis that yields meaningful insights, including the diversity order, power allocation gain, and modulation and coding gain. The long-term average throughput (LTAT) is then expressed in terms of the outage probability based on renewal theory. Moreover, to combat blockage effects, a multi-hop HARQ-IR-aided THz communication scheme is proposed and its performance is examined. To demonstrate the superiority of the proposed scheme, two other HARQ-aided schemes, i.e., Type-I HARQ and HARQ with chase combining (HARQ-CC), are used for benchmarking in the simulations. In addition, a deep neural network (DNN)-based outage evaluation framework with low computational complexity is devised to reap the benefits of both asymptotic and simulation results in the low and high outage regimes, respectively. This outage evaluation framework is finally employed for optimal rate selection, which outperforms the asymptotics-based optimization.
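
As a rough illustration of the outage quantity analyzed above, the sketch below estimates the HARQ-IR outage probability by Monte Carlo simulation: an outage occurs when the mutual information accumulated over all HARQ rounds falls short of the target rate. It assumes i.i.d. Rayleigh fading for simplicity, whereas the paper models THz-specific fading, misalignment, and blockage; the function and parameter names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def harq_ir_outage(snr_db, rate, max_rounds, trials=100_000):
    """Monte Carlo estimate of the HARQ-IR outage probability.

    Outage occurs when the mutual information accumulated over all
    HARQ rounds stays below the target rate (bits/s/Hz). Rayleigh
    fading is assumed here for simplicity; the paper instead models
    THz fading with stochastic antenna misalignment and blockage.
    """
    snr = 10 ** (snr_db / 10)
    # i.i.d. Rayleigh fading power gains, one per round and trial
    gains = rng.exponential(scale=1.0, size=(trials, max_rounds))
    # mutual information accumulated across all rounds of one trial
    acc_mi = np.log2(1.0 + snr * gains).sum(axis=1)
    return np.mean(acc_mi < rate)

for k in (1, 2, 3):
    print(f"K={k}: outage ~ {harq_ir_outage(5.0, 4.0, k):.4f}")
```

The estimated outage probability drops sharply as the number of HARQ rounds grows, mirroring the time-diversity gain the asymptotic analysis quantifies.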

* Blockage, hybrid automatic repeat request (HARQ), outage probability, terahertz (THz) communications 

OGC: Unsupervised 3D Object Segmentation from Rigid Dynamics of Point Clouds

Oct 10, 2022
Ziyang Song, Bo Yang

In this paper, we study the problem of 3D object segmentation from raw point clouds. Unlike existing methods, which usually require a large amount of human annotations for full supervision, we propose the first unsupervised method, called OGC, to simultaneously identify multiple 3D objects in a single forward pass without needing any human annotations. The key to our approach is to fully leverage the dynamic motion patterns over sequential point clouds as supervision signals to automatically discover rigid objects. Our method consists of three major components: 1) an object segmentation network that directly estimates multi-object masks from a single point cloud frame; 2) an auxiliary self-supervised scene flow estimator; and 3) our core object geometry consistency component. By carefully designing a series of loss functions, we effectively take into account multi-object rigid consistency and object shape invariance at both temporal and spatial scales. This allows our method to truly discover object geometry even in the absence of annotations. We extensively evaluate our method on five datasets, demonstrating superior performance on object part instance segmentation and general object segmentation in both indoor and challenging outdoor scenarios.
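
The object geometry consistency idea can be illustrated with a simple rigidity check: fit the best rigid transform between a point segment and its flowed counterpart, then measure the residual, which is near zero only when the segment moves as one rigid object. This is a simplified sketch, not the paper's actual loss; `fit_rigid` and `rigidity_residual` are hypothetical names.

```python
import numpy as np

def fit_rigid(src, dst):
    """Best-fit rotation R and translation t mapping src -> dst
    via the Kabsch algorithm. src, dst: (N, 3) arrays."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)          # 3x3 covariance
    U, _, Vt = np.linalg.svd(H)
    # correction term guards against reflections
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t

def rigidity_residual(points, flow):
    """Mean distance between the flowed points and their best rigid
    fit; near zero when the segment moves as a single rigid body."""
    dst = points + flow
    R, t = fit_rigid(points, dst)
    return np.linalg.norm(points @ R.T + t - dst, axis=1).mean()
```

A segment whose flow is a pure rotation plus translation yields a residual near machine precision, while a segment mixing two objects' motions yields a large one, so the residual can serve as a rigidity-based supervision signal.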

* NeurIPS 2022. Code and data are available at: https://github.com/vLAR-group/OGC 

Outage Probability Analysis of HARQ-Aided Terahertz Communications

Sep 24, 2022
Ziyang Song, Zheng Shi, Qingping Dou, Guanghua Yang, Yunfei Li, Shaodan Ma

Although terahertz (THz) communications can provide mobile broadband services, they usually suffer from large path loss and are vulnerable to antenna misalignment, which significantly degrades reception reliability. To address this issue, hybrid automatic repeat request (HARQ) is employed to enhance the reliability of THz communications. This paper provides an in-depth investigation of the outage performance of two types of HARQ-aided THz communications: Type-I HARQ and HARQ with chase combining (HARQ-CC). The effects of both fading and stochastic antenna misalignment are considered. The exact outage probabilities of HARQ-aided THz communications are derived in closed form, which enables an asymptotic outage analysis that reveals helpful insights. In particular, it is shown that full time diversity can be achieved by using HARQ-assisted schemes. Besides, the HARQ-CC-aided scheme outperforms the Type-I HARQ-aided one thanks to its higher diversity combining gain. The analytical results are validated via Monte Carlo simulations.


ActFormer: A GAN Transformer Framework towards General Action-Conditioned 3D Human Motion Generation

Mar 15, 2022
Ziyang Song, Dongliang Wang, Nan Jiang, Zhicheng Fang, Chenjing Ding, Weihao Gan, Wei Wu

We present a GAN Transformer framework for general action-conditioned 3D human motion generation, covering not only single-person actions but also multi-person interactive actions. Our approach consists of a powerful Action-conditioned motion transFormer (ActFormer) trained under a GAN scheme and equipped with a Gaussian Process latent prior. Such a design combines the strong spatio-temporal representation capacity of Transformers, the strength of GANs in generative modeling, and the inherent temporal correlations from the latent prior. Furthermore, ActFormer can be naturally extended to multi-person motions by alternately modeling temporal correlations and human interactions with Transformer encoders. We validate our approach by comparison with other methods on large-scale benchmarks, including NTU RGB+D 120 and BABEL. We also introduce a new synthetic dataset of complex multi-person combat behaviors to facilitate research on multi-person motion generation. Our method adapts to various human motion representations and achieves leading performance over state-of-the-art methods on both single-person and multi-person motion generation tasks, marking a promising step towards a universal human motion generator.
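
The Gaussian Process latent prior mentioned above can be sketched as sampling a temporally correlated latent sequence from a GP. This is a minimal illustration that assumes an RBF kernel and the hypothetical function name `sample_gp_latent`; the paper's exact prior may differ.

```python
import numpy as np

def sample_gp_latent(T, d, length_scale=5.0, rng=None):
    """Sample a latent sequence z of shape (T, d) from a Gaussian
    Process prior with an RBF kernel over time, so that nearby
    frames receive correlated latent codes, a stand-in for the
    temporally smooth noise driving motion generation."""
    rng = rng or np.random.default_rng()
    t = np.arange(T, dtype=float)
    # RBF kernel: correlation decays with temporal distance
    K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / length_scale) ** 2)
    K += 1e-6 * np.eye(T)  # jitter for numerical stability
    L = np.linalg.cholesky(K)
    return L @ rng.normal(size=(T, d))
```

Unlike i.i.d. per-frame noise, adjacent frames of the sampled sequence are strongly correlated, which biases the generator toward temporally coherent motion.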


Multi-scale Matching Networks for Semantic Correspondence

Jul 31, 2021
Dongyang Zhao, Ziyang Song, Zhenghao Ji, Gangming Zhao, Weifeng Ge, Yizhou Yu

Deep features have proven powerful for building accurate dense semantic correspondences in many previous works. However, the multi-scale, pyramidal hierarchy of convolutional neural networks has not been well studied for learning discriminative pixel-level features for semantic correspondence. In this paper, we propose a multi-scale matching network that is sensitive to tiny semantic differences between neighboring pixels. We follow a coarse-to-fine matching strategy and build a top-down feature and matching enhancement scheme coupled with the multi-scale hierarchy of deep convolutional neural networks. During feature enhancement, intra-scale enhancement fuses same-resolution feature maps from multiple layers via local self-attention, while cross-scale enhancement hallucinates higher-resolution feature maps along the top-down hierarchy. In addition, we learn complementary matching details at different scales, so the overall matching score is gradually refined by features of different semantic levels. Our multi-scale matching network can be trained end-to-end easily with few additional learnable parameters. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on three popular benchmarks with high computational efficiency.
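
The top-down enhancement scheme can be sketched in a simplified, parameter-free form: each coarser feature map is upsampled and merged into the next finer one, propagating high-level semantics down the hierarchy. Nearest-neighbor upsampling and additive fusion are assumptions standing in for the paper's learned hallucination and attention-based fusion.

```python
import numpy as np

def topdown_enhance(feats):
    """Top-down enhancement across a feature pyramid (sketch).

    feats: list of (C, H, W) arrays ordered coarse to fine, with
    finer maps an integer multiple of the coarser resolution.
    Each enhanced coarse map is upsampled (nearest neighbor here,
    in place of learned hallucination) and added to the next finer
    map, so semantics flow down the hierarchy.
    """
    out = [feats[0]]
    for fine in feats[1:]:
        coarse = out[-1]
        rh = fine.shape[1] // coarse.shape[1]
        rw = fine.shape[2] // coarse.shape[2]
        up = coarse.repeat(rh, axis=1).repeat(rw, axis=2)
        out.append(fine + up)
    return out
```

Because enhancement is applied to the already-enhanced coarser output, semantic context accumulates as resolution increases, mirroring the coarse-to-fine refinement of the matching score.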

* Accepted to appear in ICCV 2021 

Supervised multi-specialist topic model with applications on large-scale electronic health record data

May 04, 2021
Ziyang Song, Xavier Sumba Toral, Yixin Xu, Aihua Liu, Liming Guo, Guido Powell, Aman Verma, David Buckeridge, Ariane Marelli, Yue Li

Motivation: Electronic health record (EHR) data provides a new avenue to elucidate disease comorbidities and latent phenotypes for precision medicine. To fully exploit its potential, a realistic generative process for the EHR data needs to be modelled. We present MixEHR-S to jointly infer specialist-disease topics from EHR data. As the key contribution, we model the specialist assignments and ICD-coded diagnoses as latent topics based on a patient's underlying disease topic mixture in a novel unified supervised hierarchical Bayesian topic model. For efficient inference, we developed a closed-form collapsed variational inference algorithm to learn the model distributions of MixEHR-S. We applied MixEHR-S to two independent large-scale EHR databases in Quebec with three targeted applications: (1) Congenital Heart Disease (CHD) diagnostic prediction among 154,775 patients; (2) Chronic Obstructive Pulmonary Disease (COPD) diagnostic prediction among 73,791 patients; (3) future insulin treatment prediction among 78,712 patients diagnosed with diabetes, as a means to assess disease exacerbation. In all three applications, MixEHR-S identified clinically meaningful latent topics among the most predictive ones and achieved superior target prediction accuracy compared to existing methods, providing opportunities for prioritizing high-risk patients for healthcare services. MixEHR-S source code and experiment scripts are freely available at https://github.com/li-lab-mcgill/mixehrS


Learning End-to-End Action Interaction by Paired-Embedding Data Augmentation

Jul 16, 2020
Ziyang Song, Zejian Yuan, Chong Zhang, Wanchao Chi, Yonggen Ling, Shenghao Zhang

In recognition-based action interaction, robots' responses to human actions are often pre-designed according to recognized categories and are thus stiff. In this paper, we define a new Interactive Action Translation (IAT) task that aims to learn end-to-end action interaction from unlabeled interactive pairs, removing the need for explicit action recognition. To enable learning on small-scale data, we propose a Paired-Embedding (PE) method for effective and reliable data augmentation. Specifically, our method first utilizes paired relationships to cluster individual actions in an embedding space. Two actions originally paired can then be replaced with other actions from their respective neighborhoods, assembling into new pairs. An Act2Act network based on a conditional GAN then learns from the augmented data. Besides, IAT-test and IAT-train scores are proposed specifically for evaluating methods on our task. Experimental results on two datasets demonstrate the effectiveness and broad applicability of our method.
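
The PE augmentation step can be sketched as follows: embed actions, then swap each member of a pair with one of its nearest neighbors in embedding space to assemble new plausible pairs. This is a minimal sketch with a hypothetical name (`augment_pairs`) and plain Euclidean neighborhoods, whereas the actual method clusters actions using paired relationships.

```python
import numpy as np

def augment_pairs(emb_a, emb_b, pairs, k=3, rng=None):
    """Paired-Embedding-style data augmentation (simplified sketch).

    emb_a, emb_b: (N, d) embeddings of the first/second actions.
    pairs: list of (i, j) index pairs that interact.
    Each action in a pair is replaced by one of its k nearest
    neighbors in embedding space, assembling new pairs.
    """
    rng = rng or np.random.default_rng()

    def neighbors(emb, idx):
        # indices of the k closest embeddings, excluding idx itself
        d = np.linalg.norm(emb - emb[idx], axis=1)
        return np.argsort(d)[1:k + 1]

    new_pairs = []
    for i, j in pairs:
        ni = rng.choice(neighbors(emb_a, i))
        nj = rng.choice(neighbors(emb_b, j))
        new_pairs.append((int(ni), int(nj)))
    return new_pairs
```

Each augmented pair stays close to an observed pair in embedding space, so the new combinations remain plausible interactions rather than arbitrary mixtures.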

* 16 pages, 7 figures 

Attention-Oriented Action Recognition for Real-Time Human-Robot Interaction

Jul 02, 2020
Ziyang Song, Ziyi Yin, Zejian Yuan, Chong Zhang, Wanchao Chi, Yonggen Ling, Shenghao Zhang

Despite notable progress on action recognition tasks, little work has addressed action recognition specifically for human-robot interaction. In this paper, we thoroughly explore the characteristics of the action recognition task in interaction scenarios and propose an attention-oriented multi-level network framework to meet the need for real-time interaction. Specifically, a Pre-Attention network is employed to first coarsely focus on the interactor in the scene at low resolution and then perform fine-grained pose estimation at high resolution. A compact CNN then receives the extracted skeleton sequence as input for action recognition, utilizing attention-like mechanisms to effectively capture local spatial-temporal patterns and global semantic information. To evaluate our approach, we construct a new action dataset specifically for the recognition task in interaction scenarios. Experimental results on our dataset, together with the high efficiency (112 fps at 640 x 480 RGBD) on a mobile computing platform (Nvidia Jetson AGX Xavier), demonstrate the excellent applicability of our method to action recognition in real-time human-robot interaction.

* 8 pages, 8 figures 