Ziwei Zhao

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Nov 30, 2023
Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei Huang, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray

We present Ego-Exo4D, a diverse, large-scale multimodal, multiview video dataset and benchmark challenge. Ego-Exo4D centers on simultaneously captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). More than 800 participants from 13 cities worldwide performed these activities in 131 different natural scene contexts, yielding long-form captures of 1 to 42 minutes each and 1,422 hours of video in total. The multimodal nature of the dataset is unprecedented: the video is accompanied by multichannel audio, eye gaze, 3D point clouds, camera poses, IMU, and multiple paired language descriptions -- including a novel "expert commentary" provided by coaches and teachers and tailored to the skilled-activity domain. To push the frontier of first-person video understanding of skilled human activity, we also present a suite of benchmark tasks and their annotations, including fine-grained activity understanding, proficiency estimation, cross-view translation, and 3D hand/body pose. All resources will be open sourced to fuel new research in the community.
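
A minimal sketch of the modalities the abstract says accompany each capture, expressed as a single record. All class and field names here are hypothetical, for illustration only; they are not the dataset's actual schema or API.

```python
# Hypothetical record for one Ego-Exo4D capture, per the abstract's list of
# modalities. Names are illustrative, not the dataset's real schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class EgoExoTake:
    take_id: str
    activity: str                  # e.g. "bike repair", "dance"
    ego_video: str                 # path to the egocentric video
    exo_videos: List[str]          # time-synced exocentric videos
    audio: str                     # multichannel audio track
    eye_gaze: str                  # gaze signal aligned to the ego video
    point_cloud: str               # 3D point cloud of the scene
    camera_poses: str              # per-frame camera poses
    imu: str                       # inertial measurements
    narrations: List[str] = field(default_factory=list)         # paired descriptions
    expert_commentary: List[str] = field(default_factory=list)  # coach/teacher commentary
```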


Topology-Preserving Automatic Labeling of Coronary Arteries via Anatomy-aware Connection Classifier

Jul 22, 2023
Zhixing Zhang, Ziwei Zhao, Dong Wang, Shishuang Zhao, Yuhang Liu, Jia Liu, Liwei Wang


Automatic labeling of coronary arteries is an essential task in the practical diagnosis of cardiovascular diseases. For experienced radiologists, the anatomically predetermined connections are important for labeling artery segments accurately, yet this prior knowledge has barely been explored in previous studies. In this paper, we present a new framework, TopoLab, which explicitly incorporates anatomical connections into the network design. Specifically, we introduce intra-segment feature aggregation and inter-segment feature interaction strategies for hierarchical segment feature extraction. Moreover, we propose an anatomy-aware connection classifier that classifies each connected segment pair, effectively exploiting the prior topology among arteries of different categories. To validate the effectiveness of our method, we contribute high-quality artery labeling annotations to the public orCaScore dataset. Experimental results on both the orCaScore dataset and an in-house dataset show that TopoLab achieves state-of-the-art performance.

* Accepted by MICCAI 2023 
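
A minimal PyTorch sketch in the spirit of the anatomy-aware connection classifier described above: it scores joint label assignments for each *connected* segment pair, so predictions can respect the arterial connection topology. The dimensions, the plain MLP, and the joint-logit layout are assumptions for illustration, not the paper's actual design.

```python
# Sketch: classify each anatomically connected segment pair jointly.
import torch
import torch.nn as nn

class ConnectionClassifier(nn.Module):
    def __init__(self, feat_dim: int = 256, num_labels: int = 10):
        super().__init__()
        self.num_labels = num_labels
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, num_labels * num_labels),  # joint logits per pair
        )

    def forward(self, seg_feats: torch.Tensor, edges: torch.Tensor):
        # seg_feats: (S, feat_dim) per-segment features (after intra-segment
        # aggregation and inter-segment interaction in the paper's pipeline);
        # edges: (E, 2) long tensor of connected segment index pairs.
        pair = torch.cat([seg_feats[edges[:, 0]], seg_feats[edges[:, 1]]], dim=-1)
        return self.mlp(pair).view(-1, self.num_labels, self.num_labels)
```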

Mining Negative Temporal Contexts For False Positive Suppression In Real-Time Ultrasound Lesion Detection

May 29, 2023
Haojun Yu, Youcheng Li, QuanLin Wu, Ziwei Zhao, Dengbo Chen, Dong Wang, Liwei Wang


During ultrasound scanning, real-time lesion detection can assist radiologists in accurate cancer diagnosis. However, this essential task remains challenging and underexplored. General-purpose real-time object detection models can mistakenly report obvious false positives (FPs) when applied to ultrasound videos, potentially misleading junior radiologists. One key issue is their failure to utilize negative symptoms in previous frames, which we denote as negative temporal contexts (NTC). To address this issue, we propose to extract contexts from previous frames, including NTC, with the guidance of inverse optical flow. By aggregating the extracted contexts, we endow the model with the ability to suppress FPs by leveraging NTC. We call the resulting model UltraDet. UltraDet demonstrates significant improvement over previous state-of-the-art methods and achieves real-time inference speed. To facilitate future research, we will release the code, checkpoints, and high-quality labels of the CVA-BUS dataset used in our experiments.

* 10 pages, 4 figures, MICCAI 2023 Early Accept 
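
A hedged sketch of the core idea as stated in the abstract: features from previous frames are aligned to the current frame via inverse optical flow and then aggregated, so the detector can see negative temporal contexts. The shapes and the simple averaging step are assumptions for illustration; the paper's actual aggregation is not specified here.

```python
# Sketch: warp previous-frame features with inverse flow, then aggregate.
import torch
import torch.nn.functional as F

def warp_with_inverse_flow(prev_feat, inv_flow):
    # prev_feat: (B, C, H, W); inv_flow: (B, 2, H, W), mapping current-frame
    # coordinates back to their source locations in the previous frame.
    B, _, H, W = prev_feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=prev_feat.dtype, device=prev_feat.device),
        torch.arange(W, dtype=prev_feat.dtype, device=prev_feat.device),
        indexing="ij",
    )
    grid = torch.stack([xs, ys], dim=0).unsqueeze(0) + inv_flow  # (B, 2, H, W)
    # Normalize to [-1, 1] for grid_sample (x first, then y).
    grid_x = 2.0 * grid[:, 0] / (W - 1) - 1.0
    grid_y = 2.0 * grid[:, 1] / (H - 1) - 1.0
    sample_grid = torch.stack([grid_x, grid_y], dim=-1)  # (B, H, W, 2)
    return F.grid_sample(prev_feat, sample_grid, align_corners=True)

def aggregate_contexts(curr_feat, prev_feats, inv_flows):
    # Average aligned previous-frame features into the current ones; a
    # detection head can then contrast current responses against NTC.
    warped = [warp_with_inverse_flow(f, fl) for f, fl in zip(prev_feats, inv_flows)]
    return (curr_feat + sum(warped)) / (1 + len(warped))
```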

Efficient automatic segmentation for multi-level pulmonary arteries: The PARSE challenge

Apr 07, 2023
Gongning Luo, Kuanquan Wang, Jun Liu, Shuo Li, Xinjie Liang, Xiangyu Li, Shaowei Gan, Wei Wang, Suyu Dong, Wenyi Wang, Pengxin Yu, Enyou Liu, Hongrong Wei, Na Wang, Jia Guo, Huiqi Li, Zhao Zhang, Ziwei Zhao, Na Gao, Nan An, Ashkan Pakzad, Bojidar Rangelov, Jiaqi Dou, Song Tian, Zeyu Liu, Yi Wang, Ampatishan Sivalingam, Kumaradevan Punithakumar, Zhaowen Qiu, Xin Gao


Efficient automatic segmentation of multi-level (i.e., main and branch) pulmonary arteries (PA) in CTPA images plays a significant role in clinical applications. However, most existing methods concentrate on either main PA or branch PA segmentation separately and ignore segmentation efficiency. Moreover, there is no public large-scale dataset focused on PA segmentation, which makes it highly challenging to compare different methods. To benchmark multi-level PA segmentation algorithms, we organized the first Pulmonary ARtery SEgmentation (PARSE) challenge. On the one hand, we focus on both main PA and branch PA segmentation. On the other hand, for better clinical applicability, we assign the same score weight to segmentation efficiency (mainly running time and GPU memory consumption during inference) as to PA segmentation accuracy. We present a summary of the top algorithms and offer suggestions for efficient and accurate multi-level automatic PA segmentation. The PARSE challenge remains open-access for the community to benchmark future algorithm developments at https://parse2022.grand-challenge.org/Parse2022/.
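
A hedged sketch of the scoring principle stated above: efficiency (inference time and GPU memory) is weighted equally with segmentation accuracy. The abstract does not give the challenge's exact metrics or normalization, so the formula and bounds below are illustrative assumptions only.

```python
# Sketch of an equal-weight accuracy/efficiency score; thresholds are assumed.
def parse_style_score(accuracy: float, runtime_s: float, gpu_mem_gb: float,
                      max_runtime_s: float = 60.0, max_mem_gb: float = 16.0) -> float:
    # accuracy in [0, 1] (e.g. Dice); efficiency terms are normalized so that
    # lower time/memory yields a higher score, clamped to be non-negative.
    time_score = max(0.0, 1.0 - runtime_s / max_runtime_s)
    mem_score = max(0.0, 1.0 - gpu_mem_gb / max_mem_gb)
    efficiency = 0.5 * (time_score + mem_score)
    return 0.5 * accuracy + 0.5 * efficiency
```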


Check and Link: Pairwise Lesion Correspondence Guides Mammogram Mass Detection

Sep 13, 2022
Ziwei Zhao, Dong Wang, Yihong Chen, Ziteng Wang, Liwei Wang


Detecting masses in mammograms is important due to the high incidence and mortality of breast cancer. In mammogram mass detection, explicitly modeling pairwise lesion correspondence is particularly important. However, most existing methods build relatively coarse correspondence and do not utilize correspondence supervision. In this paper, we propose CL-Net, a new transformer-based framework that learns lesion detection and pairwise correspondence in an end-to-end manner. In CL-Net, a View-Interactive Lesion Detector achieves dynamic interaction across candidates from different views, while a Lesion Linker employs correspondence supervision to guide the interaction process more accurately. Together, these two designs achieve a precise understanding of pairwise lesion correspondence in mammograms. Experiments show that CL-Net yields state-of-the-art performance on the public DDSM dataset and our in-house dataset. Moreover, it outperforms previous methods by a large margin in the low false-positives-per-image (FPI) regime.

* Accepted by ECCV 2022 
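
A minimal sketch of cross-view candidate interaction in the spirit of the View-Interactive Lesion Detector described above: lesion candidate queries from one mammographic view attend to the candidates from the other view. The attention layout and dimensions are assumptions; the Lesion Linker and its correspondence supervision are not shown.

```python
# Sketch: cross-attention between lesion candidates of two mammographic views.
import torch
import torch.nn as nn

class CrossViewInteraction(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, q_cc: torch.Tensor, q_mlo: torch.Tensor):
        # q_cc, q_mlo: (B, N, dim) candidate queries from the two views
        # (e.g. craniocaudal and mediolateral oblique).
        cc = self.norm(q_cc + self.attn(q_cc, q_mlo, q_mlo)[0])
        mlo = self.norm(q_mlo + self.attn(q_mlo, q_cc, q_cc)[0])
        return cc, mlo
```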

PointScatter: Point Set Representation for Tubular Structure Extraction

Sep 13, 2022
Dong Wang, Zhao Zhang, Ziwei Zhao, Yuhang Liu, Yihong Chen, Liwei Wang


This paper explores point set representation for tubular structure extraction tasks. Compared with the traditional mask representation, the point set representation offers flexibility and representational power, as it is not restricted to a fixed grid the way a mask is. Inspired by this, we propose PointScatter, an alternative to segmentation models for tubular structure extraction. PointScatter splits the image into scatter regions and predicts points for each scatter region in parallel. We further propose a greedy region-wise bipartite matching algorithm to train the network efficiently and end-to-end. We benchmark PointScatter on four public tubular datasets, and extensive experiments on tubular structure segmentation and centerline extraction demonstrate the effectiveness of our approach. Code is available at https://github.com/zhangzhao2022/pointscatter.

* ECCV2022 (Oral) 
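
A hedged sketch of greedy bipartite matching within one scatter region: predicted points are matched to ground-truth points in order of increasing cost, each point used at most once. The abstract does not specify the cost function, so plain Euclidean distance is assumed here; see the released code for the actual algorithm.

```python
# Sketch: greedy one-to-one matching of predicted vs. ground-truth points.
import numpy as np

def greedy_region_match(pred_pts: np.ndarray, gt_pts: np.ndarray):
    # pred_pts: (P, 2), gt_pts: (G, 2). Returns a list of (pred_idx, gt_idx).
    cost = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=-1)
    # Enumerate all pairs from cheapest to most expensive.
    order = np.dstack(np.unravel_index(np.argsort(cost, axis=None), cost.shape))[0]
    used_p, used_g, matches = set(), set(), []
    for p, g in order:
        if p not in used_p and g not in used_g:
            matches.append((int(p), int(g)))
            used_p.add(int(p))
            used_g.add(int(g))
    return matches
```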

Applying the Case Difference Heuristic to Learn Adaptations from Deep Network Features

Jul 15, 2021
Xiaomeng Ye, Ziwei Zhao, David Leake, Xizi Wang, David Crandall


The case difference heuristic (CDH) approach is a knowledge-light method for learning case adaptation knowledge from the case base of a case-based reasoning (CBR) system. Given a pair of cases, the CDH approach attributes the difference in their solutions to the difference in the problems they solve, and generates adaptation rules to adjust solutions accordingly when a retrieved case and a new query have similar problem differences. As an alternative to learning adaptation rules, several researchers have applied neural networks to predict solution differences from problem differences. Previous work on such approaches has assumed that the feature set describing problems is predefined. This paper investigates a two-phase process that combines deep learning for feature extraction with neural-network-based adaptation learning from the extracted features. Its performance is demonstrated on an image regression task: predicting a person's age from a face image. Results show that the combined process can successfully learn adaptation knowledge applicable to nonsymbolic differences in cases. The CBR system achieves slightly lower performance overall than a baseline deep network regressor, but better performance than the baseline on novel queries.

* 7 pages, 2 figures, 1 table. To be published in the IJCAI-21 Workshop on Deep Learning, Case-Based Reasoning, and AutoML: Present and Future Synergies 
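
A hedged sketch of the adaptation step as the abstract describes it: a query's solution is estimated by retrieving a similar case and adding a learned correction predicted from the deep-feature difference. The feature dimension, network shape, and scalar solution are illustrative assumptions; retrieval and the CNN extractor are omitted.

```python
# Sketch: CDH-style adaptation over deep features (e.g. age regression).
import torch
import torch.nn as nn

# Maps a feature difference to a predicted solution difference.
adaptation_net = nn.Sequential(
    nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 1)
)

def adapt(query_feat: torch.Tensor, case_feat: torch.Tensor,
          case_solution: float) -> float:
    # query_feat, case_feat: (512,) deep features from a face CNN (assumed);
    # case_solution: scalar label of the retrieved case (e.g. its age).
    with torch.no_grad():
        delta = adaptation_net((query_feat - case_feat).unsqueeze(0))
    return case_solution + delta.item()
```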

FPGA-based Binocular Image Feature Extraction and Matching System

May 14, 2019
Qi Ni, Fei Wang, Ziwei Zhao, Peng Gao


Image feature extraction and matching is a fundamental but computation-intensive task in machine vision. This paper proposes a novel FPGA-based embedded system to accelerate feature extraction and matching. It implements SURF feature point detection and BRIEF feature descriptor construction and matching. For binocular stereo vision, feature matching includes both tracking matching and stereo matching, which simultaneously provide feature point correspondences and parallax information. Our system is evaluated on a ZYNQ XC7Z045 FPGA. The results demonstrate that it can process binocular video data at a high frame rate (640×480 @ 162 fps). Moreover, extensive tests show that our system is robust to image compression, blurring, and illumination changes.

* Accepted for the 4th International Conference on Multimedia Systems and Signal Processing (ICMSSP 2019) 
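
The paper's pipeline is implemented in FPGA hardware; as a rough software analogue of the matching stage only, the sketch below detects keypoints and matches binary BRIEF descriptors between a stereo pair with OpenCV. It substitutes FAST for SURF detection (SURF requires OpenCV's patented "nonfree" build) and requires the opencv-contrib-python package; it is not the paper's hardware design.

```python
# Software analogue of stereo BRIEF matching (not the FPGA implementation).
import cv2

def stereo_brief_match(left_path: str, right_path: str):
    left = cv2.imread(left_path, cv2.IMREAD_GRAYSCALE)
    right = cv2.imread(right_path, cv2.IMREAD_GRAYSCALE)
    detector = cv2.FastFeatureDetector_create()           # stand-in for SURF
    brief = cv2.xfeatures2d.BriefDescriptorExtractor_create()
    kp_l, des_l = brief.compute(left, detector.detect(left))
    kp_r, des_r = brief.compute(right, detector.detect(right))
    # BRIEF descriptors are binary, so match with Hamming distance.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_l, des_r)
    # Horizontal offsets between matched points approximate parallax.
    disparities = [kp_l[m.queryIdx].pt[0] - kp_r[m.trainIdx].pt[0]
                   for m in matches]
    return matches, disparities
```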