
Thomas S. Huang


Pseudo-IoU: Improving Label Assignment in Anchor-Free Object Detection

Apr 29, 2021
Jiachen Li, Bowen Cheng, Rogerio Feris, Jinjun Xiong, Thomas S. Huang, Wen-Mei Hwu, Humphrey Shi

Current anchor-free object detectors are simple and effective, yet they lack accurate label assignment methods, which limits their potential in competing with classic anchor-based models that are supported by well-designed assignment methods based on the Intersection-over-Union (IoU) metric. In this paper, we present Pseudo-Intersection-over-Union (Pseudo-IoU): a simple metric that brings a more standardized and accurate assignment rule into anchor-free object detection frameworks without any additional computational cost or extra parameters for training and testing, making it possible to further improve anchor-free object detection by utilizing high-quality training samples under effective assignment rules previously applied in anchor-based methods. By incorporating the Pseudo-IoU metric into an end-to-end single-stage anchor-free object detection framework, we observe consistent performance improvements on general object detection benchmarks such as PASCAL VOC and MSCOCO. Our method (single-model and single-scale) also achieves comparable performance to other recent state-of-the-art anchor-free methods without bells and whistles. Our code is based on the mmdetection toolbox and will be made publicly available at https://github.com/SHI-Labs/Pseudo-IoU-for-Anchor-Free-Object-Detection.
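
The abstract leaves the exact metric to the paper; as a rough sketch of the general idea, an IoU-like assignment score for an anchor-free location can be formed by comparing the ground-truth box against a hypothetical box of the same size centered at that location. Everything below (the function name and the same-size-box construction) is an illustrative assumption, not the paper's definition.

```python
import numpy as np

def pseudo_iou(points, gt_box):
    """IoU-like assignment score for anchor-free points (illustrative sketch).

    For each candidate point we build a hypothetical box of the same size as
    the ground-truth box, centered at the point, and compute its IoU with the
    ground truth. Points near the box center score close to 1 and points near
    the border score lower, mimicking anchor-based IoU assignment without anchors.
    """
    x1, y1, x2, y2 = gt_box
    w, h = x2 - x1, y2 - y1
    px, py = points[:, 0], points[:, 1]

    # Hypothetical boxes centered at each point with the ground truth's size.
    bx1, by1 = px - w / 2, py - h / 2
    bx2, by2 = px + w / 2, py + h / 2

    # Standard IoU between each hypothetical box and the ground-truth box.
    iw = np.maximum(0.0, np.minimum(bx2, x2) - np.maximum(bx1, x1))
    ih = np.maximum(0.0, np.minimum(by2, y2) - np.maximum(by1, y1))
    inter = iw * ih
    union = 2 * w * h - inter
    return inter / np.maximum(union, 1e-6)

# A point at the center of a 100x50 box scores 1.0; an off-center point scores
# lower, so an IoU-style threshold can select positive samples as in anchor-based detectors.
scores = pseudo_iou(np.array([[50.0, 25.0], [10.0, 5.0]]), (0.0, 0.0, 100.0, 50.0))
```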

* CVPR 2021 Workshop 

CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation

Dec 07, 2020
Yang Fu, Linjie Yang, Ding Liu, Thomas S. Huang, Humphrey Shi

Video instance segmentation is a complex task that requires detecting, segmenting, and tracking every object in a given video. Previous approaches utilize only single-frame features for the detection, segmentation, and tracking of objects, and they suffer in the video setting due to several distinct challenges such as motion blur and drastic appearance change. To eliminate the ambiguities introduced by using only single-frame features, we propose a novel comprehensive feature aggregation approach (CompFeat) that refines features at both the frame level and the object level with temporal and spatial context information. The aggregation process is carefully designed with a new attention mechanism, which significantly increases the discriminative power of the learned features. We further improve the tracking capability of our model through a siamese design that incorporates both feature similarities and spatial similarities. Experiments conducted on the YouTube-VIS dataset validate the effectiveness of the proposed CompFeat. Our code will be available at https://github.com/SHI-Labs/CompFeat-for-Video-Instance-Segmentation.
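
The paper details the full architecture; the sketch below only illustrates the general pattern the abstract describes, refining current-frame features by attending over features from neighboring frames. The module and parameter names are illustrative assumptions, not CompFeat's actual components.

```python
import torch
import torch.nn as nn

class TemporalFeatureAggregation(nn.Module):
    """Illustrative sketch: refine current-frame features by attending over
    features from neighboring frames (not the paper's exact module)."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, 1)
        self.key = nn.Conv2d(channels, channels, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, current, neighbors):
        # current: (B, C, H, W); neighbors: list of (B, C, H, W) support frames.
        b, c, h, w = current.shape
        q = self.query(current).flatten(2)                            # (B, C, HW)
        refined = current
        for frame in neighbors:
            k = self.key(frame).flatten(2)                            # (B, C, HW)
            v = self.value(frame).flatten(2)                          # (B, C, HW)
            attn = torch.softmax(q.transpose(1, 2) @ k / c**0.5, -1)  # (B, HW, HW)
            agg = (attn @ v.transpose(1, 2)).transpose(1, 2)          # (B, C, HW)
            refined = refined + agg.view(b, c, h, w)                  # residual fusion
        return refined
```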

* Accepted to AAAI 2021 

Motion Pyramid Networks for Accurate and Efficient Cardiac Motion Estimation

Jun 28, 2020
Hanchao Yu, Xiao Chen, Humphrey Shi, Terrence Chen, Thomas S. Huang, Shanhui Sun

Cardiac motion estimation plays a key role in MRI cardiac feature tracking and function assessment such as myocardial strain. In this paper, we propose Motion Pyramid Networks, a novel deep learning-based approach for accurate and efficient cardiac motion estimation. We predict and fuse a pyramid of motion fields from multiple scales of feature representations to generate a more refined motion field. We then use a novel cyclic teacher-student training strategy to make the inference end-to-end and further improve tracking performance. The teacher model provides more accurate motion estimation as supervision through progressive motion compensations, and the student model learns from the teacher to estimate motion in a single step while maintaining accuracy. The teacher-student knowledge distillation is performed cyclically for a further performance boost. Our method significantly outperforms a strong baseline model on two publicly available clinical datasets, as evaluated by a variety of metrics and by inference time. We also propose new evaluation metrics that represent errors in a clinically meaningful manner.
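
As a rough illustration of the "predict and fuse a pyramid of motion fields" idea (the paper's actual fusion scheme may differ), coarser fields can be upsampled to the finest resolution, rescaled to pixel units, and combined:

```python
import torch
import torch.nn.functional as F

def fuse_motion_pyramid(flows):
    """Illustrative sketch: fuse a coarse-to-fine pyramid of motion fields
    into one refined field (not the paper's exact fusion scheme).

    flows: list of (B, 2, H_i, W_i) motion fields, ordered coarse to fine,
    with isotropic scaling between levels. Each coarser flow is upsampled to
    the finest resolution, and its values are multiplied by the upsampling
    factor so displacements stay in pixel units.
    """
    target = flows[-1].shape[-2:]
    fused = torch.zeros_like(flows[-1])
    for flow in flows:
        scale = target[0] / flow.shape[-2]
        up = F.interpolate(flow, size=target, mode='bilinear', align_corners=False)
        fused = fused + up * scale
    return fused / len(flows)  # simple averaging as the fusion rule
```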

* Accepted by MICCAI 2020 

Neural Sparse Representation for Image Restoration

Jun 08, 2020
Yuchen Fan, Jiahui Yu, Yiqun Mei, Yulun Zhang, Yun Fu, Ding Liu, Thomas S. Huang

Inspired by the robustness and efficiency of sparse representation in sparse-coding-based image restoration models, we investigate the sparsity of neurons in deep networks. Our method structurally enforces sparsity constraints on hidden neurons. The sparsity constraints are favorable for gradient-based learning algorithms and can be attached to convolution layers in various networks. Sparsity in neurons enables computational savings by operating only on non-zero components, without hurting accuracy. Meanwhile, our method can magnify representation dimensionality and model capacity with negligible additional computational cost. Experiments show that sparse representation is crucial in deep neural networks for multiple image restoration tasks, including image super-resolution, image denoising, and image compression artifact removal. Code is available at https://github.com/ychfan/nsr
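
The paper defines its own structural sparsity constraint (see the repository above); the sketch below only shows one simple way such a constraint could be attached to a convolutional feature map, by keeping the top-k channels per sample and zeroing the rest. The gating scheme and class name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SparseChannelGate(nn.Module):
    """Illustrative sketch of structurally sparse hidden neurons: keep only
    the top-k channels per sample and zero the rest (not the paper's exact scheme)."""
    def __init__(self, k):
        super().__init__()
        self.k = k

    def forward(self, x):
        # x: (B, C, H, W). Score channels by mean activation magnitude.
        scores = x.abs().mean(dim=(2, 3))                  # (B, C)
        topk = scores.topk(self.k, dim=1).indices          # (B, k)
        mask = torch.zeros_like(scores).scatter_(1, topk, 1.0)
        # Zeroed channels can be skipped at inference, saving computation.
        return x * mask[:, :, None, None]
```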

Pyramid Attention Networks for Image Restoration

Jun 03, 2020
Yiqun Mei, Yuchen Fan, Yulun Zhang, Jiahui Yu, Yuqian Zhou, Ding Liu, Yun Fu, Thomas S. Huang, Humphrey Shi

Self-similarity is an image prior widely used in image restoration algorithms: small but similar patterns tend to recur at different locations and scales. However, recent deep convolutional neural network based methods for image restoration do not take full advantage of self-similarity, because they rely on self-attention modules that only process information at the same scale. To solve this problem, we present a novel Pyramid Attention module for image restoration, which captures long-range feature correspondences from a multi-scale feature pyramid. Inspired by the observation that corruptions such as noise or compression artifacts drop drastically at coarser image scales, our attention module is designed to borrow clean signals from "clean" correspondences at the coarser levels. The proposed pyramid attention module is a generic building block that can be flexibly integrated into various neural architectures. Its effectiveness is validated through extensive experiments on multiple image restoration tasks: image denoising, demosaicing, compression artifact reduction, and super-resolution. Without any bells and whistles, our PANet (pyramid attention module with simple network backbones) produces state-of-the-art results with superior accuracy and visual quality. Our code will be available at https://github.com/SHI-Labs/Pyramid-Attention-Networks
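
A heavily simplified sketch of the pyramid attention idea (not the paper's exact block, which operates on patches): each position attends over keys and values gathered from progressively downscaled copies of the feature map, so responses can draw on the cleaner coarse scales. The scale set and fusion are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def pyramid_attention(feat, scales=(1.0, 0.75, 0.5)):
    """Illustrative sketch of attention over a multi-scale feature pyramid."""
    b, c, h, w = feat.shape
    q = feat.flatten(2).transpose(1, 2)                    # (B, HW, C) queries
    keys, values = [], []
    for s in scales:
        scaled = feat if s == 1.0 else F.interpolate(
            feat, scale_factor=s, mode='bilinear', align_corners=False)
        keys.append(scaled.flatten(2))                     # (B, C, H_s*W_s)
        values.append(scaled.flatten(2))
    k = torch.cat(keys, dim=2)                             # (B, C, N) over all scales
    v = torch.cat(values, dim=2)
    attn = torch.softmax(q @ k / c**0.5, dim=-1)           # (B, HW, N)
    out = (attn @ v.transpose(1, 2)).transpose(1, 2)       # (B, C, HW)
    return feat + out.view(b, c, h, w)                     # residual connection
```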

Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining

Jun 02, 2020
Yiqun Mei, Yuchen Fan, Yuqian Zhou, Lichao Huang, Thomas S. Huang, Humphrey Shi

Deep convolution-based single image super-resolution (SISR) networks embrace the benefits of learning from large-scale external image resources for local recovery, yet most existing works have ignored the long-range feature-wise similarities in natural images. Some recent works have successfully leveraged this intrinsic feature correlation by exploring non-local attention modules. However, none of the current deep models have studied another inherent property of images: cross-scale feature correlation. In this paper, we propose the first Cross-Scale Non-Local (CS-NL) attention module and integrate it into a recurrent neural network. By combining the new CS-NL prior with local and in-scale non-local priors in a powerful recurrent fusion cell, we can find more cross-scale feature correlations within a single low-resolution (LR) image. The performance of SISR is significantly improved by exhaustively integrating all possible priors. Extensive experiments demonstrate the effectiveness of the proposed CS-NL module, setting new state-of-the-art results on multiple SISR benchmarks.
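
A simplified sketch of the cross-scale matching idea (not the paper's exact module, which also fuses local and in-scale non-local priors in a recurrent cell): LR positions are matched against a downscaled copy of the features, and the retrieved values are the corresponding larger patches from the original map, so the output is an upscaled map built from the image's own cross-scale recurrences. Assumes H and W are divisible by the scale factor.

```python
import torch
import torch.nn.functional as F

def cross_scale_attention(feat, scale=2):
    """Illustrative sketch of cross-scale non-local attention."""
    b, c, h, w = feat.shape
    q = feat.flatten(2).transpose(1, 2)                        # (B, HW, C)
    down = F.interpolate(feat, scale_factor=1 / scale,
                         mode='bilinear', align_corners=False) # (B, C, H/s, W/s)
    k = down.flatten(2)                                        # (B, C, HW/s^2)
    # Values: the s x s patch in `feat` corresponding to each downscaled key.
    v = F.unfold(feat, kernel_size=scale, stride=scale)        # (B, C*s^2, HW/s^2)
    attn = torch.softmax(q @ k / c**0.5, dim=-1)               # (B, HW, HW/s^2)
    out = attn @ v.transpose(1, 2)                             # (B, HW, C*s^2)
    # Each LR position receives an s x s HR patch; tile them into the upscaled map.
    return F.fold(out.transpose(1, 2), output_size=(h * scale, w * scale),
                  kernel_size=scale, stride=scale)             # (B, C, sH, sW)
```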

* CVPR 2020 

NTIRE 2020 Challenge on Perceptual Extreme Super-Resolution: Methods and Results

May 03, 2020
Kai Zhang, Shuhang Gu, Radu Timofte, Taizhang Shang, Qiuju Dai, Shengchen Zhu, Tong Yang, Yandong Guo, Younghyun Jo, Sejong Yang, Seon Joo Kim, Lin Zha, Jiande Jiang, Xinbo Gao, Wen Lu, Jing Liu, Kwangjin Yoon, Taegyun Jeon, Kazutoshi Akita, Takeru Ooba, Norimichi Ukita, Zhipeng Luo, Yuehan Yao, Zhenyu Xu, Dongliang He, Wenhao Wu, Yukang Ding, Chao Li, Fu Li, Shilei Wen, Jianwei Li, Fuzhi Yang, Huan Yang, Jianlong Fu, Byung-Hoon Kim, JaeHyun Baek, Jong Chul Ye, Yuchen Fan, Thomas S. Huang, Junyeop Lee, Bokyeung Lee, Jungki Min, Gwantae Kim, Kanghyu Lee, Jaihyun Park, Mykola Mykhailych, Haoyu Zhong, Yukai Shi, Xiaojun Yang, Zhijing Yang, Liang Lin, Tongtong Zhao, Jinjia Peng, Huibing Wang, Zhi Jin, Jiahao Wu, Yifu Chen, Chenming Shang, Huanrong Zhang, Jeongki Min, Hrishikesh P S, Densen Puthussery, Jiji C V

This paper reviews the NTIRE 2020 challenge on perceptual extreme super-resolution with a focus on the proposed solutions and results. The challenge task was to super-resolve an input image by a magnification factor of 16 based on a set of prior examples of low-resolution and corresponding high-resolution images. The goal is to obtain a network design capable of producing high-resolution results with the best perceptual quality while remaining similar to the ground truth. The track had 280 registered participants, and 19 teams submitted final results, which gauge the state of the art in single image super-resolution.

* CVPRW 2020 

The 1st Agriculture-Vision Challenge: Methods and Results

Apr 23, 2020
Mang Tik Chiu, Xingqian Xu, Kai Wang, Jennifer Hobbs, Naira Hovakimyan, Thomas S. Huang, Honghui Shi, Yunchao Wei, Zilong Huang, Alexander Schwing, Robert Brunner, Ivan Dozier, Wyatt Dozier, Karen Ghandilyan, David Wilson, Hyunseong Park, Junhee Kim, Sungho Kim, Qinghui Liu, Michael C. Kampffmeyer, Robert Jenssen, Arnt B. Salberg, Alexandre Barbosa, Rodrigo Trevisan, Bingchen Zhao, Shaozuo Yu, Siwei Yang, Yin Wang, Hao Sheng, Xiao Chen, Jingyi Su, Ram Rajagopal, Andrew Ng, Van Thong Huynh, Soo-Hyung Kim, In-Seop Na, Ujjwal Baid, Shubham Innani, Prasad Dutande, Bhakti Baheti, Sanjay Talbar, Jianyu Tang

The first Agriculture-Vision Challenge aims to encourage research into novel and effective algorithms for agricultural pattern recognition from aerial images, especially for the semantic segmentation task associated with our challenge dataset. Around 57 participating teams from various countries competed to achieve state-of-the-art performance in aerial agricultural semantic segmentation. The challenge employed the Agriculture-Vision Challenge Dataset, which comprises 21,061 aerial and multi-spectral farmland images. This paper provides a summary of notable methods and results in the challenge. Our submission server and leaderboard will remain open to researchers interested in this challenge dataset and task; the link can be found here.

* CVPR 2020 Workshop 