Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Thomas S. Huang

Laplacian Denoising Autoencoder

Mar 30, 2020

Jianbo Jiao, Linchao Bao, Yunchao Wei, Shengfeng He, Honghui Shi, Rynson Lau, Thomas S. Huang

Figure 1 for Laplacian Denoising Autoencoder

Figure 2 for Laplacian Denoising Autoencoder

Figure 3 for Laplacian Denoising Autoencoder

Figure 4 for Laplacian Denoising Autoencoder

Abstract:While deep neural networks have been shown to perform remarkably well in many machine learning tasks, labeling a large amount of ground truth data for supervised training is usually very costly to scale. Therefore, learning robust representations with unlabeled data is critical in relieving human effort and vital for many downstream tasks. Recent advances in unsupervised and self-supervised learning approaches for visual data have benefited greatly from domain knowledge. Here we are interested in a more generic unsupervised learning framework that can be easily generalized to other domains. In this paper, we propose to learn data representations with a novel type of denoising autoencoder, where the noisy input data is generated by corrupting latent clean data in the gradient domain. This can be naturally generalized to span multiple scales with a Laplacian pyramid representation of the input data. In this way, the agent learns more robust representations that exploit the underlying data structures across multiple scales. Experiments on several visual benchmarks demonstrate that better representations can be learned with the proposed approach, compared to its counterpart with single-scale corruption and other approaches. Furthermore, we also demonstrate that the learned representations perform well when transferring to other downstream vision tasks.

Via

Access Paper or Ask Questions

Deep Affinity Net: Instance Segmentation via Affinity

Mar 15, 2020

Xingqian Xu, Mang Tik Chiu, Thomas S. Huang, Honghui Shi

Figure 1 for Deep Affinity Net: Instance Segmentation via Affinity

Figure 2 for Deep Affinity Net: Instance Segmentation via Affinity

Figure 3 for Deep Affinity Net: Instance Segmentation via Affinity

Figure 4 for Deep Affinity Net: Instance Segmentation via Affinity

Abstract:Most of the modern instance segmentation approaches fall into two categories: region-based approaches in which object bounding boxes are detected first and later used in cropping and segmenting instances; and keypoint-based approaches in which individual instances are represented by a set of keypoints followed by a dense pixel clustering around those keypoints. Despite the maturity of these two paradigms, we would like to report an alternative affinity-based paradigm where instances are segmented based on densely predicted affinities and graph partitioning algorithms. Such affinity-based approaches indicate that high-level graph features other than regions or keypoints can be directly applied in the instance segmentation task. In this work, we propose Deep Affinity Net, an effective affinity-based approach accompanied with a new graph partitioning algorithm Cascade-GAEC. Without bells and whistles, our end-to-end model results in 32.4% AP on Cityscapes val and 27.5% AP on test. It achieves the best single-shot result as well as the fastest running time among all affinity-based models. It also outperforms the region-based method Mask R-CNN.

Via

Access Paper or Ask Questions

AlignSeg: Feature-Aligned Segmentation Networks

Feb 24, 2020

Zilong Huang, Yunchao Wei, Xinggang Wang, Honghui Shi, Wenyu Liu, Thomas S. Huang

Figure 1 for AlignSeg: Feature-Aligned Segmentation Networks

Figure 2 for AlignSeg: Feature-Aligned Segmentation Networks

Figure 3 for AlignSeg: Feature-Aligned Segmentation Networks

Figure 4 for AlignSeg: Feature-Aligned Segmentation Networks

Abstract:Aggregating features in terms of different convolutional blocks or contextual embeddings has been proven to be an effective way to strengthen feature representations for semantic segmentation. However, most of the current popular network architectures tend to ignore the misalignment issues during the feature aggregation process caused by 1) step-by-step downsampling operations, and 2) indiscriminate contextual information fusion. In this paper, we explore the principles in addressing such feature misalignment issues and inventively propose Feature-Aligned Segmentation Networks (AlignSeg). AlignSeg consists of two primary modules, i.e., the Aligned Feature Aggregation (AlignFA) module and the Aligned Context Modeling (AlignCM) module. First, AlignFA adopts a simple learnable interpolation strategy to learn transformation offsets of pixels, which can effectively relieve the feature misalignment issue caused by multiresolution feature aggregation. Second, with the contextual embeddings in hand, AlignCM enables each pixel to choose private custom contextual information in an adaptive manner, making the contextual embeddings aligned better to provide appropriate guidance. We validate the effectiveness of our AlignSeg network with extensive experiments on Cityscapes and ADE20K, achieving new state-of-the-art mIoU scores of 82.6% and 45.95%, respectively. Our source code will be made available.

* In Submission

Via

Access Paper or Ask Questions

Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis

Jan 05, 2020

Mang Tik Chiu, Xingqian Xu, Yunchao Wei, Zilong Huang, Alexander Schwing, Robert Brunner, Hrant Khachatrian, Hovnatan Karapetyan, Ivan Dozier, Greg Rose(+5 more)

Figure 1 for Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis

Figure 2 for Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis

Figure 3 for Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis

Figure 4 for Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis

Abstract:The success of deep learning in visual recognition tasks has driven advancements in multiple fields of research. Particularly, increasing attention has been drawn towards its application in agriculture. Nevertheless, while visual pattern recognition on farmlands carries enormous economic values, little progress has been made to merge computer vision and crop sciences due to the lack of suitable agricultural image datasets. Meanwhile, problems in agriculture also pose new challenges in computer vision. For example, semantic segmentation of aerial farmland images requires inference over extremely large-size images with extreme annotation sparsity. These challenges are not present in most of the common object datasets, and we show that they are more challenging than many other aerial image datasets. To encourage research in computer vision for agriculture, we present Agriculture-Vision: a large-scale aerial farmland image dataset for semantic segmentation of agricultural patterns. We collected 94,986 high-quality aerial images from 3,432 farmlands across the US, where each image consists of RGB and Near-infrared (NIR) channels with resolution as high as 10 cm per pixel. We annotate nine types of field anomaly patterns that are most important to farmers. As a pilot study of aerial agricultural semantic segmentation, we perform comprehensive experiments using popular semantic segmentation models; we also propose an effective model designed for aerial agricultural pattern recognition. Our experiments demonstrate several challenges Agriculture-Vision poses to both the computer vision and agriculture communities. Future versions of this dataset will include even more aerial images, anomaly patterns and image channels. More information at https://www.agriculture-vision.com.

Via

Access Paper or Ask Questions

Scale-wise Convolution for Image Restoration

Dec 19, 2019

Yuchen Fan, Jiahui Yu, Ding Liu, Thomas S. Huang

Figure 1 for Scale-wise Convolution for Image Restoration

Figure 2 for Scale-wise Convolution for Image Restoration

Figure 3 for Scale-wise Convolution for Image Restoration

Figure 4 for Scale-wise Convolution for Image Restoration

Abstract:While scale-invariant modeling has substantially boosted the performance of visual recognition tasks, it remains largely under-explored in deep networks based image restoration. Naively applying those scale-invariant techniques (e.g. multi-scale testing, random-scale data augmentation) to image restoration tasks usually leads to inferior performance. In this paper, we show that properly modeling scale-invariance into neural networks can bring significant benefits to image restoration performance. Inspired from spatial-wise convolution for shift-invariance, "scale-wise convolution" is proposed to convolve across multiple scales for scale-invariance. In our scale-wise convolutional network (SCN), we first map the input image to the feature space and then build a feature pyramid representation via bi-linear down-scaling progressively. The feature pyramid is then passed to a residual network with scale-wise convolutions. The proposed scale-wise convolution learns to dynamically activate and aggregate features from different input scales in each residual building block, in order to exploit contextual information on multiple scales. In experiments, we compare the restoration accuracy and parameter efficiency among our model and many different variants of multi-scale neural networks. The proposed network with scale-wise convolution achieves superior performance in multiple image restoration tasks including image super-resolution, image denoising and image compression artifacts removal. Code and models are available at: https://github.com/ychfan/scn_sr

* AAAI 2020

Via

Access Paper or Ask Questions

Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation

Dec 06, 2019

Bowen Cheng, Maxwell D. Collins, Yukun Zhu, Ting Liu, Thomas S. Huang, Hartwig Adam, Liang-Chieh Chen

Figure 1 for Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation

Figure 2 for Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation

Figure 3 for Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation

Figure 4 for Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation

Abstract:In this work, we introduce Panoptic-DeepLab, a simple, strong, and fast system for panoptic segmentation, aiming to establish a solid baseline for bottom-up methods that can achieve comparable performance of two-stage methods while yielding fast inference speed. In particular, PanopticDeepLab adopts the dual-ASPP and dual-decoder structures specific to semantic, and instance segmentation, respectively. The semantic segmentation branch is the same as the typical design of any semantic segmentation model (e.g., DeepLab), while the instance segmentation branch is class-agnostic, involving a simple instance center regression. As a result, our single Panoptic-DeepLab simultaneously ranks first at all three Cityscapes benchmarks, setting the new state-of-art of 84.2% mIoU, 39.0% AP, and 65.5% PQ on test set. Additionally, equipped with MobileNetV3, Panoptic-DeepLab runs nearly in real-time with a single 1025 x 2049 image (15.8 frames per second), while achieving a competitive performance on Cityscapes (54.1 PQ% on test set). On Mapillary Vistas test set, our ensemble of six models attains 42.7% PQ, outperforming the challenge winner in 2018 by a healthy margin of 1.5%. Finally, our Panoptic-DeepLab also performs on par with several topdown approaches on the challenging COCO dataset. For the first time, we demonstrate a bottom-up approach could deliver state-of-the-art results on panoptic segmentation.

* 16-page detailed technical report of arXiv:1910.04751

Via

Access Paper or Ask Questions

Any-Precision Deep Neural Networks

Nov 17, 2019

Haichao Yu, Haoxiang Li, Honghui Shi, Thomas S. Huang, Gang Hua

Figure 1 for Any-Precision Deep Neural Networks

Figure 2 for Any-Precision Deep Neural Networks

Figure 3 for Any-Precision Deep Neural Networks

Figure 4 for Any-Precision Deep Neural Networks

Abstract:We present Any-Precision Deep Neural Networks (Any-Precision DNNs), which are trained with a new method that empowers learned DNNs to be flexible in any numerical precision during inference. The same model in runtime can be flexibly and directly set to different bit-width, by truncating the least significant bits, to support dynamic speed and accuracy trade-off. When all layers are set to low-bits, we show that the model achieved accuracy comparable to dedicated models trained at the same precision. This nice property facilitates flexible deployment of deep learning models in real-world applications, where in practice trade-offs between model accuracy and runtime efficiency are often sought. Previous literature presents solutions to train models at each individual fixed efficiency/accuracy trade-off point. But how to produce a model flexible in runtime precision is largely unexplored. When the demand of efficiency/accuracy trade-off varies from time to time or even dynamically changes in runtime, it is infeasible to re-train models accordingly, and the storage budget may forbid keeping multiple models. Our proposed framework achieves this flexibility without performance degradation. More importantly, we demonstrate that this achievement is agnostic to model architectures. We experimentally validated our method with different deep network backbones (AlexNet-small, Resnet-20, Resnet-50) on different datasets (SVHN, Cifar-10, ImageNet) and observed consistent results. Code and models will be available at https://github.com/haichaoyu.

Via

Access Paper or Ask Questions

Panoptic-DeepLab

Oct 24, 2019

Bowen Cheng, Maxwell D. Collins, Yukun Zhu, Ting Liu, Thomas S. Huang, Hartwig Adam, Liang-Chieh Chen

Abstract:We present Panoptic-DeepLab, a bottom-up and single-shot approach for panoptic segmentation. Our Panoptic-DeepLab is conceptually simple and delivers state-of-the-art results. In particular, we adopt the dual-ASPP and dual-decoder structures specific to semantic, and instance segmentation, respectively. The semantic segmentation branch is the same as the typical design of any semantic segmentation model (e.g., DeepLab), while the instance segmentation branch is class-agnostic, involving a simple instance center regression. Our single Panoptic-DeepLab sets the new state-of-art at all three Cityscapes benchmarks, reaching 84.2% mIoU, 39.0% AP, and 65.5% PQ on test set, and advances results on the other challenging Mapillary Vistas.

* This work is presented at ICCV 2019 Joint COCO and Mapillary Recognition Challenge Workshop

Via

Access Paper or Ask Questions

One-to-one Mapping for Unpaired Image-to-image Translation

Oct 12, 2019

Zengming Shen, S. Kevin Zhou, Yifan Chen, Bogdan Georgescu, Xuqi Liu, Thomas S. Huang

Figure 1 for One-to-one Mapping for Unpaired Image-to-image Translation

Figure 2 for One-to-one Mapping for Unpaired Image-to-image Translation

Figure 3 for One-to-one Mapping for Unpaired Image-to-image Translation

Figure 4 for One-to-one Mapping for Unpaired Image-to-image Translation

Abstract:Recently image-to-image translation has attracted significant interests in the literature, starting from the successful use of the generative adversarial network (GAN), to the introduction of cyclic constraint, to extensions to multiple domains. However, in existing approaches, there is no guarantee that the mapping between two image domains is unique or one-to-one. Here we propose a self-inverse network learning approach for unpaired image-to-image translation. Building on top of CycleGAN, we learn a self-inverse function by simply augmenting the training samples by swapping inputs and outputs during training and with separated cycle consistency loss for each mapping direction. The outcome of such learning is a proven one-to-one mapping function. Our extensive experiments on a variety of datasets, including cross-modal medical image synthesis, object transfiguration, and semantic labeling, consistently demonstrate clear improvement over the CycleGAN method both qualitatively and quantitatively. Especially our proposed method reaches the state-of-the-art result on the cityscapes benchmark dataset for the label to photo unpaired directional image translation.

* 10 pages, 7 figures

Via

Access Paper or Ask Questions

Towards Learning a Self-inverse Network for Bidirectional Image-to-image Translation

Sep 16, 2019

Zengming Shen, Yifan Chen, S. Kevin Zhou, Bogdan Georgescu, Xuqi Liu, Thomas S. Huang

Figure 1 for Towards Learning a Self-inverse Network for Bidirectional Image-to-image Translation

Figure 2 for Towards Learning a Self-inverse Network for Bidirectional Image-to-image Translation

Figure 3 for Towards Learning a Self-inverse Network for Bidirectional Image-to-image Translation

Figure 4 for Towards Learning a Self-inverse Network for Bidirectional Image-to-image Translation

Abstract:The one-to-one mapping is necessary for many bidirectional image-to-image translation applications, such as MRI image synthesis as MRI images are unique to the patient. State-of-the-art approaches for image synthesis from domain X to domain Y learn a convolutional neural network that meticulously maps between the domains. A different network is typically implemented to map along the opposite direction, from Y to X. In this paper, we explore the possibility of only wielding one network for bi-directional image synthesis. In other words, such an autonomous learning network implements a self-inverse function. A self-inverse network shares several distinct advantages: only one network instead of two, better generalization and more restricted parameter space. Most importantly, a self-inverse function guarantees a one-to-one mapping, a property that cannot be guaranteed by earlier approaches that are not self-inverse. The experiments on three datasets show that, compared with the baseline approaches that use two separate models for the image synthesis along two directions, our self-inverse network achieves better synthesis results in terms of standard metrics. Finally, our sensitivity analysis confirms the feasibility of learning a self-inverse function for the bidirectional image translation.

* 10 pages, 9 figures

Via

Access Paper or Ask Questions