Zhiding Yu

CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs

Mar 30, 2022

Jiteng Mu, Shalini De Mello, Zhiding Yu, Nuno Vasconcelos, Xiaolong Wang, Jan Kautz, Sifei Liu

Abstract:Recent advances show that Generative Adversarial Networks (GANs) can synthesize images with smooth variations along semantically meaningful latent directions, such as pose, expression, layout, etc. While this indicates that GANs implicitly learn pixel-level correspondences across images, few studies explored how to extract them explicitly. In this work, we introduce Coordinate GAN (CoordGAN), a structure-texture disentangled GAN that learns a dense correspondence map for each generated image. We represent the correspondence maps of different images as warped coordinate frames transformed from a canonical coordinate frame, i.e., the correspondence map, which describes the structure (e.g., the shape of a face), is controlled via a transformation. Hence, finding correspondences boils down to locating the same coordinate in different correspondence maps. In CoordGAN, we sample a transformation to represent the structure of a synthesized instance, while an independent texture branch is responsible for rendering appearance details orthogonal to the structure. Our approach can also extract dense correspondence maps for real images by adding an encoder on top of the generator. We quantitatively demonstrate the quality of the learned dense correspondences through segmentation mask transfer on multiple datasets. We also show that the proposed generator achieves better structure and texture disentanglement compared to existing approaches. Project page: https://jitengmu.github.io/CoordGAN/

* Project page: https://jitengmu.github.io/CoordGAN/
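The abstract's statement that finding correspondences "boils down to locating the same coordinate in different correspondence maps" can be illustrated with a small NumPy sketch. This is not the authors' code: the function names, the (H, W, 2) layout of the correspondence maps, and the brute-force nearest-neighbor search are all assumptions made for illustration.

```python
import numpy as np

def find_correspondences(coords_a, coords_b):
    """For every pixel of image A, return the (row, col) of the pixel in
    image B whose warped coordinate is closest (hypothetical helper).

    coords_a, coords_b: arrays of shape (H, W, 2), one warped coordinate
    per pixel (assumed layout, not taken from the paper).
    """
    h, w, _ = coords_a.shape
    flat_a = coords_a.reshape(-1, 2)
    flat_b = coords_b.reshape(-1, 2)
    # Squared distance between every coordinate in A and every one in B.
    d2 = ((flat_a[:, None, :] - flat_b[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)
    rows, cols = np.unravel_index(nearest, coords_b.shape[:2])
    return np.stack([rows, cols], axis=-1).reshape(h, w, 2)

def transfer_mask(mask_b, match):
    """Pull labels from image B into image A's pixel grid."""
    return mask_b[match[..., 0], match[..., 1]]
```

Given a segmentation mask for image B, `transfer_mask(mask_b, find_correspondences(coords_a, coords_b))` would produce a mask aligned with image A, which is roughly the kind of mask transfer the abstract uses for evaluation. The all-pairs distance matrix is only practical for small maps.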

FreeSOLO: Learning to Segment Objects without Annotations

Feb 24, 2022

Xinlong Wang, Zhiding Yu, Shalini De Mello, Jan Kautz, Anima Anandkumar, Chunhua Shen, Jose M. Alvarez

Abstract:Instance segmentation is a fundamental vision task that aims to recognize and segment each object in an image. However, it requires costly annotations such as bounding boxes and segmentation masks for learning. In this work, we propose a fully unsupervised learning method that learns class-agnostic instance segmentation without any annotations. We present FreeSOLO, a self-supervised instance segmentation framework built on top of the simple instance segmentation method SOLO. Our method also presents a novel localization-aware pre-training framework, where objects can be discovered from complicated scenes in an unsupervised manner. FreeSOLO achieves 9.8% AP_{50} on the challenging COCO dataset, which even outperforms several segmentation proposal methods that use manual annotations. For the first time, we demonstrate unsupervised class-agnostic instance segmentation successfully. FreeSOLO's box localization significantly outperforms state-of-the-art unsupervised object detection/discovery methods, with about 100% relative improvements in COCO AP. FreeSOLO further demonstrates superiority as a strong pre-training method, outperforming state-of-the-art self-supervised pre-training methods by +9.8% AP when fine-tuning instance segmentation with only 5% COCO masks.

* 13 pages

Benchmarking Robustness of 3D Point Cloud Recognition Against Common Corruptions

Jan 28, 2022

Jiachen Sun, Qingzhao Zhang, Bhavya Kailkhura, Zhiding Yu, Chaowei Xiao, Z. Morley Mao

Abstract:Deep neural networks on 3D point cloud data have been widely used in the real world, especially in safety-critical applications. However, their robustness against corruptions is less studied. In this paper, we present ModelNet40-C, the first comprehensive benchmark on 3D point cloud corruption robustness, consisting of 15 common and realistic corruptions. Our evaluation shows a significant gap between the performances on ModelNet40 and ModelNet40-C for state-of-the-art (SOTA) models. To reduce the gap, we propose a simple but effective method by combining PointCutMix-R and TENT after evaluating a wide range of augmentation and test-time adaptation strategies. We identify a number of critical insights for future studies on corruption robustness in point cloud recognition. For instance, we unveil that Transformer-based architectures with proper training recipes achieve the strongest robustness. We hope our in-depth analysis will motivate the development of robust training strategies or architecture designs in the 3D point cloud domain. Our codebase and dataset are included in https://github.com/jiachens/ModelNet40-C

* Codebase and dataset are included in https://github.com/jiachens/ModelNet40-C

AugMax: Adversarial Composition of Random Augmentations for Robust Training

Oct 26, 2021

Haotao Wang, Chaowei Xiao, Jean Kossaifi, Zhiding Yu, Anima Anandkumar, Zhangyang Wang

Abstract:Data augmentation is a simple yet effective way to improve the robustness of deep neural networks (DNNs). Diversity and hardness are two complementary dimensions of data augmentation to achieve robustness. For example, AugMix explores random compositions of a diverse set of augmentations to enhance broader coverage, while adversarial training generates adversarially hard samples to spot the weakness. Motivated by this, we propose a data augmentation framework, termed AugMax, to unify the two aspects of diversity and hardness. AugMax first randomly samples multiple augmentation operators and then learns an adversarial mixture of the selected operators. Being a stronger form of data augmentation, AugMax leads to a significantly augmented input distribution which makes model training more challenging. To solve this problem, we further design a disentangled normalization module, termed DuBIN (Dual-Batch-and-Instance Normalization), that disentangles the instance-wise feature heterogeneity arising from AugMax. Experiments show that AugMax-DuBIN leads to significantly improved out-of-distribution robustness, outperforming prior arts by 3.03%, 3.49%, 1.82% and 0.71% on CIFAR10-C, CIFAR100-C, Tiny ImageNet-C and ImageNet-C. Codes and pretrained models are available: https://github.com/VITA-Group/AugMax.

* NeurIPS, 2021
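The two steps described in the AugMax abstract (sample several augmented views, then learn an adversarial mixture of them) can be sketched in NumPy as follows. This is a toy illustration under stated assumptions, not the released implementation: `loss_fn` stands in for the model's training loss, the blend strength `m` is held fixed, and the mixture weights are updated with finite-difference gradient ascent instead of backpropagation. All names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def mix(x, views, p, m):
    """Blend the clean input x with a softmax-weighted sum of augmented views."""
    w = softmax(p)
    return (1.0 - m) * x + m * np.tensordot(w, np.asarray(views), axes=1)

def adversarial_mixture(x, views, loss_fn, steps=10, lr=0.5, m=0.5, eps=1e-4):
    """Search for mixture weights that maximize loss_fn on the mixed input."""
    p = np.zeros(len(views))  # start from a uniform mixture
    for _ in range(steps):
        grad = np.zeros_like(p)
        for i in range(len(p)):
            d = np.zeros_like(p)
            d[i] = eps
            # Central finite difference of the loss w.r.t. p[i].
            grad[i] = (loss_fn(mix(x, views, p + d, m))
                       - loss_fn(mix(x, views, p - d, m))) / (2 * eps)
        p += lr * grad  # gradient *ascent*: make the sample harder
    return softmax(p), mix(x, views, p, m)
```

In this sketch the random sampling of augmentation operators happens outside the function (the caller passes in `views`); only the adversarial weighting is shown.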

Learning Contrastive Representation for Semantic Correspondence

Sep 22, 2021

Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz, Ming-Hsuan Yang

Abstract:Dense correspondence across semantically related images has been extensively studied, but still faces two challenges: 1) large variations in appearance, scale and pose exist even for objects from the same category, and 2) labeling pixel-level dense correspondences is labor intensive and infeasible to scale. Most existing approaches focus on designing various matching approaches with fully-supervised ImageNet pretrained networks. On the other hand, while a variety of self-supervised approaches are proposed to explicitly measure image-level similarities, correspondence matching at the pixel level remains under-explored. In this work, we propose a multi-level contrastive learning approach for semantic matching, which does not rely on any ImageNet pretrained model. We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects, while the performance can be further enhanced by regularizing cross-instance cycle-consistency at intermediate feature levels. Experimental results on the PF-PASCAL, PF-WILLOW, and SPair-71k benchmark datasets demonstrate that our method performs favorably against the state-of-the-art approaches. The source code and trained models will be made available to the public.
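As a rough illustration of the image-level contrastive learning the abstract refers to, the sketch below computes a generic InfoNCE-style loss in NumPy. It is an assumption that the paper's image-level objective resembles this standard form; the function name, temperature value, and one-directional formulation are illustrative choices, and the cross-instance cycle-consistency regularizer is not shown.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """Generic InfoNCE loss: row i of z1 should match row i of z2.

    z1, z2: (N, D) embeddings of two views of the same N images.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature            # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; all other entries act as negatives.
    return -np.mean(np.diag(log_prob))
```

The loss is low when each embedding is more similar to its own second view than to the views of other images, and it grows when the pairing is scrambled.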

Panoptic SegFormer

Sep 11, 2021

Zhiqi Li, Wenhai Wang, Enze Xie, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Tong Lu, Ping Luo

Abstract:We present Panoptic SegFormer, a general framework for end-to-end panoptic segmentation with Transformers. The proposed method extends Deformable DETR with a unified mask prediction workflow for both things and stuff, making the panoptic segmentation pipeline concise and effective. With a ResNet-50 backbone, our method achieves 50.0% PQ on the COCO test-dev split, surpassing previous state-of-the-art methods by significant margins without bells and whistles. Using a more powerful PVTv2-B5 backbone, Panoptic SegFormer achieves a new record of 54.1% PQ and 54.4% PQ on the COCO val and test-dev splits, respectively, with single-scale input.

* Technical Report

Towards Reducing Labeling Cost in Deep Object Detection

Jun 22, 2021

Ismail Elezi, Zhiding Yu, Anima Anandkumar, Laura Leal-Taixe, Jose M. Alvarez

Abstract:Deep neural networks have reached very high accuracy on object detection but their success hinges on large amounts of labeled data. To reduce the dependency on labels, various active-learning strategies have been proposed, typically based on the confidence of the detector. However, these methods are biased towards best-performing classes and can lead to acquired datasets that are not good representatives of the data in the testing set. In this work, we propose a unified framework for active learning, that considers both the uncertainty and the robustness of the detector, ensuring that the network performs accurately in all classes. Furthermore, our method is able to pseudo-label the very confident predictions, suppressing a potential distribution drift while further boosting the performance of the model. Experiments show that our method comprehensively outperforms a wide range of active-learning methods on PASCAL VOC07+12 and MS-COCO, having up to a 7.7% relative improvement, or up to 82% reduction in labeling cost.

* Includes supplementary material
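The combination of active learning and pseudo-labeling described in the abstract can be sketched, in heavily simplified form, as a split of the unlabeled pool. This is a classification-style toy under stated assumptions, not the paper's detector pipeline: it scores samples by predictive entropy only (the robustness term is omitted), and the function name, budget, and confidence threshold are hypothetical.

```python
import numpy as np

def split_pool(probs, budget, pseudo_threshold=0.99):
    """Choose samples to annotate and samples to pseudo-label.

    probs: (N, C) predicted class probabilities for the unlabeled pool.
    Returns (indices to send to annotators, indices to pseudo-label,
    pseudo-labels for the latter).
    """
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    to_label = np.argsort(-entropy)[:budget]      # most uncertain first
    confidence = probs.max(axis=1)
    confident = np.where(confidence >= pseudo_threshold)[0]
    pseudo_idx = np.setdiff1d(confident, to_label)
    pseudo_labels = probs[pseudo_idx].argmax(axis=1)
    return to_label, pseudo_idx, pseudo_labels
```

Spending the labeling budget on uncertain samples while keeping very confident predictions as free labels is the general idea; how uncertainty and robustness are actually combined is specified in the paper, not here.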

SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies

Jun 17, 2021

Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Anima Anandkumar

Abstract:Generalization has been a long-standing challenge for reinforcement learning (RL). Visual RL, in particular, can be easily distracted by irrelevant factors in high-dimensional observation space. In this work, we consider robust policy learning which targets zero-shot generalization to unseen visual environments with large distributional shift. We propose SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from policy optimization. Specifically, an expert policy is first trained by RL from scratch with weak augmentations. A student network then learns to mimic the expert policy by supervised learning with strong augmentations, making its representation more robust against visual variations compared to the expert. Extensive experiments demonstrate that SECANT significantly advances the state of the art in zero-shot generalization across 4 challenging domains. Our average reward improvements over prior SOTAs are: DeepMind Control (+26.5%), robotic manipulation (+337.8%), vision-based autonomous driving (+47.7%), and indoor object navigation (+15.8%). Code release and video are available at https://linxifan.github.io/secant-site/.

* ICML 2021. Website: https://linxifan.github.io/secant-site/

Practical Machine Learning Safety: A Survey and Primer

Jun 09, 2021

Sina Mohseni, Haotao Wang, Zhiding Yu, Chaowei Xiao, Zhangyang Wang, Jay Yadawa

Abstract:The open-world deployment of Machine Learning (ML) algorithms in safety-critical applications such as autonomous vehicles needs to address a variety of ML vulnerabilities such as interpretability, verifiability, and performance limitations. Research explores different approaches to improve ML dependability by proposing new models and training techniques to reduce generalization error, achieve domain adaptation, and detect outlier examples and adversarial attacks. In this paper, we review and organize practical ML techniques that can improve the safety and dependability of ML algorithms and therefore ML-based software. Our organization maps state-of-the-art ML techniques to safety strategies in order to enhance the dependability of the ML algorithm from different aspects, and we discuss research gaps as well as promising solutions.

DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision

Jun 05, 2021

Shiyi Lan, Zhiding Yu, Christopher Choy, Subhashree Radhakrishnan, Guilin Liu, Yuke Zhu, Larry S. Davis, Anima Anandkumar

Abstract:We introduce DiscoBox, a novel framework that jointly learns instance segmentation and semantic correspondence using bounding box supervision. Specifically, we propose a self-ensembling framework where instance segmentation and semantic correspondence are jointly guided by a structured teacher in addition to the bounding box supervision. The teacher is a structured energy model incorporating a pairwise potential and a cross-image potential to model the pairwise pixel relationships both within and across the boxes. Minimizing the teacher energy simultaneously yields refined object masks and dense correspondences between intra-class objects, which are taken as pseudo-labels to supervise the task network and provide positive/negative correspondence pairs for dense contrastive learning. We show a symbiotic relationship where the two tasks mutually benefit from each other. Our best model achieves 37.9% AP on COCO instance segmentation, surpassing prior weakly supervised methods, and is competitive with supervised methods. We also obtain state-of-the-art weakly supervised results on PASCAL VOC12 and PF-PASCAL with real-time inference.

* Tech Report
