Haoran Tang

HashEncoding: Autoencoding with Multiscale Coordinate Hashing

Nov 29, 2022
Lukas Zhornyak, Zhengjie Xu, Haoran Tang, Jianbo Shi

We present HashEncoding, a novel autoencoding architecture that leverages a non-parametric multiscale coordinate hash function to facilitate a per-pixel decoder without convolutions. By leveraging the space-folding behaviour of hashing functions, HashEncoding allows for an inherently multiscale embedding space that remains much smaller than the original image. As a result, the decoder requires very few parameters compared with decoders in traditional autoencoders, approaching a non-parametric reconstruction of the original image and allowing for greater generalizability. Finally, by allowing backpropagation directly to the coordinate space, we show that HashEncoding can be exploited for geometric tasks such as optical flow.
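
A minimal sketch of what a multiscale coordinate hash might look like, in the spirit of hash-grid encodings: each 2D coordinate is binned at several grid resolutions, each bin is hashed into a small feature table, and the per-level features are concatenated into a compact per-pixel embedding. The table sizes, primes, and level schedule below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

# Illustrative spatial-hash multipliers; any odd constants work for this sketch.
PRIMES = np.array([1, 2654435761], dtype=np.uint64)

def hash_encode(coords, tables, base_res=16, growth=2.0):
    """coords: (N, 2) in [0, 1); tables: one (T, F) feature array per level."""
    feats = []
    for level, table in enumerate(tables):
        res = int(base_res * growth ** level)
        cells = np.floor(coords * res).astype(np.uint64)      # integer grid bins
        idx = np.bitwise_xor.reduce(cells * PRIMES, axis=1)   # fold 2D bin into a scalar
        feats.append(table[idx % np.uint64(table.shape[0])])  # (N, F) table lookup
    return np.concatenate(feats, axis=1)                      # (N, levels * F)

rng = np.random.default_rng(0)
tables = [rng.normal(size=(2**14, 4)).astype(np.float32) for _ in range(4)]
pixels = rng.uniform(size=(8, 2))
print(hash_encode(pixels, tables).shape)  # (8, 16)
```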

Is Self-Supervised Learning More Robust Than Supervised Learning?

Jun 10, 2022
Yuanyi Zhong, Haoran Tang, Junkun Chen, Jian Peng, Yu-Xiong Wang

Self-supervised contrastive learning is a powerful tool for learning visual representations without labels. Prior work has primarily focused on evaluating the recognition accuracy of various pre-training algorithms, but has overlooked other behavioral aspects. In addition to accuracy, distributional robustness plays a critical role in the reliability of machine learning models. We design and conduct a series of robustness tests to quantify the behavioral differences between contrastive learning and supervised learning under downstream or pre-training data distribution changes. These tests leverage data corruptions at multiple levels, ranging from pixel-level gamma distortion to patch-level shuffling to dataset-level distribution shift. Our tests unveil intriguing robustness behaviors of contrastive and supervised learning. On the one hand, under downstream corruptions, we generally observe that contrastive learning is surprisingly more robust than supervised learning. On the other hand, under pre-training corruptions, we find contrastive learning vulnerable to patch shuffling and pixel intensity change, yet less sensitive to dataset-level distribution change. We attempt to explain these results through the role of data augmentation and feature space properties. Our insight has implications for improving the downstream robustness of supervised learning.
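
As a concrete illustration of two of the corruption levels named above, the sketch below implements pixel-level gamma distortion and patch-level shuffling for an image array; it is a hedged stand-in for the paper's test harness, whose exact parameters are not given here.

```python
import numpy as np

def gamma_distort(img, gamma=2.0):
    """Pixel-level corruption: img is a float array in [0, 1]; gamma > 1 darkens mid-tones."""
    return np.clip(img, 0.0, 1.0) ** gamma

def patch_shuffle(img, grid=4, rng=None):
    """Patch-level corruption: split an (H, W, C) image into grid x grid patches and permute them."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[0] // grid, img.shape[1] // grid
    patches = [img[i*h:(i+1)*h, j*w:(j+1)*w] for i in range(grid) for j in range(grid)]
    order = rng.permutation(len(patches))
    rows = [np.concatenate([patches[order[i*grid + j]] for j in range(grid)], axis=1)
            for i in range(grid)]
    return np.concatenate(rows, axis=0)

img = np.random.default_rng(0).uniform(size=(32, 32, 3))
print(gamma_distort(img).shape, patch_shuffle(img).shape)  # (32, 32, 3) (32, 32, 3)
```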

Shuffle Augmentation of Features from Unlabeled Data for Unsupervised Domain Adaptation

Jan 28, 2022
Changwei Xu, Jianfei Yang, Haoran Tang, Han Zou, Cheng Lu, Tianshuo Zhang

Unsupervised Domain Adaptation (UDA), a branch of transfer learning where labels for target samples are unavailable, has been widely researched and developed in recent years with the help of adversarially trained models. Although existing UDA algorithms are able to guide neural networks to extract transferable and discriminative features, classifiers are merely trained under the supervision of labeled source data. Given the inevitable discrepancy between the source and target domains, the classifier has little awareness of the target classification boundaries. In this paper, Shuffle Augmentation of Features (SAF), a novel UDA framework, is proposed to address this problem by providing the classifier with supervisory signals from target feature representations. SAF learns from the target samples, adaptively distills class-aware target features, and implicitly guides the classifier to find comprehensive class borders. As demonstrated by extensive experiments, the SAF module can be integrated into existing adversarial UDA models to achieve performance improvements.
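
The abstract does not spell out SAF's exact formulation, so the sketch below is only one plausible reading of "shuffle augmentation" in feature space: target features are permuted within a batch and convexly mixed, and the mixed features are paired with correspondingly mixed pseudo-label targets as extra supervision for the classifier. The mixing scheme and pseudo-label source are assumptions made for illustration.

```python
import numpy as np

def shuffle_augment(target_feats, pseudo_probs, alpha=0.2, rng=None):
    """target_feats: (B, D) target features; pseudo_probs: (B, K) soft pseudo-labels."""
    rng = rng or np.random.default_rng()
    perm = rng.permutation(len(target_feats))          # shuffle within the batch
    lam = rng.beta(alpha, alpha)                       # mixup-style mixing coefficient
    mixed_feats = lam * target_feats + (1 - lam) * target_feats[perm]
    mixed_probs = lam * pseudo_probs + (1 - lam) * pseudo_probs[perm]
    return mixed_feats, mixed_probs                    # extra (feature, target) pairs

feats = np.random.default_rng(1).normal(size=(8, 256))
probs = np.full((8, 10), 0.1)                          # placeholder pseudo-label distribution
f_aug, p_aug = shuffle_augment(feats, probs)
print(f_aug.shape, p_aug.shape)                        # (8, 256) (8, 10)
```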

* 17 pages, 5 figures 

Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning?

Sep 23, 2019
Ofir Nachum, Haoran Tang, Xingyu Lu, Shixiang Gu, Honglak Lee, Sergey Levine

Hierarchical reinforcement learning has demonstrated significant success at solving difficult reinforcement learning (RL) tasks. Previous works have motivated the use of hierarchy by appealing to a number of intuitive benefits, including learning over temporally extended transitions, exploring over temporally extended periods, and training and exploring in a more semantically meaningful action space, among others. However, in fully observed, Markovian settings, it is not immediately clear why hierarchical RL should provide benefits over standard "shallow" RL architectures. In this work, we isolate and evaluate the claimed benefits of hierarchical RL on a suite of tasks encompassing locomotion, navigation, and manipulation. Surprisingly, we find that most of the observed benefits of hierarchy can be attributed to improved exploration, as opposed to easier policy learning or imposed hierarchical structures. Given this insight, we present exploration techniques inspired by hierarchy that achieve performance competitive with hierarchical RL while at the same time being much simpler to use and implement.

Modular Architecture for StarCraft II with Deep Reinforcement Learning

Nov 08, 2018
Dennis Lee, Haoran Tang, Jeffrey O Zhang, Huazhe Xu, Trevor Darrell, Pieter Abbeel

We present a novel modular architecture for StarCraft II AI. The architecture splits responsibilities between multiple modules that each control one aspect of the game, such as build-order selection or tactics. A centralized scheduler reviews macros suggested by all modules and decides their order of execution. An updater keeps track of environment changes and instantiates macros into series of executable actions. Modules in this framework can be optimized independently or jointly via human design, planning, or reinforcement learning. We apply deep reinforcement learning techniques to train two of the agent's six modules with self-play, achieving 94% or 87% win rates against the "Harder" (level 5) built-in Blizzard bot in Zerg vs. Zerg matches, with or without fog-of-war.
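
The control flow described above (modules propose macros, a scheduler orders them, an updater expands them into primitive actions) can be sketched as follows; the class, macro, and action names are hypothetical and do not correspond to the PySC2 or Blizzard APIs.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Macro:
    name: str
    priority: float
    expand: Callable[[], List[str]]  # macro -> sequence of primitive action names

def scheduler_step(modules, game_state):
    """Collect macro proposals from every module and order them by priority."""
    proposals = [m.propose(game_state) for m in modules]
    proposals = [p for p in proposals if p is not None]
    return sorted(proposals, key=lambda mac: mac.priority, reverse=True)

class BuildOrderModule:
    """Toy module: propose an economy macro whenever resources allow it."""
    def propose(self, game_state):
        if game_state.get("minerals", 0) >= 50:
            return Macro("train_drone", priority=1.0,
                         expand=lambda: ["select_hatchery", "train_drone"])
        return None

modules = [BuildOrderModule()]
for macro in scheduler_step(modules, {"minerals": 100}):
    actions = macro.expand()          # the updater's role: macro -> executable actions
    print(macro.name, "->", actions)
```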

* Accepted to The 14th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE'18) 

#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning

Dec 05, 2017
Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel

Count-based exploration algorithms are known to perform near-optimally when used in conjunction with tabular reinforcement learning (RL) methods for solving small discrete Markov decision processes (MDPs). It is generally thought that count-based methods cannot be applied in high-dimensional state spaces, since most states will only occur once. Recent deep RL exploration strategies are able to deal with high-dimensional continuous state spaces through complex heuristics, often relying on optimism in the face of uncertainty or intrinsic motivation. In this work, we describe a surprising finding: a simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks. States are mapped to hash codes, which allows their occurrences to be counted with a hash table. These counts are then used to compute a reward bonus according to the classic count-based exploration theory. We find that simple hash functions can achieve surprisingly good results on many challenging tasks. Furthermore, we show that a domain-dependent learned hash code may further improve these results. Detailed analysis reveals important aspects of a good hash function: 1) having appropriate granularity and 2) encoding information relevant to solving the MDP. This exploration strategy achieves near state-of-the-art performance on both continuous control tasks and Atari 2600 games, hence providing a simple yet powerful baseline for solving MDPs that require considerable exploration.
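
A minimal sketch of the bonus described above, assuming a random-projection (SimHash-style) code as the "simple hash function" and a classic beta / sqrt(count) bonus; the hyperparameters are illustrative rather than the paper's settings.

```python
import numpy as np
from collections import defaultdict

class HashCounter:
    def __init__(self, state_dim, n_bits=32, beta=0.01, seed=0):
        self.proj = np.random.default_rng(seed).normal(size=(state_dim, n_bits))
        self.counts = defaultdict(int)   # hash code -> visit count
        self.beta = beta

    def bonus(self, state):
        """Hash the state, bump its count, and return the exploration bonus."""
        code = tuple((state @ self.proj > 0).astype(np.int8))  # sign bits of the projection
        self.counts[code] += 1
        return self.beta / np.sqrt(self.counts[code])

counter = HashCounter(state_dim=4)
for s in np.random.default_rng(1).normal(size=(5, 4)):
    print(round(counter.bonus(s), 4))   # bonus shrinks whenever a hash bucket recurs
```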

* 10 pages main text + 10 pages supplementary. Published at NIPS 2017 

Reinforcement Learning with Deep Energy-Based Policies

Jul 21, 2017
Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, Sergey Levine

We propose a method for learning expressive energy-based policies for continuous states and actions, which has previously been feasible only in tabular domains. We apply our method to learning maximum entropy policies, resulting in a new algorithm, called soft Q-learning, that expresses the optimal policy via a Boltzmann distribution. We use the recently proposed amortized Stein variational gradient descent to learn a stochastic sampling network that approximates samples from this distribution. The benefits of the proposed algorithm include improved exploration and compositionality that allows transferring skills between tasks, which we confirm in simulated experiments with swimming and walking robots. We also draw a connection to actor-critic methods, which can be viewed as performing approximate inference on the corresponding energy-based model.
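
The method itself handles continuous actions via amortized SVGD; purely as a toy illustration of the underlying maximum-entropy quantities, the sketch below evaluates the Boltzmann policy pi(a|s) proportional to exp(Q(s,a)/alpha) and the soft value V(s) = alpha * log sum_a exp(Q(s,a)/alpha) for a small discrete action set.

```python
import numpy as np

def boltzmann_policy(q_values, alpha=1.0):
    """q_values: (A,) array of Q(s, a); returns action probabilities."""
    z = np.exp((q_values - q_values.max()) / alpha)   # shift by max for numerical stability
    return z / z.sum()

def soft_value(q_values, alpha=1.0):
    """Soft state value: alpha * log sum_a exp(Q(s, a) / alpha)."""
    m = q_values.max()
    return m + alpha * np.log(np.exp((q_values - m) / alpha).sum())

q = np.array([1.0, 2.0, 0.5])
print(boltzmann_policy(q, alpha=0.5))  # lower alpha -> sharper, more greedy policy
print(soft_value(q, alpha=0.5))        # >= max(q); approaches max(q) as alpha -> 0
```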
