Hanjun Dai

Energy-based View of Retrosynthesis

Jul 14, 2020
Ruoxi Sun, Hanjun Dai, Li Li, Steven Kearnes, Bo Dai

Retrosynthesis -- the process of identifying a set of reactants to synthesize a target molecule -- is of vital importance to material design and drug discovery. Existing machine learning approaches based on language models and graph neural networks have achieved encouraging results. In this paper, we propose a framework that unifies sequence- and graph-based methods as energy-based models (EBMs) with different energy functions. This unified perspective provides critical insights about EBM variants through a comprehensive assessment of performance. Additionally, we present a novel dual variant within the framework that performs consistent training over Bayesian forward- and backward-prediction by constraining the agreement between the two directions. This model improves state-of-the-art performance by 9.6% for template-free approaches where the reaction type is unknown.
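
Below is a minimal toy sketch of the unified EBM view described above: a single energy E(R, P) over reactant-set/product pairs induces both a backward distribution p(R | P) and a forward distribution p(P | R), and a consistency term penalizes disagreement between the two directions. The embeddings, the bilinear energy, and the symmetric-KL agreement penalty are illustrative assumptions, not the paper's actual parameterization.

```python
# Toy sketch (not the paper's implementation): one energy E(R, P) scores
# reactant-set / product pairs; the backward model p(R | P) and forward
# model p(P | R) are softmaxes over candidates, and an agreement term
# penalizes disagreement between the two directions.
import numpy as np

rng = np.random.default_rng(0)
n_products, n_reactant_sets, d = 3, 5, 8

# Hypothetical learned embeddings for products and candidate reactant sets.
prod_emb = rng.normal(size=(n_products, d))
reac_emb = rng.normal(size=(n_reactant_sets, d))

def energy(r, p):
    """Lower energy = more plausible reaction r -> p (toy bilinear form)."""
    return -float(reac_emb[r] @ prod_emb[p])

E = np.array([[energy(r, p) for p in range(n_products)]
              for r in range(n_reactant_sets)])          # shape (R, P)

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

p_backward = softmax(-E, axis=0)   # p(R | P): normalize over reactant sets
p_forward  = softmax(-E, axis=1)   # p(P | R): normalize over products

# Agreement penalty between the two directions (symmetric KL between the
# joints implied by each conditional and a uniform prior) -- a stand-in
# for the paper's consistency constraint.
joint_b = p_backward / n_products
joint_f = p_forward / n_reactant_sets
agreement_loss = 0.5 * np.sum(joint_b * np.log(joint_b / joint_f)) \
               + 0.5 * np.sum(joint_f * np.log(joint_f / joint_b))
print("duality / agreement loss:", agreement_loss)
```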


Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search

Jun 29, 2020
Binghong Chen, Chengtao Li, Hanjun Dai, Le Song

Retrosynthetic planning is a critical task in organic chemistry that identifies a series of reactions leading to the synthesis of a target product. The vast number of possible chemical transformations makes the search space enormous, and retrosynthetic planning is challenging even for experienced chemists. Existing methods either require expensive return estimation by rollouts with high variance, or optimize for search speed rather than solution quality. In this paper, we propose Retro*, a neural-based A*-like algorithm that finds high-quality synthetic routes efficiently. It maintains the search as an AND-OR tree and learns a neural search bias from off-policy data. Guided by this neural network, it then performs best-first search efficiently during new planning episodes. Experiments on benchmark USPTO datasets show that our proposed method outperforms the existing state of the art in both success rate and solution quality, while also being more efficient.
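
The following is a minimal best-first planner in the spirit of Retro*, assuming a frontier of states (sets of molecules still to be synthesized) ranked by cost paid so far plus a value estimate of the remaining molecules. The value function is a hand-written stub standing in for the learned neural search bias, and the molecules and reactions are made up.

```python
# Simplified best-first retrosynthetic search (not the full AND-OR tree
# machinery of Retro*): priority = cost so far + estimated cost-to-go.
import heapq
import itertools

building_blocks = {"A", "B", "C"}
reactions = {                      # product -> list of (reaction_cost, reactants)
    "T": [(1.0, ["X", "C"]), (4.0, ["A", "B", "C"])],
    "X": [(1.0, ["A", "B"])],
}

def value_estimate(mol):
    """Stub for the learned cost-to-go of synthesizing `mol`."""
    return 0.0 if mol in building_blocks else 1.0

def plan(target, max_expansions=100):
    tie = itertools.count()        # tie-breaker so the heap never compares sets
    frontier = [(value_estimate(target), next(tie), 0.0, frozenset([target]), [])]
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, cost, open_mols, route = heapq.heappop(frontier)
        todo = [m for m in open_mols if m not in building_blocks]
        if not todo:
            return cost, route                        # every leaf is purchasable
        mol = todo[0]                                 # expand one open molecule
        for rxn_cost, reactants in reactions.get(mol, []):
            new_open = (set(open_mols) - {mol}) | set(reactants)
            new_cost = cost + rxn_cost
            priority = new_cost + sum(value_estimate(m) for m in new_open)
            heapq.heappush(frontier, (priority, next(tie), new_cost,
                                      frozenset(new_open), route + [(mol, reactants)]))
    return None

print(plan("T"))   # -> (2.0, [('T', ['X', 'C']), ('X', ['A', 'B'])])
```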

* Presented at ICML 2020 

Scalable Deep Generative Modeling for Sparse Graphs

Jun 28, 2020
Hanjun Dai, Azade Nazi, Yujia Li, Bo Dai, Dale Schuurmans

Learning graph generative models is a challenging task for deep learning and has wide applicability to a range of domains such as chemistry, biology, and social science. However, current deep neural methods suffer from limited scalability: for a graph with $n$ nodes and $m$ edges, existing deep neural methods require $\Omega(n^2)$ complexity because they build up the full adjacency matrix. On the other hand, many real-world graphs are sparse in the sense that $m\ll n^2$. Based on this observation, we develop a novel autoregressive model, named BiGG, that exploits this sparsity to avoid generating the full adjacency matrix and, importantly, reduces the graph generation time complexity to $O((n + m)\log n)$. Furthermore, during training this autoregressive model can be parallelized with $O(\log n)$ synchronization stages, making it much more efficient than other autoregressive models that require $\Omega(n)$. Experiments on several benchmarks show that the proposed approach not only scales to graphs orders of magnitude larger than previously possible with deep autoregressive graph generative models, but also yields better graph generation quality.
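
As a toy illustration of the binary decomposition behind this kind of sparse generation: instead of emitting an n-length adjacency row per node (Omega(n^2) work overall), each edge endpoint can be chosen with roughly log2(n) left/right decisions over a binary tree of node-index intervals. The split probabilities below are uniform placeholders; in BiGG they would come from the learned autoregressive model.

```python
# Illustrates the O(log n) decisions-per-edge idea, not BiGG's actual model.
import math
import numpy as np

rng = np.random.default_rng(0)

def sample_endpoint(n):
    """Pick a node index in [0, n) using ~log2(n) binary decisions."""
    lo, hi, decisions = 0, n, 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        go_left = rng.random() < (mid - lo) / (hi - lo)   # placeholder probability
        lo, hi = (lo, mid) if go_left else (mid, hi)
        decisions += 1
    return lo, decisions

n = 1024
idx, used = sample_endpoint(n)
print(f"sampled node {idx} with {used} decisions (log2(n) = {math.log2(n):.0f})")
```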

* ICML 2020 

Learning to Stop While Learning to Predict

Jun 09, 2020
Xinshi Chen, Hanjun Dai, Yu Li, Xin Gao, Le Song

There is a recent surge of interest in designing deep architectures based on the update steps of traditional algorithms, or in learning neural networks to improve and replace traditional algorithms. While traditional algorithms have stopping criteria for outputting results at different iterations, many algorithm-inspired deep models are restricted to a "fixed depth" for all inputs. As with algorithms, the optimal depth of a deep architecture may differ across input instances, either to avoid "over-thinking" or to save computation on operations that have already converged. In this paper, we tackle this varying-depth problem with a steerable architecture, where a feed-forward deep model and a variational stopping policy are learned together to sequentially determine the optimal number of layers for each input instance. Training such an architecture is challenging, so we provide a variational Bayes perspective and design a novel and effective training procedure that decomposes the task into an oracle model learning stage and an imitation stage. Experimentally, we show that the learned deep model, together with the stopping policy, improves performance on a diverse set of tasks, including sparse recovery, few-shot meta learning, and computer vision tasks.
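
A small sketch of the variational stopping idea, assuming each layer t emits an intermediate prediction and a halting probability pi_t: q(t) is the probability of stopping exactly at layer t, and training minimizes the expected loss under q. This is a simplified stand-in for the paper's oracle-learning plus imitation procedure, with made-up numbers.

```python
# Expected loss under a variational stopping distribution over depths.
import numpy as np

T = 4
pi = np.array([0.1, 0.3, 0.6, 1.0])             # halting prob. per layer (last = 1)
layer_losses = np.array([0.9, 0.5, 0.2, 0.25])  # toy per-layer prediction losses

# q(t) = pi_t * prod_{s < t} (1 - pi_s): stop exactly at layer t
survive = np.concatenate(([1.0], np.cumprod(1.0 - pi[:-1])))
q = pi * survive
assert np.isclose(q.sum(), 1.0)

expected_loss = float(q @ layer_losses)
expected_depth = float(q @ np.arange(1, T + 1))
print(f"q = {np.round(q, 3)}, E[loss] = {expected_loss:.3f}, E[depth] = {expected_depth:.2f}")
```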

* Proceedings of the 37th International Conference on Machine Learning 

Energy-Based Processes for Exchangeable Data

Mar 17, 2020
Mengjiao Yang, Bo Dai, Hanjun Dai, Dale Schuurmans

Recently there has been growing interest in modeling sets with exchangeability, such as point clouds. A shortcoming of current approaches is that they restrict the cardinality of the sets considered or can only express limited forms of distribution over unobserved data. To overcome these limitations, we introduce Energy-Based Processes (EBPs), which extend energy-based models to exchangeable data while allowing neural network parameterizations of the energy function. A key advantage of these models is the ability to express more flexible distributions over sets without restricting their cardinality. We develop an efficient training procedure for EBPs that achieves state-of-the-art performance on a variety of tasks such as point cloud generation, classification, denoising, and image completion.
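
A minimal sketch of the exchangeability ingredient: a permutation-invariant energy over a set, obtained by encoding each element, pooling with a symmetric operation, and mapping the pooled code to a scalar. The weights here are random placeholders; an EBP would additionally tie this energy to a latent process so that sets of any cardinality share one consistent distribution.

```python
# Permutation-invariant (exchangeable) energy over a set of points.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid = 3, 16
W1 = rng.normal(size=(d_in, d_hid))
w2 = rng.normal(size=d_hid)

def set_energy(points):
    """points: (m, d_in) array; output is invariant to row permutations."""
    h = np.tanh(points @ W1)        # per-element encoder
    pooled = h.mean(axis=0)         # symmetric pooling => exchangeability
    return float(pooled @ w2)

cloud = rng.normal(size=(100, d_in))                 # e.g. a toy point cloud
perm = rng.permutation(len(cloud))
print(set_energy(cloud), set_energy(cloud[perm]))    # identical energies
```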


Differentiable Top-k Operator with Optimal Transport

Feb 18, 2020
Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister

The top-k operation, i.e., finding the k largest or smallest elements of a collection of scores, is an important model component widely used in information retrieval, machine learning, and data mining. However, if the top-k operation is implemented algorithmically, e.g., with bubble sort, the resulting model cannot be trained end-to-end with prevalent gradient descent algorithms. This is because such implementations typically involve swapping indices, whose gradient cannot be computed. Moreover, the corresponding mapping from the input scores to the indicator vector of whether each element belongs to the top-k set is essentially discontinuous. To address this issue, we propose a smoothed approximation, the SOFT (Scalable Optimal transport-based diFferenTiable) top-k operator. Specifically, the SOFT top-k operator approximates the output of the top-k operation as the solution of an Entropic Optimal Transport (EOT) problem. The gradient of the SOFT operator can then be efficiently approximated based on the optimality conditions of the EOT problem. We apply the proposed operator to k-nearest-neighbor and beam search algorithms and demonstrate improved performance.
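
Here is a numpy sketch of one entropic-OT construction for a smoothed top-k in the spirit of the abstract: scores (uniform mass 1/n) are transported onto two target values {0, 1} with masses [(n-k)/n, k/n], and the mass sent to target 1, rescaled by n, gives a differentiable relaxation of the top-k indicator. The epsilon value, iteration count, and use of plain Sinkhorn are illustrative assumptions rather than the paper's exact recipe.

```python
# Smoothed top-k via entropic optimal transport (Sinkhorn iterations).
import numpy as np

def soft_topk(scores, k, eps=0.1, n_iters=200):
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    y = np.array([0.0, 1.0])                       # "not top-k" / "top-k" targets
    C = (scores[:, None] - y[None, :]) ** 2        # n x 2 cost matrix
    a = np.full(n, 1.0 / n)                        # source marginal
    b = np.array([(n - k) / n, k / n])             # target marginal
    K = np.exp(-C / eps)
    u, v = np.ones(n), np.ones(2)
    for _ in range(n_iters):                       # Sinkhorn scaling updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    gamma = u[:, None] * K * v[None, :]            # entropic transport plan
    return n * gamma[:, 1]                         # smoothed top-k indicator

scores = [0.3, 2.0, -1.0, 1.5, 0.1]
print(np.round(soft_topk(scores, k=2), 3))   # most mass on the 2 highest scores
```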


Retrosynthesis Prediction with Conditional Graph Logic Network

Jan 06, 2020
Hanjun Dai, Chengtao Li, Connor W. Coley, Bo Dai, Le Song

Retrosynthesis is one of the fundamental problems in organic chemistry. The task is to identify reactants that can be used to synthesize a specified product molecule. Recently, computer-aided retrosynthesis has been finding renewed interest from both the chemistry and computer science communities. Most existing approaches rely on template-based models that define subgraph matching rules, but whether a chemical reaction can proceed is not determined by hard decision rules alone. In this work, we propose a new approach to this task using the Conditional Graph Logic Network, a conditional graphical model built upon graph neural networks that learns when rules from reaction templates should be applied, implicitly considering whether the resulting reaction would be both chemically feasible and strategic. We also propose an efficient hierarchical sampling scheme to reduce the computational cost. While achieving a significant improvement of $8.1\%$ over current state-of-the-art methods on the benchmark dataset, our model also offers interpretations for its predictions.
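
A loose toy sketch of the "graph logic" flavor described above: reaction templates act as hard applicability rules (does this subgraph pattern match the product?), while a neural score decides which applicable template to use. The template names, the applicability mask, and the scores are placeholders standing in for subgraph matching and GNN computations.

```python
# Hard rule gating + soft neural scoring over reaction templates.
import numpy as np

templates = ["ester_hydrolysis", "amide_coupling", "suzuki_coupling"]
applicable = np.array([True, True, False])     # hard rule: template matches product
scores = np.array([1.2, 0.4, 2.5])             # would come from a graph neural net

logits = np.where(applicable, scores, -np.inf) # logic gate: mask out non-matches
probs = np.exp(logits - logits.max())
probs /= probs.sum()
for t, p in zip(templates, probs):
    print(f"p({t} | product) = {p:.3f}")
```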

* NeurIPS 2019 

Learning Transferable Graph Exploration

Oct 28, 2019
Hanjun Dai, Yujia Li, Chenglong Wang, Rishabh Singh, Po-Sen Huang, Pushmeet Kohli

This paper considers the problem of efficiently exploring unseen environments, a key challenge in AI. We propose a 'learning to explore' framework in which we learn a policy from a distribution of environments. At test time, presented with an unseen environment from the same distribution, the policy aims to generalize the exploration strategy to visit the maximum number of unique states within a limited number of steps. We focus in particular on environments with graph-structured state spaces, which are encountered in many important real-world applications such as software testing and map building. We formulate this task as a reinforcement learning problem in which the 'exploration' agent is rewarded for transitioning to previously unseen environment states, and we employ a graph-structured memory to encode the agent's past trajectory. Experimental results demonstrate that our approach is highly effective for exploring spatial maps, and when applied to the challenging problems of coverage-guided software testing of domain-specific programs and real-world mobile applications, it outperforms methods hand-engineered by human experts.
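
The exploration reward described above can be made concrete with a small sketch: the agent earns +1 each time it transitions into a state it has not visited before, so the episode return equals the number of unique states covered. The toy graph and the random policy below are placeholders for a learned, graph-memory-based agent.

```python
# Novelty reward: +1 only for transitions into previously unseen states.
import numpy as np

rng = np.random.default_rng(0)
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}

def rollout(start, n_steps):
    state, visited, total_reward = start, {start}, 0
    for _ in range(n_steps):
        state = int(rng.choice(adj[state]))        # stand-in for the policy
        reward = 0 if state in visited else 1      # reward novelty, not revisits
        visited.add(state)
        total_reward += reward
    return total_reward, len(visited)

print(rollout(start=0, n_steps=10))   # (novelty reward earned, states covered)
```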

* To appear in NeurIPS 2019 

Cooperative neural networks (CoNN): Exploiting prior independence structure for improved classification

Jun 01, 2019
Harsh Shrivastava, Eugene Bart, Bob Price, Hanjun Dai, Bo Dai, Srinivas Aluru

We propose a new approach, called cooperative neural networks (CoNN), which uses a set of cooperatively trained neural networks to capture latent representations that exploit a given prior independence structure. The model is more flexible than traditional graphical models based on exponential family distributions, yet incorporates more domain-specific prior structure than traditional deep networks or variational autoencoders. The framework is very general and can be used to exploit the independence structure of any graphical model. We illustrate the technique by showing that we can transfer the independence structure of the popular Latent Dirichlet Allocation (LDA) model to a cooperative neural network, CoNN-sLDA. Empirical evaluation of CoNN-sLDA on supervised text classification tasks demonstrates that the theoretical advantages of prior independence structure can be realized in practice: we demonstrate a 23\% reduction in error on the challenging MultiSent data set compared to the state of the art.


Exponential Family Estimation via Adversarial Dynamics Embedding

Apr 27, 2019
Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, Dale Schuurmans

We present an efficient algorithm for maximum likelihood estimation (MLE) of the general exponential family, even in cases where the energy function is represented by a deep neural network. We consider the primal-dual view of MLE for the kinetics-augmented model, which naturally introduces an adversarial dual sampler. The sampler is represented by a novel neural network architecture, the dynamics embedding, which mimics dynamics-based samplers such as Hamiltonian Monte Carlo (HMC) and its variants. The dynamics embedding parametrization inherits the flexibility of HMC and provides tractable entropy estimation for the augmented model. Meanwhile, it couples the adversarial dual sampler with the primal model, reducing memory and sample complexity. We further show that several existing estimators, including contrastive divergence (Hinton, 2002), score matching (Hyv\"arinen, 2005), pseudo-likelihood (Besag, 1975), noise-contrastive estimation (Gutmann and Hyv\"arinen, 2010), non-local contrastive objectives (Vickrey et al., 2010), and minimum probability flow (Sohl-Dickstein et al., 2011), can be recast as special cases of the proposed method with different prefixed dual samplers. Finally, we empirically demonstrate the superiority of the proposed estimator over existing state-of-the-art methods on synthetic and real-world benchmarks.
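
For reference, the primal-dual view mentioned above can be written in generic notation via the standard Fenchel duality of the log-partition function (the paper works with a kinetics-augmented model and an HMC-style dual sampler; the symbols below are mine):

```latex
% Exponential-family model: p_\theta(x) = \exp(f_\theta(x)) / Z(\theta).
% Dual (maximum-entropy) form of the log-partition function:
\log Z(\theta) \;=\; \max_{q}\; \mathbb{E}_{x \sim q}\!\left[ f_\theta(x) \right] + H(q)
% Substituting into the MLE objective yields the adversarial saddle point,
% in which q plays the role of the dual sampler:
\max_{\theta}\; \min_{q}\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[ f_\theta(x) \right]
  \;-\; \mathbb{E}_{x \sim q}\!\left[ f_\theta(x) \right] \;-\; H(q)
```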

* 66 figures, 25 pages; preliminary version published in the NeurIPS 2018 Bayesian Deep Learning Workshop 