Haoran Sun

Score-based Continuous-time Discrete Diffusion Models

Nov 30, 2022
Haoran Sun, Lijun Yu, Bo Dai, Dale Schuurmans, Hanjun Dai

Score-based modeling through stochastic differential equations (SDEs) has provided a new perspective on diffusion models, and demonstrated superior performance on continuous data. However, the gradient of the log-likelihood function, i.e., the score function, is not properly defined for discrete spaces. This makes it non-trivial to adapt score-based modeling to categorical data. In this paper, we extend diffusion models to discrete variables by introducing a stochastic jump process where the reverse process denoises via a continuous-time Markov chain. This formulation admits an analytical simulation during backward sampling. To learn the reverse process, we extend score matching to general categorical data and show that an unbiased estimator can be obtained via simple matching of the conditional marginal distributions. We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
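
As a rough illustration of what a continuous-time Markov chain over categorical data can look like, the sketch below corrupts a sequence of tokens under a uniform rate matrix, where every state jumps to each other state at the same rate. The uniform rate matrix, the function names, and the parameters `rate` and `t` are assumptions made for this example; the abstract does not say which rate matrix the paper uses, and this is not the authors' implementation.

```python
import math
import random


def uniform_ctmc_transition(num_states, rate, t):
    """Closed-form transition probabilities of a uniform-rate CTMC at time t.

    Assumes a rate matrix with off-diagonal entries `rate` and diagonal
    entries -(num_states - 1) * rate (an illustrative choice).
    Returns (probability of staying, probability of moving to one specific
    other state).
    """
    decay = math.exp(-num_states * rate * t)
    stay = 1.0 / num_states + (1.0 - 1.0 / num_states) * decay
    move = (1.0 - decay) / num_states
    return stay, move


def corrupt(x0, num_states, rate, t, rng):
    """Sample x_t given x_0 by corrupting each dimension independently."""
    stay, _ = uniform_ctmc_transition(num_states, rate, t)
    xt = []
    for value in x0:
        if rng.random() < stay:
            xt.append(value)
        else:
            # Jump to one of the other states, chosen uniformly.
            others = [s for s in range(num_states) if s != value]
            xt.append(rng.choice(others))
    return xt


if __name__ == "__main__":
    rng = random.Random(0)
    print(corrupt([0, 1, 2, 3, 0, 1], num_states=4, rate=1.0, t=0.3, rng=rng))
```

As `t` grows, the stay and move probabilities both approach `1 / num_states`, so the corrupted sample approaches uniform noise.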

Optimal Scaling for Locally Balanced Proposals in Discrete Spaces

Sep 16, 2022
Haoran Sun, Hanjun Dai, Dale Schuurmans

Optimal scaling has been well studied for Metropolis-Hastings (M-H) algorithms in continuous spaces, but a similar understanding has been lacking in discrete spaces. Recently, a family of locally balanced proposals (LBP) for discrete spaces has been proved to be asymptotically optimal, but the question of optimal scaling has remained open. In this paper, we establish, for the first time, that the efficiency of M-H in discrete spaces can also be characterized by an asymptotic acceptance rate that is independent of the target distribution. Moreover, we verify, both theoretically and empirically, that the optimal acceptance rates for LBP and random walk Metropolis (RWM) are $0.574$ and $0.234$ respectively. These results also help establish that LBP is asymptotically $O(N^\frac{2}{3})$ more efficient than RWM with respect to model dimension $N$. Knowledge of the optimal acceptance rate allows one to automatically tune the neighborhood size of a proposal distribution in a discrete space, directly analogous to step-size control in continuous spaces. We demonstrate empirically that such adaptive M-H sampling can robustly improve sampling in a variety of target distributions in discrete spaces, including training deep energy based models.
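
The tuning idea in the last sentences can be sketched as a simple feedback rule: enlarge the neighborhood when the acceptance rate is above the target and shrink it when it is below. The toy below applies this to a random walk Metropolis sampler on independent binary variables, using the 0.234 target from the abstract. The target distribution, the `1 / sqrt(t)` step-size schedule, and all function names are assumptions for illustration, not the paper's algorithm.

```python
import math
import random

TARGET_RWM = 0.234  # optimal acceptance rate for RWM, as stated in the abstract
TARGET_LBP = 0.574  # optimal acceptance rate for LBP, as stated in the abstract


def update_log_radius(log_radius, accept_prob, target, step_size, max_radius):
    """Raise the radius if acceptance exceeds the target, lower it otherwise."""
    log_radius += step_size * (accept_prob - target)
    # Keep the radius between 1 and max_radius.
    return min(max(log_radius, 0.0), math.log(max_radius))


def log_prob(x, logits):
    """Unnormalized log-probability of a product-of-Bernoulli target."""
    return sum(l for xi, l in zip(x, logits) if xi)


def adaptive_rwm(logits, n_steps=2000, seed=0):
    """RWM that flips `radius` random bits per proposal and adapts `radius`."""
    rng = random.Random(seed)
    n = len(logits)
    x = [rng.randint(0, 1) for _ in range(n)]
    log_radius = 0.0
    for t in range(1, n_steps + 1):
        radius = max(1, round(math.exp(log_radius)))
        y = list(x)
        for i in rng.sample(range(n), radius):
            y[i] = 1 - y[i]
        # The proposal is symmetric for a fixed radius, so only the target ratio matters.
        accept_prob = math.exp(min(0.0, log_prob(y, logits) - log_prob(x, logits)))
        if rng.random() < accept_prob:
            x = y
        log_radius = update_log_radius(
            log_radius, accept_prob, TARGET_RWM, 1.0 / math.sqrt(t), n
        )
    return x, max(1, round(math.exp(log_radius)))


if __name__ == "__main__":
    sample, radius = adaptive_rwm([0.5] * 100)
    print("tuned radius:", radius)
```

Swapping `TARGET_RWM` for `TARGET_LBP` would be the corresponding setting for a locally balanced proposal, which this toy does not implement.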

Annealed Training for Combinatorial Optimization on Graphs

Jul 23, 2022
Haoran Sun, Etash K. Guha, Hanjun Dai

The hardness of combinatorial optimization (CO) problems hinders collecting solutions for supervised learning. However, learning neural networks for CO problems without labeled data is notoriously difficult, as training is easily trapped at local optima. In this work, we propose a simple but effective annealed training framework for CO problems. In particular, we transform CO problems into unbiased energy-based models (EBMs). We carefully select the penalty terms so as to make the EBMs as smooth as possible. Then we train graph neural networks to approximate the EBMs. To prevent the training from being stuck at local optima near the initialization, we introduce an annealed loss function. An experimental evaluation demonstrates that our annealed training framework obtains substantial improvements. In four types of CO problems, our method achieves performance substantially better than other unsupervised neural methods on both synthetic and real-world graphs.

Discrete Langevin Sampler via Wasserstein Gradient Flow

Jun 29, 2022
Haoran Sun, Hanjun Dai, Bo Dai, Haomin Zhou, Dale Schuurmans

Recently, a family of locally balanced (LB) samplers has demonstrated excellent performance at sampling and learning energy-based models (EBMs) in discrete spaces. However, the theoretical understanding of this success is limited. In this work, we show how LB functions give rise to LB dynamics corresponding to Wasserstein gradient flow in a discrete space. From first principles, previous LB samplers can then be seen as discretizations of the LB dynamics with respect to Hamming distance. Based on this observation, we propose a new algorithm, the Locally Balanced Jump (LBJ), by discretizing the LB dynamics with respect to simulation time. As a result, LBJ has a location-dependent "velocity" that allows it to make proposals with larger distances. Additionally, LBJ decouples each dimension into independent sub-processes, enabling convenient parallel implementation. We demonstrate the advantages of LBJ for sampling and learning in various binary and categorical distributions.

To Supervise or Not: How to Effectively Learn Wireless Interference Management Models?

Dec 28, 2021
Bingqing Song, Haoran Sun, Wenqiang Pu, Sijia Liu, Mingyi Hong

Machine learning has become successful in solving wireless interference management problems. Different kinds of deep neural networks (DNNs) have been trained to accomplish key tasks such as power control, beamforming and admission control. There are two popular training paradigms for such DNN-based interference management models: supervised learning (i.e., fitting labels generated by an optimization algorithm) and unsupervised learning (i.e., directly optimizing some system performance measure). Although both of these paradigms have been extensively applied in practice, due to the lack of any theoretical understanding of these methods, it is not clear how to systematically understand and compare their performance. In this work, we conduct theoretical studies to provide some in-depth understanding of these two training paradigms. First, we show a somewhat surprising result: for some special power control problems, unsupervised learning can perform much worse than its supervised counterpart, because it is more likely to get stuck at low-quality local solutions. We then provide a series of theoretical results to further understand the properties of the two approaches. Generally speaking, we show that when high-quality labels are available, supervised learning is less likely to be stuck at a solution than its unsupervised counterpart. Additionally, we develop a semi-supervised learning approach which properly integrates these two training paradigms, and can effectively utilize a limited number of labels to find high-quality solutions. To our knowledge, this is the first set of theoretical results trying to understand different training approaches in learning-based wireless communication system design.

How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition

Nov 24, 2021
Haoran Sun, Lantian Li, Thomas Fang Zheng, Dong Wang

The way that humans encode their emotion into speech signals is complex. For instance, an angry man may increase his pitch and speaking rate, and use impolite words. In this paper, we present a preliminary study on various emotional factors and investigate how each of them impacts modern emotion recognition systems. The key tool of our study is the recently presented SpeechFlow model, by which we are able to decompose speech signals into separate information factors (content, pitch, rhythm). Based on this decomposition, we carefully studied the performance of each information component and their combinations. We conducted the study on three different speech emotion corpora and chose an attention-based convolutional RNN as the emotion classifier. Our results show that rhythm is the most important component for emotional expression. Moreover, the cross-corpus results are very bad (even worse than guessing), demonstrating that the present speech emotion recognition model is rather weak. Interestingly, by removing one or several unimportant components, the cross-corpus results can be improved. This demonstrates the potential of the decomposition approach towards generalizable emotion recognition.

Multi-task Learning of Order-Consistent Causal Graphs

Nov 03, 2021
Xinshi Chen, Haoran Sun, Caleb Ellington, Eric Xing, Le Song

We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs), where the involved graph structures share a consistent causal order and sparse unions of supports. Under the multi-task learning setting, we propose an $l_1/l_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models. We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order (or topological order) than separate estimations. Moreover, the joint estimator is able to recover non-identifiable DAGs, by estimating them together with some identifiable DAGs. Lastly, our analysis also shows the consistency of union support recovery of the structures. To allow practical implementation, we design a continuous optimization problem whose optimizer is the same as the joint estimator and can be approximated efficiently by an iterative algorithm. We validate the theoretical analysis and the effectiveness of the joint estimator in experiments.
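
One common reading of an $l_1/l_2$ penalty in a multi-task setting is a group norm: for each candidate edge, take the $l_2$ norm of its weights across the $K$ tasks, then sum these norms over edges. The sketch below computes that quantity. The grouping by edge across tasks is an assumption based on the phrase "sparse unions of supports", and the function name and data layout are hypothetical; the abstract does not spell out the exact penalty.

```python
import math


def group_l1_l2_penalty(weights):
    """Sum over edges (i, j) of the l2 norm of that edge's weights across tasks.

    `weights` is a list of K square matrices (lists of lists), one per task.
    An edge contributes zero only if it is absent in every task, which is what
    encourages a sparse union of supports.
    """
    n = len(weights[0])
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += math.sqrt(sum(w[i][j] ** 2 for w in weights))
    return total


if __name__ == "__main__":
    task_a = [[0.0, 3.0], [0.0, 0.0]]
    task_b = [[0.0, 4.0], [0.0, 0.0]]
    print(group_l1_l2_penalty([task_a, task_b]))
```

In this example both tasks share the single edge from node 0 to node 1, so the penalty is the $l_2$ norm of its two weights, $(3, 4)$.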

* 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

CycleFlow: Purify Information Factors by Cycle Loss

Oct 20, 2021
Haoran Sun, Chen Chen, Lantian Li, Dong Wang

SpeechFlow is a powerful factorization model based on information bottleneck (IB), and its effectiveness has been reported by several studies. A potential problem of SpeechFlow, however, is that if the IB channels are not well designed, the resultant factors cannot be well disentangled. In this study, we propose a CycleFlow model that combines random factor substitution and cycle loss to solve this problem. Experiments on voice conversion tasks demonstrate that this simple technique can effectively reduce mutual information among individual factors, and produce clearly better conversion than the IB-based SpeechFlow. CycleFlow can also be used as a powerful tool for speech editing. We demonstrate this usage by an emotion perception experiment.

* Submitted to ICASSP 2022

Fermion Sampling Made More Efficient

Sep 15, 2021
Haoran Sun, Jie Zou, Xiaopeng Li

Fermion sampling is the task of generating the probability distribution of a many-body Slater-determinant wavefunction, which is termed a "determinantal point process" in statistical analysis. Because of its inherently embedded Pauli exclusion principle, its application reaches beyond simulating fermionic quantum many-body physics to constructing machine learning models for diversified datasets. Here we propose a fermion sampling algorithm with polynomial time complexity: quadratic in the fermion number and linear in the system size. This algorithm is about 100% more efficient in computation time than the best known algorithms. In sampling the corresponding marginal distribution, our algorithm has a more drastic improvement, achieving a scaling advantage. We demonstrate its power on several test applications, including sampling fermions in a many-body system and a machine learning task of text summarization, and confirm its improved computation efficiency over other methods by counting floating-point operations.
