Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lawrence Carin

Symmetric Variational Autoencoder and Connections to Adversarial Learning

Oct 19, 2017
Liqun Chen, Shuyang Dai, Yunchen Pu, Chunyuan Li, Qinliang Su, Lawrence Carin

Figure 1 for Symmetric Variational Autoencoder and Connections to Adversarial Learning

Figure 2 for Symmetric Variational Autoencoder and Connections to Adversarial Learning

Figure 3 for Symmetric Variational Autoencoder and Connections to Adversarial Learning

Figure 4 for Symmetric Variational Autoencoder and Connections to Adversarial Learning

A new form of the variational autoencoder (VAE) is proposed, based on the symmetric Kullback-Leibler divergence. It is demonstrated that learning of the resulting symmetric VAE (sVAE) has close connections to previously developed adversarial-learning methods. This relationship helps unify the previously distinct techniques of VAE and adversarially learning, and provides insights that allow us to ameliorate shortcomings with some previously developed adversarial methods. In addition to an analysis that motivates and explains the sVAE, an extensive set of experiments validate the utility of the approach.

Via

Access Paper or Ask Questions

Deconvolutional Paragraph Representation Learning

Sep 22, 2017
Yizhe Zhang, Dinghan Shen, Guoyin Wang, Zhe Gan, Ricardo Henao, Lawrence Carin

Figure 1 for Deconvolutional Paragraph Representation Learning

Figure 2 for Deconvolutional Paragraph Representation Learning

Figure 3 for Deconvolutional Paragraph Representation Learning

Figure 4 for Deconvolutional Paragraph Representation Learning

Learning latent representations from long text sequences is an important first step in many natural language processing applications. Recurrent Neural Networks (RNNs) have become a cornerstone for this challenging task. However, the quality of sentences during RNN-based decoding (reconstruction) decreases with the length of the text. We propose a sequence-to-sequence, purely convolutional and deconvolutional autoencoding framework that is free of the above issue, while also being computationally efficient. The proposed method is simple, easy to implement and can be leveraged as a building block for many applications. We show empirically that compared to RNNs, our framework is better at reconstructing and correcting long paragraphs. Quantitative evaluation on semi-supervised text classification and summarization tasks demonstrate the potential for better utilization of long unlabeled text data.

* Accepted by NIPS 2017

Via

Access Paper or Ask Questions

A Probabilistic Framework for Nonlinearities in Stochastic Neural Networks

Sep 18, 2017
Qinliang Su, Xuejun Liao, Lawrence Carin

Figure 1 for A Probabilistic Framework for Nonlinearities in Stochastic Neural Networks

Figure 2 for A Probabilistic Framework for Nonlinearities in Stochastic Neural Networks

Figure 3 for A Probabilistic Framework for Nonlinearities in Stochastic Neural Networks

Figure 4 for A Probabilistic Framework for Nonlinearities in Stochastic Neural Networks

We present a probabilistic framework for nonlinearities, based on doubly truncated Gaussian distributions. By setting the truncation points appropriately, we are able to generate various types of nonlinearities within a unified framework, including sigmoid, tanh and ReLU, the most commonly used nonlinearities in neural networks. The framework readily integrates into existing stochastic neural networks (with hidden units characterized as random variables), allowing one for the first time to learn the nonlinearities alongside model weights in these networks. Extensive experiments demonstrate the performance improvements brought about by the proposed framework when integrated with the restricted Boltzmann machine (RBM), temporal RBM and the truncated Gaussian graphical model (TGGM).

Via

Access Paper or Ask Questions

A Convergence Analysis for A Class of Practical Variance-Reduction Stochastic Gradient MCMC

Sep 04, 2017
Changyou Chen, Wenlin Wang, Yizhe Zhang, Qinliang Su, Lawrence Carin

Figure 1 for A Convergence Analysis for A Class of Practical Variance-Reduction Stochastic Gradient MCMC

Figure 2 for A Convergence Analysis for A Class of Practical Variance-Reduction Stochastic Gradient MCMC

Figure 3 for A Convergence Analysis for A Class of Practical Variance-Reduction Stochastic Gradient MCMC

Figure 4 for A Convergence Analysis for A Class of Practical Variance-Reduction Stochastic Gradient MCMC

Stochastic gradient Markov Chain Monte Carlo (SG-MCMC) has been developed as a flexible family of scalable Bayesian sampling algorithms. However, there has been little theoretical analysis of the impact of minibatch size to the algorithm's convergence rate. In this paper, we prove that under a limited computational budget/time, a larger minibatch size leads to a faster decrease of the mean squared error bound (thus the fastest one corresponds to using full gradients), which motivates the necessity of variance reduction in SG-MCMC. Consequently, by borrowing ideas from stochastic optimization, we propose a practical variance-reduction technique for SG-MCMC, that is efficient in both computation and storage. We develop theory to prove that our algorithm induces a faster convergence rate than standard SG-MCMC. A number of large-scale experiments, ranging from Bayesian learning of logistic regression to deep neural networks, validate the theory and demonstrate the superiority of the proposed variance-reduction SG-MCMC framework.

Via

Access Paper or Ask Questions

Learning Generic Sentence Representations Using Convolutional Neural Networks

Jul 26, 2017
Zhe Gan, Yunchen Pu, Ricardo Henao, Chunyuan Li, Xiaodong He, Lawrence Carin

Figure 1 for Learning Generic Sentence Representations Using Convolutional Neural Networks

Figure 2 for Learning Generic Sentence Representations Using Convolutional Neural Networks

Figure 3 for Learning Generic Sentence Representations Using Convolutional Neural Networks

Figure 4 for Learning Generic Sentence Representations Using Convolutional Neural Networks

We propose a new encoder-decoder approach to learn distributed sentence representations that are applicable to multiple purposes. The model is learned by using a convolutional neural network as an encoder to map an input sentence into a continuous vector, and using a long short-term memory recurrent neural network as a decoder. Several tasks are considered, including sentence reconstruction and future sentence prediction. Further, a hierarchical encoder-decoder model is proposed to encode a sentence to predict multiple future sentences. By training our models on a large collection of novels, we obtain a highly generic convolutional sentence encoder that performs well in practice. Experimental results on several benchmark datasets, and across a broad range of applications, demonstrate the superiority of the proposed model over competing methods.

* Accepted by EMNLP 2017

Via

Access Paper or Ask Questions

Deep Generative Models for Relational Data with Side Information

Jun 16, 2017
Changwei Hu, Piyush Rai, Lawrence Carin

Figure 1 for Deep Generative Models for Relational Data with Side Information

Figure 2 for Deep Generative Models for Relational Data with Side Information

Figure 3 for Deep Generative Models for Relational Data with Side Information

Figure 4 for Deep Generative Models for Relational Data with Side Information

We present a probabilistic framework for overlapping community discovery and link prediction for relational data, given as a graph. The proposed framework has: (1) a deep architecture which enables us to infer multiple layers of latent features/communities for each node, providing superior link prediction performance on more complex networks and better interpretability of the latent features; and (2) a regression model which allows directly conditioning the node latent features on the side information available in form of node attributes. Our framework handles both (1) and (2) via a clean, unified model, which enjoys full local conjugacy via data augmentation, and facilitates efficient inference via closed form Gibbs sampling. Moreover, inference cost scales in the number of edges which is attractive for massive but sparse networks. Our framework is also easily extendable to model weighted networks with count-valued edges. We compare with various state-of-the-art methods and report results, both quantitative and qualitative, on several benchmark data sets.

Via

Access Paper or Ask Questions

Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling

Apr 24, 2017
Zhe Gan, Chunyuan Li, Changyou Chen, Yunchen Pu, Qinliang Su, Lawrence Carin

Figure 1 for Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling

Figure 2 for Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling

Figure 3 for Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling

Figure 4 for Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling

Recurrent neural networks (RNNs) have shown promising performance for language modeling. However, traditional training of RNNs using back-propagation through time often suffers from overfitting. One reason for this is that stochastic optimization (used for large training sets) does not provide good estimates of model uncertainty. This paper leverages recent advances in stochastic gradient Markov Chain Monte Carlo (also appropriate for large training sets) to learn weight uncertainty in RNNs. It yields a principled Bayesian learning algorithm, adding gradient noise during training (enhancing exploration of the model-parameter space) and model averaging when testing. Extensive experiments on various RNN models and across a broad range of applications demonstrate the superiority of the proposed approach over stochastic optimization.

* Accepted to ACL 2017

Via

Access Paper or Ask Questions

Semantic Compositional Networks for Visual Captioning

Mar 28, 2017
Zhe Gan, Chuang Gan, Xiaodong He, Yunchen Pu, Kenneth Tran, Jianfeng Gao, Lawrence Carin, Li Deng

Figure 1 for Semantic Compositional Networks for Visual Captioning

Figure 2 for Semantic Compositional Networks for Visual Captioning

Figure 3 for Semantic Compositional Networks for Visual Captioning

Figure 4 for Semantic Compositional Networks for Visual Captioning

A Semantic Compositional Network (SCN) is developed for image captioning, in which semantic concepts (i.e., tags) are detected from the image, and the probability of each tag is used to compose the parameters in a long short-term memory (LSTM) network. The SCN extends each weight matrix of the LSTM to an ensemble of tag-dependent weight matrices. The degree to which each member of the ensemble is used to generate an image caption is tied to the image-dependent probability of the corresponding tag. In addition to captioning images, we also extend the SCN to generate captions for video clips. We qualitatively analyze semantic composition in SCNs, and quantitatively evaluate the algorithm on three benchmark datasets: COCO, Flickr30k, and Youtube2Text. Experimental results show that the proposed method significantly outperforms prior state-of-the-art approaches, across multiple evaluation metrics.

* Accepted in CVPR 2017

Via

Access Paper or Ask Questions

Tensor-Dictionary Learning with Deep Kruskal-Factor Analysis

Mar 05, 2017
Andrew Stevens, Yunchen Pu, Yannan Sun, Greg Spell, Lawrence Carin

Figure 1 for Tensor-Dictionary Learning with Deep Kruskal-Factor Analysis

Figure 2 for Tensor-Dictionary Learning with Deep Kruskal-Factor Analysis

Figure 3 for Tensor-Dictionary Learning with Deep Kruskal-Factor Analysis

Figure 4 for Tensor-Dictionary Learning with Deep Kruskal-Factor Analysis

A multi-way factor analysis model is introduced for tensor-variate data of any order. Each data item is represented as a (sparse) sum of Kruskal decompositions, a Kruskal-factor analysis (KFA). KFA is nonparametric and can infer both the tensor-rank of each dictionary atom and the number of dictionary atoms. The model is adapted for online learning, which allows dictionary learning on large data sets. After KFA is introduced, the model is extended to a deep convolutional tensor-factor analysis, supervised by a Bayesian SVM. The experiments section demonstrates the improvement of KFA over vectorized approaches (e.g., BPFA), tensor decompositions, and convolutional neural networks (CNN) in multi-way denoising, blind inpainting, and image classification. The improvement in PSNR for the inpainting results over other methods exceeds 1dB in several cases and we achieve state of the art results on Caltech101 image classification.

* AISTATS 2017

Via

Access Paper or Ask Questions

Compressive Sensing via Convolutional Factor Analysis

Jan 11, 2017
Xin Yuan, Yunchen Pu, Lawrence Carin

Figure 1 for Compressive Sensing via Convolutional Factor Analysis

Figure 2 for Compressive Sensing via Convolutional Factor Analysis

Figure 3 for Compressive Sensing via Convolutional Factor Analysis

Figure 4 for Compressive Sensing via Convolutional Factor Analysis

We solve the compressive sensing problem via convolutional factor analysis, where the convolutional dictionaries are learned {\em in situ} from the compressed measurements. An alternating direction method of multipliers (ADMM) paradigm for compressive sensing inversion based on convolutional factor analysis is developed. The proposed algorithm provides reconstructed images as well as features, which can be directly used for recognition ($e.g.$, classification) tasks. When a deep (multilayer) model is constructed, a stochastic unpooling process is employed to build a generative model. During reconstruction and testing, we project the upper layer dictionary to the data level and only a single layer deconvolution is required. We demonstrate that using $\sim30\%$ (relative to pixel numbers) compressed measurements, the proposed model achieves the classification accuracy comparable to the original data on MNIST. We also observe that when the compressed measurements are very limited ($e.g.$, $<10\%$), the upper layer dictionary can provide better reconstruction results than the bottom layer.

* 17 pages, 6 figures

Via

Access Paper or Ask Questions