Lawrence Carin

NASH: Toward End-to-End Neural Architecture for Generative Semantic Hashing

May 14, 2018
Dinghan Shen, Qinliang Su, Paidamoyo Chapfuwa, Wenlin Wang, Guoyin Wang, Lawrence Carin, Ricardo Henao

Semantic hashing has become a powerful paradigm for fast similarity search in many information retrieval systems. While fairly successful, previous techniques generally require two-stage training, and the binary constraints are handled ad hoc. In this paper, we present an end-to-end Neural Architecture for Semantic Hashing (NASH), where the binary hashing codes are treated as Bernoulli latent variables. A neural variational inference framework is proposed for training, where gradients are directly back-propagated through the discrete latent variable to optimize the hash function. We also draw connections between the proposed method and rate-distortion theory, which provides a theoretical foundation for the effectiveness of the proposed framework. Experimental results on three public datasets demonstrate that our method significantly outperforms several state-of-the-art models in both unsupervised and supervised scenarios.

* To appear at ACL 2018
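The abstract treats hash codes as Bernoulli latent variables used for fast similarity search. The following is a minimal sketch under stated assumptions, not the NASH implementation: it assumes an encoder has already produced one logit per bit (the logits and all function names are hypothetical), samples each bit from Bernoulli(sigmoid(logit)), uses 0.5-thresholding as one possible deterministic test-time code (an assumption), and ranks database codes by Hamming distance. The gradient estimator used to back-propagate through the sampling step is not shown.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sample_code(logits, rng):
    """Draw bit i from Bernoulli(sigmoid(logits[i])) and pack the bits into an int."""
    code = 0
    for i, z in enumerate(logits):
        if rng.random() < sigmoid(z):
            code |= 1 << i
    return code

def deterministic_code(logits):
    """Hypothetical test-time code: set bit i when sigmoid(logit) > 0.5, i.e. logit > 0."""
    return sum(1 << i for i, z in enumerate(logits) if z > 0)

def hamming(a, b):
    # Number of differing bits between two packed codes.
    return bin(a ^ b).count("1")

def search(query_code, database, k):
    """Indices of the k database codes closest to the query in Hamming distance."""
    ranked = sorted(range(len(database)), key=lambda i: hamming(query_code, database[i]))
    return ranked[:k]

db = [0b0000, 0b0011, 0b1111, 0b0001]
q = deterministic_code([-2.0, 3.0, -1.0, -0.5])  # only bit 1 is set
print(search(q, db, 2))  # [0, 1]
```

Packing bits into integers keeps each distance computation to one XOR and a popcount, which is what makes binary codes attractive for retrieval.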

Joint Embedding of Words and Labels for Text Classification

May 10, 2018
Guoyin Wang, Chunyuan Li, Wenlin Wang, Yizhe Zhang, Dinghan Shen, Xinyuan Zhang, Ricardo Henao, Lawrence Carin

Word embeddings are effective intermediate representations for capturing semantic regularities between words when learning the representations of text sequences. We propose to view text classification as a label-word joint embedding problem: each label is embedded in the same space as the word vectors. We introduce an attention framework that measures the compatibility of embeddings between text sequences and labels. The attention is learned on a training set of labeled samples to ensure that, given a text sequence, the relevant words are weighted higher than the irrelevant ones. Our method maintains the interpretability of word embeddings, and enjoys a built-in ability to leverage alternative sources of information, in addition to input text sequences. Extensive results on several large text datasets show that the proposed framework outperforms the state-of-the-art methods by a large margin, in terms of both accuracy and speed.

* Published in ACL 2018; Code: https://github.com/guoyinwang/LEAM
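A minimal sketch of the label-word compatibility idea, assuming word and label vectors already share one embedding space. It scores every word against every label by cosine similarity, keeps each word's best label score, normalizes the scores with a softmax over words, and returns the attention-weighted average of the word vectors. This is a simplification for illustration; the full method may include additional steps (for example, aggregating scores over neighboring words) that are omitted here, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def label_attentive_embedding(words, labels):
    """words, labels: lists of vectors with the same dimension."""
    # Compatibility of each word with its best-matching label.
    scores = [max(cosine(w, c) for c in labels) for w in words]
    # Softmax over words, shifted by the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    beta = [e / total for e in exps]
    # Attention-weighted average of the word vectors.
    dim = len(words[0])
    z = [sum(b * w[d] for b, w in zip(beta, words)) for d in range(dim)]
    return z, beta

words = [[1.0, 0.0], [0.0, 1.0]]
labels = [[1.0, 0.0]]
z, beta = label_attentive_embedding(words, labels)
print([round(b, 3) for b in beta])  # the label-aligned first word gets more weight
```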

Learning Structural Weight Uncertainty for Sequential Decision-Making

Apr 02, 2018
Ruiyi Zhang, Chunyuan Li, Changyou Chen, Lawrence Carin

Learning probability distributions on the weights of neural networks (NNs) has recently proven beneficial in many applications. Bayesian methods, such as Stein variational gradient descent (SVGD), offer an elegant framework to reason about NN model uncertainty. However, by assuming independent Gaussian priors for the individual NN weights (as often applied), SVGD does not impose prior knowledge that there is often structural information (dependence) among weights. We propose efficient posterior learning of structural weight uncertainty, within an SVGD framework, by employing matrix variate Gaussian priors on NN parameters. We further investigate the learned structural uncertainty in sequential decision-making problems, including contextual bandits and reinforcement learning. Experiments on several synthetic and real datasets indicate the superiority of our model, compared with state-of-the-art methods.

* Accepted by AISTATS 2018
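The abstract builds on Stein variational gradient descent (SVGD). A minimal one-dimensional sketch of the plain SVGD particle update with an RBF kernel, assuming a standard normal target, a fixed bandwidth, and a fixed step size (all illustrative choices). It uses no neural network and no matrix variate Gaussian prior, so it shows only the base update that the paper extends.

```python
import math
import random

def grad_log_p(x):
    # Score of the assumed target N(0, 1): d/dx log p(x) = -x.
    return -x

def svgd_step(particles, step_size=0.1, bandwidth=1.0):
    """One SVGD update with the RBF kernel k(a, b) = exp(-(a - b)**2 / bandwidth)."""
    n = len(particles)
    new_particles = []
    for x in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - x) ** 2) / bandwidth)
            # Gradient of k(xj, x) with respect to xj; it pushes particles apart.
            grad_k = -2.0 * (xj - x) / bandwidth * k
            # Kernel-weighted score pulls particles toward high density.
            phi += k * grad_log_p(xj) + grad_k
        new_particles.append(x + step_size * phi / n)
    return new_particles

rng = random.Random(0)
particles = [rng.uniform(3.0, 5.0) for _ in range(40)]
for _ in range(400):
    particles = svgd_step(particles)
mean = sum(particles) / len(particles)
var = sum((p - mean) ** 2 for p in particles) / len(particles)
print(round(mean, 2), round(var, 2))
```

Without the repulsive `grad_k` term every particle would collapse onto the mode; the kernel gradient is what lets a finite set of particles represent spread-out uncertainty.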

Nonlocal Low-Rank Tensor Factor Analysis for Image Restoration

Mar 19, 2018
Xinyuan Zhang, Xin Yuan, Lawrence Carin

Low-rank signal modeling has been widely leveraged to capture non-local correlation in image processing applications. We propose a new method that employs low-rank tensor factor analysis for tensors generated by grouped image patches. The low-rank tensors are fed into the alternating direction method of multipliers (ADMM) to further improve image reconstruction. The motivating application is compressive sensing (CS), and a deep convolutional architecture is adopted to approximate the expensive matrix inversion in CS applications. An iterative algorithm based on this low-rank tensor factorization strategy, called NLR-TFA, is presented in detail. Experimental results on noiseless and noisy CS measurements demonstrate the superiority of the proposed approach, especially at low CS sampling rates.
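The tensors in the abstract come from grouping similar (nonlocal) image patches. A minimal sketch of that grouping step only, assuming a grayscale image stored as a list of rows and block matching by squared Euclidean distance (a common choice in nonlocal methods, assumed here rather than taken from the paper). For a reference patch it stacks the k most similar patches into a k x p x p array that a low-rank tensor model could then factorize. All names and parameters are hypothetical.

```python
def extract_patch(image, r, c, p):
    """Return the p x p patch whose top-left corner is (r, c)."""
    return [row[c:c + p] for row in image[r:r + p]]

def patch_distance(a, b):
    # Squared Euclidean distance between two equally sized patches.
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def group_similar_patches(image, ref_r, ref_c, p, k, stride=1):
    """Stack the k patches closest to the reference patch into a k x p x p list."""
    ref = extract_patch(image, ref_r, ref_c, p)
    candidates = []
    for r in range(0, len(image) - p + 1, stride):
        for c in range(0, len(image[0]) - p + 1, stride):
            patch = extract_patch(image, r, c, p)
            candidates.append((patch_distance(ref, patch), r, c))
    # Sort by distance; ties are broken by patch position.
    candidates.sort()
    return [extract_patch(image, r, c, p) for _, r, c in candidates[:k]]

image = [[0, 1, 0, 1],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [1, 0, 1, 0]]
group = group_similar_patches(image, 0, 0, p=2, k=3)
print(group)
```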

Topic Compositional Neural Language Model

Feb 26, 2018
Wenlin Wang, Zhe Gan, Wenqi Wang, Dinghan Shen, Jiaji Huang, Wei Ping, Sanjeev Satheesh, Lawrence Carin

We propose a Topic Compositional Neural Language Model (TCNLM), a novel method designed to simultaneously capture both the global semantic meaning and the local word ordering structure in a document. The TCNLM learns the global semantic coherence of a document via a neural topic model, and the probability of each learned latent topic is further used to build a Mixture-of-Experts (MoE) language model, where each expert (corresponding to one topic) is a recurrent neural network (RNN) that accounts for learning the local structure of a word sequence. In order to train the MoE model efficiently, a matrix factorization method is applied, by extending each weight matrix of the RNN to be an ensemble of topic-dependent weight matrices. The degree to which each member of the ensemble is used is tied to the document-dependent probability of the corresponding topics. Experimental results on several corpora show that the proposed approach outperforms both a pure RNN-based model and other topic-guided language models. Further, our model yields sensible topics, and also has the capacity to generate meaningful sentences conditioned on given topics.

* To appear in AISTATS 2018, updated version
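The abstract describes each RNN weight matrix as an ensemble of topic-dependent matrices whose contributions follow the document's topic probabilities. A minimal sketch of that weighting in its naive form, W(t) = sum_k t_k W_k, applied to a vector. The paper additionally applies a matrix factorization so the ensemble can be trained efficiently; this sketch does not attempt that, and the function names and example matrices are hypothetical.

```python
def topic_weighted_matrix(topic_probs, expert_matrices):
    """Combine per-topic matrices W_k into W(t) = sum_k t_k * W_k."""
    rows = len(expert_matrices[0])
    cols = len(expert_matrices[0][0])
    return [
        [sum(t * W[i][j] for t, W in zip(topic_probs, expert_matrices)) for j in range(cols)]
        for i in range(rows)
    ]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Two topics with 2 x 2 matrices; a document that is 75% topic 0 and 25% topic 1.
W0 = [[1.0, 0.0], [0.0, 1.0]]
W1 = [[0.0, 2.0], [2.0, 0.0]]
W_doc = topic_weighted_matrix([0.75, 0.25], [W0, W1])
print(matvec(W_doc, [1.0, 1.0]))  # [1.25, 1.25]
```

Because the combination is linear, W(t)x equals sum_k t_k (W_k x), so a one-hot topic vector recovers a single expert exactly.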

Superposition-Assisted Stochastic Optimization for Hawkes Processes

Feb 14, 2018
Hongteng Xu, Xu Chen, Lawrence Carin

We consider the learning of multi-agent Hawkes processes, a model containing multiple Hawkes processes with shared endogenous impact functions and different exogenous intensities. In the framework of stochastic maximum likelihood estimation, we explore the associated risk bound. Further, we consider the superposition of Hawkes processes within the model, and demonstrate that under certain conditions such an operation is beneficial for tightening the risk bound. Accordingly, we propose a stochastic optimization algorithm assisted with a diversity-driven superposition strategy, achieving better learning results with improved convergence properties. The effectiveness of the proposed method is verified on synthetic data, and its potential to solve the cold-start problem of sequential recommendation systems is demonstrated on real-world data.
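A minimal sketch of the "multi-agent" setting described above, assuming a univariate exponential impact function shared by all agents, lambda_a(t) = mu_a + sum over t_i < t of alpha * exp(-beta * (t - t_i)), with an agent-specific exogenous intensity mu_a. The exponential kernel and all parameter values are illustrative assumptions; the paper's impact functions may differ. The log-likelihood uses the closed-form integral of this kernel over [0, T].

```python
import math

def hawkes_intensity(t, history, mu, alpha, beta):
    """Conditional intensity of a univariate Hawkes process with exponential kernel."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t)

def hawkes_log_likelihood(events, T, mu, alpha, beta):
    """sum_i log lambda(t_i) minus the integral of lambda over [0, T]."""
    log_terms = sum(math.log(hawkes_intensity(t, events, mu, alpha, beta)) for t in events)
    compensator = mu * T + (alpha / beta) * sum(1.0 - math.exp(-beta * (T - ti)) for ti in events)
    return log_terms - compensator

# Shared endogenous impact (alpha, beta); agent-specific exogenous rates mu.
shared = {"alpha": 0.8, "beta": 1.0}
agents = {"a": {"mu": 0.5, "events": [1.0]}, "b": {"mu": 0.1, "events": [0.5, 1.5]}}
for name, agent in agents.items():
    lam = hawkes_intensity(2.0, agent["events"], agent["mu"], **shared)
    ll = hawkes_log_likelihood(agent["events"], 2.0, agent["mu"], **shared)
    print(name, round(lam, 4), round(ll, 4))
```

Summing `hawkes_log_likelihood` over agents gives a joint objective in which alpha and beta are shared while each mu is fit per agent.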

Benefits from Superposed Hawkes Processes

Feb 14, 2018
Hongteng Xu, Dixin Luo, Xu Chen, Lawrence Carin

The superposition of temporal point processes has been studied for many years, although the usefulness of such models for practical applications has not been fully developed. We investigate superposed Hawkes processes as an important class of such models, with properties studied in the framework of least squares estimation. The superposition of Hawkes processes is demonstrated to be beneficial for tightening the upper bound of excess risk under certain conditions, and we show the feasibility of the benefit in typical situations. The usefulness of superposed Hawkes processes is verified on synthetic data, and its potential to solve the cold-start problem of recommendation systems is demonstrated on real-world data.

* AISTATS 2018
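Superposing point processes means pooling their events into a single sequence. A minimal sketch, assuming each sequence is an already sorted list of time stamps, using the standard library's `heapq.merge`; a second helper keeps the index of the source sequence for each event. The example sequences are hypothetical.

```python
import heapq

def superpose(*sequences):
    """Merge sorted event sequences into one sorted superposed sequence."""
    return list(heapq.merge(*sequences))

def superpose_with_source(*sequences):
    """Same merge, keeping (time, source_index) pairs; equal times are ordered by source."""
    tagged = [[(t, k) for t in seq] for k, seq in enumerate(sequences)]
    return list(heapq.merge(*tagged))

user_a = [0.5, 2.0, 3.1]
user_b = [1.2, 2.5]
print(superpose(user_a, user_b))  # [0.5, 1.2, 2.0, 2.5, 3.1]
```

Merging rather than concatenating and re-sorting keeps the cost linear in the total number of events, since each input is already ordered.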

Learning Registered Point Processes from Idiosyncratic Observations

Feb 13, 2018
Hongteng Xu, Lawrence Carin, Hongyuan Zha

A parametric point process model is developed, with modeling based on the assumption that sequential observations often share latent phenomena, while also possessing idiosyncratic effects. An alternating optimization method is proposed to learn a "registered" point process that accounts for shared structure, as well as "warping" functions that characterize idiosyncratic aspects of each observed sequence. Under reasonable constraints, in each iteration we update the sample-specific warping functions by solving a set of constrained nonlinear programming problems in parallel, and update the model by maximum likelihood estimation. The justifiability, complexity and robustness of the proposed method are investigated in detail, and the influence of sequence stitching on the learning results is examined empirically. Experiments on both synthetic and real-world data demonstrate that the method yields explainable point process models, achieving encouraging results compared to state-of-the-art methods.
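The abstract pairs a shared "registered" process with a warping function for each observed sequence. A minimal sketch of one possible representation of such a warping, assuming a monotone piecewise-linear map through strictly increasing knots; it moves a sequence's event times onto a common time axis. The parameterization and knot values are hypothetical, since the abstract does not specify the paper's choice.

```python
import bisect

def make_warp(knots_in, knots_out):
    """Monotone piecewise-linear warping through the points (knots_in[i], knots_out[i])."""
    for knots in (knots_in, knots_out):
        if any(b <= a for a, b in zip(knots, knots[1:])):
            raise ValueError("knots must be strictly increasing to keep the warp monotone")

    def warp(t):
        # Clamp times outside the knot range to the end points.
        if t <= knots_in[0]:
            return knots_out[0]
        if t >= knots_in[-1]:
            return knots_out[-1]
        i = bisect.bisect_right(knots_in, t) - 1
        frac = (t - knots_in[i]) / (knots_in[i + 1] - knots_in[i])
        return knots_out[i] + frac * (knots_out[i + 1] - knots_out[i])

    return warp

# Events in [0, 5] are compressed onto [0, 2.5]; events in [5, 10] are stretched onto [2.5, 10].
warp = make_warp([0.0, 5.0, 10.0], [0.0, 2.5, 10.0])
print([warp(t) for t in [1.0, 5.0, 7.5]])
```

Strictly increasing knots guarantee that warping preserves the order of events, which any registration of event sequences has to respect.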

Stochastic Gradient Monomial Gamma Sampler

Jan 10, 2018
Yizhe Zhang, Changyou Chen, Zhe Gan, Ricardo Henao, Lawrence Carin

Recent advances in stochastic gradient techniques have made it possible to estimate posterior distributions from large datasets via Markov chain Monte Carlo (MCMC). However, when the target posterior is multimodal, mixing performance is often poor. This results in inadequate exploration of the posterior distribution. A framework is proposed to improve the sampling efficiency of stochastic gradient MCMC, based on Hamiltonian Monte Carlo. A generalized kinetic function is leveraged, delivering superior stationary mixing, especially for multimodal distributions. Techniques are also discussed to overcome the practical issues introduced by this generalization. It is shown that the proposed approach is better at exploring complex multimodal posterior distributions, as demonstrated on multiple applications and in comparison with other stochastic gradient MCMC methods.

* Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3996-4005, 2017
* Published in ICML 2017
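Stochastic gradient MCMC methods replace the full-data gradient of the potential energy with a minibatch estimate. A minimal sketch of that estimator for an assumed toy model (unit-variance Gaussian likelihood with unknown mean theta and a N(0, 10) prior): rescaling the minibatch likelihood term by N / n makes the estimate unbiased for the full-data gradient. The generalized kinetic function described in the abstract is not shown, and the model and sizes are illustrative.

```python
import random

def grad_potential_full(theta, data, prior_var=10.0):
    """Gradient of U(theta) = theta**2 / (2 * prior_var) + sum_i (x_i - theta)**2 / 2."""
    return theta / prior_var - sum(x - theta for x in data)

def grad_potential_minibatch(theta, data, batch_size, rng, prior_var=10.0):
    """Unbiased minibatch estimate: the likelihood term is rescaled by N / n."""
    batch = rng.sample(data, batch_size)
    scale = len(data) / batch_size
    return theta / prior_var - scale * sum(x - theta for x in batch)

rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(1000)]
full = grad_potential_full(0.0, data)
estimates = [grad_potential_minibatch(0.0, data, 50, rng) for _ in range(2000)]
avg = sum(estimates) / len(estimates)
print(round(full, 1), round(avg, 1))
```

The estimate is unbiased but noisy; SG-MCMC samplers are designed so that this extra gradient noise does not destroy the target distribution.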

Towards Unifying Hamiltonian Monte Carlo and Slice Sampling

Jan 10, 2018
Yizhe Zhang, Xiangyu Wang, Changyou Chen, Ricardo Henao, Kai Fan, Lawrence Carin

We unify slice sampling and Hamiltonian Monte Carlo (HMC) sampling, demonstrating their connection via the Hamilton-Jacobi equation from Hamiltonian mechanics. This insight enables extension of HMC and slice sampling to a broader family of samplers, called Monomial Gamma Samplers (MGS). We provide a theoretical analysis of the mixing performance of such samplers, proving that in the limit of a single parameter, the MGS draws decorrelated samples from the desired target distribution. We further show that as this parameter tends toward this limit, performance gains are achieved at a cost of increasing numerical difficulty and some practical convergence issues. Our theoretical results are validated with synthetic data and real-world applications.

* Advances in Neural Information Processing Systems, pages 1741-1749, 2016
* updated version
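The samplers above generalize the kinetic energy used in HMC. A minimal sketch, assuming a monomial kinetic energy K(p) = |p|^(1/a), so that a = 1/2 gives a quadratic kinetic energy like standard HMC. Under that assumption the momentum can be drawn exactly: if g ~ Gamma(a, 1), then |p| = g^a with a random sign has density proportional to exp(-|p|^(1/a)). The sketch runs leapfrog integration with a Metropolis correction on a standard normal target; it uses full gradients, omits the paper's analysis and remedies for numerical issues at large a, and all parameter values are illustrative.

```python
import math
import random

def sample_momentum(a, rng):
    """Draw p with density proportional to exp(-|p|**(1/a)): |p| = g**a, g ~ Gamma(a, 1)."""
    magnitude = rng.gammavariate(a, 1.0) ** a
    return magnitude if rng.random() < 0.5 else -magnitude

def kinetic(p, a):
    return abs(p) ** (1.0 / a)

def grad_kinetic(p, a):
    # dK/dp = (1/a) * sign(p) * |p|**(1/a - 1)
    return math.copysign(abs(p) ** (1.0 / a - 1.0), p) / a

def monomial_kinetic_hmc(U, grad_U, x0, a=0.5, step=0.1, n_samples=5000, seed=0):
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        p = sample_momentum(a, rng)
        x_new, p_new = x, p
        # A random number of leapfrog steps avoids resonating with the target's period.
        for _ in range(rng.randint(5, 15)):
            p_new -= 0.5 * step * grad_U(x_new)
            x_new += step * grad_kinetic(p_new, a)
            p_new -= 0.5 * step * grad_U(x_new)
        # Metropolis correction for the integrator's discretization error.
        h_old = U(x) + kinetic(p, a)
        h_new = U(x_new) + kinetic(p_new, a)
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples

samples = monomial_kinetic_hmc(lambda x: 0.5 * x * x, lambda x: x, x0=0.0)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 3), round(var, 3))
```

Because the Hamiltonian stays separable, the leapfrog scheme remains volume-preserving and reversible for any a, which is why the same Metropolis correction applies unchanged.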