Nando de Freitas

A Deep Architecture for Semantic Parsing

Apr 29, 2014
Edward Grefenstette, Phil Blunsom, Nando de Freitas, Karl Moritz Hermann

Many successful approaches to semantic parsing build on top of the syntactic analysis of text, and make use of distributional representations or statistical models to match parses to ontology-specific queries. This paper presents a novel deep learning architecture which provides a semantic parsing system through the union of two neural models of language semantics. It allows for the generation of ontology-specific queries from natural language statements and questions without the need for parsing, which makes it especially suitable to grammatically malformed or syntactically atypical text, such as tweets, as well as permitting the development of semantic parsers for resource-poor languages.

* In Proceedings of the Semantic Parsing Workshop at ACL 2014 (forthcoming)

Bayesian Multi-Scale Optimistic Optimization

Feb 27, 2014
Ziyu Wang, Babak Shakibi, Lin Jin, Nando de Freitas

Bayesian optimization is a powerful global optimization technique for expensive black-box functions. One of its shortcomings is that it requires auxiliary optimization of an acquisition function at each iteration. This auxiliary optimization can be costly and very hard to carry out in practice. Moreover, it creates serious theoretical concerns, as most of the convergence results assume that the exact optimum of the acquisition function can be found. In this paper, we introduce a new technique for efficient global optimization that combines Gaussian process confidence bounds and treed simultaneous optimistic optimization to eliminate the need for auxiliary optimization of acquisition functions. The experiments with global optimization benchmarks and a novel application to automatic information extraction demonstrate that the resulting technique is more efficient than the two approaches from which it draws inspiration. Unlike most theoretical analyses of Bayesian optimization with Gaussian processes, our finite-time convergence rate proofs do not require exact optimization of an acquisition function. That is, our approach eliminates the unsatisfactory assumption that a difficult, potentially NP-hard, problem has to be solved in order to obtain vanishing regret rates.

* 15 pages
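
For context on the "auxiliary optimization" the abstract refers to, here is a minimal sketch of a conventional Bayesian optimization loop with a Gaussian process upper confidence bound. It is not the paper's algorithm: it shows the step the paper aims to eliminate. The kernel, its length scale, the `beta` value, the unit-interval domain, and the grid search standing in for the inner optimizer are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    # Squared-exponential kernel between two sets of 1-D points.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    # Posterior mean and standard deviation of a zero-mean, unit-variance GP.
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_qo = rbf_kernel(x_query, x_obs)
    mean = k_qo @ np.linalg.solve(k_oo, y_obs)
    v = np.linalg.solve(k_oo, k_qo.T)
    var = 1.0 - np.sum(k_qo * v.T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def bayes_opt_ucb(f, n_iter=15, beta=2.0, grid_size=201):
    # Maximize f on [0, 1]; the grid is a stand-in for a real inner optimizer.
    grid = np.linspace(0.0, 1.0, grid_size)
    x_obs = np.array([0.5])
    y_obs = np.array([f(0.5)])
    for _ in range(n_iter):
        mean, std = gp_posterior(x_obs, y_obs, grid)
        ucb = mean + beta * std
        # Auxiliary optimization: maximize the acquisition function.
        x_next = grid[np.argmax(ucb)]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, f(x_next))
    best = np.argmax(y_obs)
    return x_obs[best], y_obs[best]
```

In more than one dimension a grid is no longer practical, and the `argmax` line becomes a non-convex optimization problem in its own right, which is the cost the abstract describes.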

Linear and Parallel Learning of Markov Random Fields

Feb 05, 2014
Yariv Dror Mizrahi, Misha Denil, Nando de Freitas

We introduce a new embarrassingly parallel parameter learning algorithm for Markov random fields with untied parameters which is efficient for a large class of practical models. Our algorithm parallelizes naturally over cliques and, for graphs of bounded degree, its complexity is linear in the number of cliques. Unlike its competitors, our algorithm is fully parallel and for log-linear models it is also data efficient, requiring only the local sufficient statistics of the data to estimate parameters.

Exploiting correlation and budget constraints in Bayesian multi-armed bandit optimization

Nov 11, 2013
Matthew W. Hoffman, Bobak Shahriari, Nando de Freitas

We address the problem of finding the maximizer of a nonlinear smooth function that can only be evaluated point-wise, subject to constraints on the number of permitted function evaluations. This problem is also known as fixed-budget best arm identification in the multi-armed bandit literature. We introduce a Bayesian approach for this problem and show that it empirically outperforms both the existing frequentist counterpart and other Bayesian optimization methods. The Bayesian approach places emphasis on detailed modelling, including the modelling of correlations among the arms. As a result, it can perform well in situations where the number of arms is much larger than the number of allowed function evaluations, whereas the frequentist counterpart is inapplicable. This feature enables us to develop and deploy practical applications, such as automatic machine learning toolboxes. The paper presents comprehensive comparisons of the proposed approach, Thompson sampling, classical Bayesian optimization techniques, more recent Bayesian bandit approaches, and state-of-the-art best arm identification methods. This is the first comparison of many of these methods in the literature and allows us to examine the relative merits of their different features.

Narrowing the Gap: Random Forests In Theory and In Practice

Oct 04, 2013
Misha Denil, David Matheson, Nando de Freitas

Despite widespread interest and practical use, the theoretical properties of random forests are still not well understood. In this paper we contribute to this understanding in two ways. We present a new theoretically tractable variant of random regression forests and prove that our algorithm is consistent. We also provide an empirical evaluation, comparing our algorithm and other theoretically tractable random forest models to the random forest algorithm used in practice. Our experiments provide insight into the relative importance of different simplifications that theoreticians have made to obtain tractable models for analysis.

* Under review by the International Conference on Machine Learning (ICML) 2014

Consistency of Online Random Forests

May 08, 2013
Misha Denil, David Matheson, Nando de Freitas

As a testament to their success, the theory of random forests has long been outpaced by their application in practice. In this paper, we take a step towards narrowing this gap by providing a consistency result for online random forests.

* To appear in Proceedings of the 30th International Conference on Machine Learning, 2013

Herded Gibbs Sampling

Mar 16, 2013
Luke Bornn, Yutian Chen, Nando de Freitas, Mareija Eskelin, Jing Fang, Max Welling

The Gibbs sampler is one of the most popular algorithms for inference in statistical models. In this paper, we introduce a herding variant of this algorithm, called herded Gibbs, that is entirely deterministic. We prove that herded Gibbs has an $O(1/T)$ convergence rate for models with independent variables and for fully connected probabilistic graphical models. Herded Gibbs is shown to outperform Gibbs in the tasks of image denoising with MRFs and named entity recognition with CRFs. However, the convergence for herded Gibbs for sparsely connected probabilistic graphical models is still an open problem.

* 19 pages, including the appendix. Submission for ICLR 2013
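
The deterministic update behind herded Gibbs can be sketched as follows for binary variables. This is an illustrative reading of the abstract rather than the authors' reference implementation: it uses the standard herding rule (emit 1 when the weight is positive, then add the conditional probability and subtract the emitted value), keeps one weight per variable and neighbor configuration, and initializes each weight to the conditional probability on first use. The function name, the `cond_prob` callback, and the `neighbors` argument are hypothetical.

```python
def herded_gibbs(cond_prob, neighbors, n_sweeps):
    """Deterministic sweeps over binary variables.

    cond_prob(i, state) returns P(x_i = 1 | rest of state).
    neighbors[i] lists the indices that x_i depends on.
    Returns the fraction of sweeps in which each variable was 1.
    """
    n_vars = len(neighbors)
    state = [0] * n_vars
    weights = {}  # one herding weight per (variable, neighbor configuration)
    counts = [0] * n_vars
    for _ in range(n_sweeps):
        for i in range(n_vars):
            p = cond_prob(i, state)
            key = (i, tuple(state[j] for j in neighbors[i]))
            w = weights.get(key, p)   # assumed initialization: w = p
            x = 1 if w > 0 else 0     # herding step replaces the random draw
            weights[key] = w + p - x
            state[i] = x
            counts[i] += x
    return [c / n_sweeps for c in counts]
```

For independent variables each weight stays in a bounded interval, so the empirical frequency of `x_i = 1` after `T` sweeps differs from the target probability by less than `1/T`, which matches the $O(1/T)$ rate quoted in the abstract for that case.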

Rao-Blackwellised Particle Filtering for Dynamic Bayesian Networks

Jan 16, 2013
Arnaud Doucet, Nando de Freitas, Kevin Murphy, Stuart Russell

Particle filters (PFs) are powerful sampling-based inference/learning algorithms for dynamic Bayesian networks (DBNs). They allow us to treat, in a principled way, any type of probability distribution, nonlinearity and non-stationarity. They have appeared in several fields under such names as "condensation", "sequential Monte Carlo" and "survival of the fittest". In this paper, we show how we can exploit the structure of the DBN to increase the efficiency of particle filtering, using a technique known as Rao-Blackwellisation. Essentially, this samples some of the variables, and marginalizes out the rest exactly, using the Kalman filter, HMM filter, junction tree algorithm, or any other finite dimensional optimal filter. We show that Rao-Blackwellised particle filters (RBPFs) lead to more accurate estimates than standard PFs. We demonstrate RBPFs on two problems, namely non-stationary online regression with radial basis function networks and robot localization and map building. We also discuss other potential application areas and provide references to some finite dimensional optimal filters.

* Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI2000)
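
As a rough illustration of the idea of sampling some variables and marginalizing the rest with a Kalman filter, the sketch below runs a Rao-Blackwellised particle filter on a toy switching model. The model is an assumption chosen for brevity and is not taken from the paper: a two-state Markov regime `z_t` selects the process noise variance `q[z_t]` of a scalar linear-Gaussian state `x_t = a * x_{t-1} + noise`, observed as `y_t = x_t + noise` with variance `r`. Each particle samples only `z_t` and carries a Kalman mean and variance for `x_t`.

```python
import numpy as np

def rbpf(ys, trans, a, q, r, n_particles=200, seed=0):
    """Filtered means of x_t for the toy switching model described above.

    trans[i, j] = P(z_t = j | z_{t-1} = i); q[k] is the process noise
    variance in regime k. All names here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n_particles)
    m = np.zeros(n_particles)   # Kalman means, one per particle
    p = np.ones(n_particles)    # Kalman variances, one per particle
    estimates = []
    for y in ys:
        # Sample the discrete regime from its transition prior.
        z = (rng.random(n_particles) < trans[z, 1]).astype(int)
        # Kalman prediction, conditional on the sampled regime.
        m_pred = a * m
        p_pred = a * a * p + q[z]
        # Weight by the exact predictive likelihood of y (x integrated out).
        s = p_pred + r
        log_w = -0.5 * (np.log(2.0 * np.pi * s) + (y - m_pred) ** 2 / s)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # Kalman measurement update.
        gain = p_pred / s
        m = m_pred + gain * (y - m_pred)
        p = (1.0 - gain) * p_pred
        estimates.append(float(np.sum(w * m)))
        # Multinomial resampling.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        z, m, p = z[idx], m[idx], p[idx]
    return np.array(estimates)
```

When both regimes share the same noise variance, every particle carries identical Kalman statistics and the output coincides with a plain Kalman filter, which gives a convenient sanity check.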

Reversible Jump MCMC Simulated Annealing for Neural Networks

Jan 16, 2013
Christophe Andrieu, Nando de Freitas, Arnaud Doucet

We propose a novel reversible jump Markov chain Monte Carlo (MCMC) simulated annealing algorithm to optimize radial basis function (RBF) networks. This algorithm enables us to maximize the joint posterior distribution of the network parameters and the number of basis functions. It performs a global search in the joint space of the parameters and number of parameters, thereby surmounting the problem of local minima. We also show that by calibrating a Bayesian model, we can obtain the classical AIC, BIC and MDL model selection criteria within a penalized likelihood framework. Finally, we show theoretically and empirically that the algorithm converges to the modes of the full posterior distribution in an efficient way.

* Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI2000)

Variational MCMC

Jan 10, 2013
Nando de Freitas, Pedro Hojen-Sorensen, Michael I. Jordan, Stuart Russell

We propose a new class of learning algorithms that combines variational approximation and Markov chain Monte Carlo (MCMC) simulation. Naive algorithms that use the variational approximation as proposal distribution can perform poorly because this approximation tends to underestimate the true variance and other features of the data. We solve this problem by introducing more sophisticated MCMC algorithms. One of these algorithms is a mixture of two MCMC kernels: a random walk Metropolis kernel and a block Metropolis-Hastings (MH) kernel with a variational approximation as proposal distribution. The MH kernel allows one to locate regions of high probability efficiently. The Metropolis kernel allows us to explore the vicinity of these regions. This algorithm outperforms variational approximations because it yields slightly better estimates of the mean and considerably better estimates of higher moments, such as covariances. It also outperforms standard MCMC algorithms because it locates the regions of high probability quickly, thus speeding up convergence. We demonstrate this algorithm on the problem of Bayesian parameter estimation for logistic (sigmoid) belief networks.

* Appears in Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI2001)
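
The mixture of two kernels described in the abstract can be sketched in one dimension as follows. This is a simplified illustration, not the paper's algorithm: a fixed Gaussian with mean `q_mean` and standard deviation `q_std` stands in for the variational approximation, the target is any unnormalized log density, and the function names, the mixing probability, and the step size are assumptions.

```python
import math
import random

def log_normal_pdf(x, mean, std):
    # Log density of a univariate Gaussian.
    return (-0.5 * ((x - mean) / std) ** 2
            - math.log(std * math.sqrt(2.0 * math.pi)))

def mixture_kernel_mcmc(log_target, q_mean, q_std, n_samples,
                        mix_prob=0.5, rw_std=1.0, seed=0):
    rng = random.Random(seed)
    x = q_mean
    samples = []
    for _ in range(n_samples):
        if rng.random() < mix_prob:
            # Independence MH kernel: propose from the fixed approximation.
            prop = rng.gauss(q_mean, q_std)
            log_alpha = (log_target(prop) - log_target(x)
                         + log_normal_pdf(x, q_mean, q_std)
                         - log_normal_pdf(prop, q_mean, q_std))
        else:
            # Random walk Metropolis kernel: symmetric local proposal.
            prop = x + rng.gauss(0.0, rw_std)
            log_alpha = log_target(prop) - log_target(x)
        # Accept with probability min(1, exp(log_alpha)).
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = prop
        samples.append(x)
    return samples
```

Each kernel leaves the target invariant on its own, so choosing between them at random does too. With a proposal that is narrower than the target, the independence kernel alone rarely leaves the tails once it reaches them, and the random walk kernel supplies the local moves that let the chain explore there.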
