Vivek Rathod

Bayesian Dark Knowledge

Nov 06, 2015

Anoop Korattikara, Vivek Rathod, Kevin Murphy, Max Welling

Abstract: We consider the problem of Bayesian parameter estimation for deep neural networks, which is important in problem settings where we may have little data, and/or where we need accurate posterior predictive densities, e.g., for applications involving bandits or active learning. One simple approach to this is to use online Monte Carlo methods, such as SGLD (stochastic gradient Langevin dynamics). Unfortunately, such a method needs to store many copies of the parameters (which wastes memory), and needs to make predictions using many versions of the model (which wastes time). We describe a method for "distilling" a Monte Carlo approximation to the posterior predictive density into a more compact form, namely a single deep neural network. We compare to two very recent approaches to Bayesian neural networks, namely an approach based on expectation propagation [Hernandez-Lobato and Adams, 2015] and an approach based on variational Bayes [Blundell et al., 2015]. Our method performs better than both of these, is much simpler to implement, and uses less computation at test time.

* final version submitted to NIPS 2015
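
The abstract names SGLD as the sampler whose output is distilled. As a rough illustration only, the sketch below shows a generic SGLD update on a toy problem (inferring the mean of a unit-variance Gaussian). It is not the authors' code and does not implement their distillation step; the function names, the toy model, the prior variance of 100, and the constant step size are all assumptions made for this example.

```python
import numpy as np


def sgld_step(theta, grad_log_prior, grad_log_lik, batch, n_total, step_size, rng):
    """One SGLD update: half a step along a minibatch estimate of the
    log-posterior gradient, plus Gaussian noise with variance step_size."""
    scale = n_total / len(batch)  # rescale the minibatch sum to the full dataset
    grad = grad_log_prior(theta) + scale * sum(grad_log_lik(theta, x) for x in batch)
    noise = rng.normal(0.0, np.sqrt(step_size), size=theta.shape)
    return theta + 0.5 * step_size * grad + noise


def sample_gaussian_mean(seed=0, n_data=200, n_steps=2000, burn_in=500,
                         batch_size=20, step_size=1e-3):
    """Toy run (assumed setup): data ~ N(2, 1), prior on the mean ~ N(0, 100)."""
    rng = np.random.default_rng(seed)
    data = rng.normal(2.0, 1.0, size=n_data)

    def grad_log_prior(th):
        return -th / 100.0      # gradient of log N(th | 0, 100)

    def grad_log_lik(th, x):
        return x - th           # gradient of log N(x | th, 1) with respect to th

    theta = np.zeros(1)
    samples = []
    for t in range(n_steps):
        batch = rng.choice(data, size=batch_size, replace=False)
        # A constant step size is a simplification; a decaying schedule is also common.
        theta = sgld_step(theta, grad_log_prior, grad_log_lik,
                          batch, n_data, step_size, rng)
        if t >= burn_in:
            samples.append(theta.copy())
    return np.array(samples), data


if __name__ == "__main__":
    samples, data = sample_gaussian_mean()
    print("posterior mean estimate:", samples.mean())
    print("sample mean of the data:", data.mean())
```

Keeping every post-burn-in sample, as this toy does, is exactly the memory and test-time cost the abstract describes; the paper's contribution is to replace that collection of samples with a single network.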

What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision

Mar 13, 2015

Jonathan Malmaud, Jonathan Huang, Vivek Rathod, Nick Johnston, Andrew Rabinovich, Kevin Murphy

Abstract: We present a novel method for aligning a sequence of instructions to a video of someone carrying out a task. In particular, we focus on the cooking domain, where the instructions correspond to the recipe. Our technique relies on an HMM to align the recipe steps to the (automatically generated) speech transcript. We then refine this alignment using a state-of-the-art visual food detector, based on a deep convolutional neural network. We show that our technique outperforms simpler techniques based on keyword spotting. It also enables interesting applications, such as automatically illustrating recipes with keyframes, and searching within a video for events of interest.

* To appear in NAACL 2015
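
To make the HMM alignment idea in the abstract concrete, here is a minimal sketch of Viterbi decoding over a left-to-right chain in which each hidden state is a recipe step and each observation is a transcript segment. It is not the paper's model: the word-overlap emission score, the equal stay/advance probabilities, the smoothing constant, and the name `align_steps` are all assumptions chosen for illustration, and the visual refinement stage is not shown.

```python
import math


def emission(step, segment, smoothing):
    """Unnormalized log score: smoothed count of words shared by step and segment."""
    overlap = len(set(step.lower().split()) & set(segment.lower().split()))
    return math.log(overlap + smoothing)


def align_steps(steps, segments, stay_logp=math.log(0.5),
                advance_logp=math.log(0.5), smoothing=0.1):
    """Return, for each transcript segment, the index of the recipe step it aligns to.

    The chain starts in step 0 and may only stay in the current step
    or advance to the next one.
    """
    n, m = len(steps), len(segments)
    neg_inf = float("-inf")
    score = [[neg_inf] * n for _ in range(m)]
    back = [[0] * n for _ in range(m)]
    score[0][0] = emission(steps[0], segments[0], smoothing)

    for t in range(1, m):
        for k in range(n):
            stay = score[t - 1][k] + stay_logp
            advance = score[t - 1][k - 1] + advance_logp if k > 0 else neg_inf
            if advance > stay:
                best_prev, back[t][k] = advance, k - 1
            else:
                best_prev, back[t][k] = stay, k
            if best_prev > neg_inf:
                score[t][k] = best_prev + emission(steps[k], segments[t], smoothing)

    # Trace back from the best-scoring final state.
    k = max(range(n), key=lambda j: score[m - 1][j])
    path = [k]
    for t in range(m - 1, 0, -1):
        k = back[t][k]
        path.append(k)
    return path[::-1]


if __name__ == "__main__":
    steps = ["chop the onions", "fry the onions in butter", "add the rice"]
    segments = [
        "first we chop up some onions",
        "chop until the pieces are small",
        "now fry them in butter",
        "finally add rice and stir",
    ]
    print(align_steps(steps, segments))  # [0, 0, 1, 2]
```

The monotonic transition structure encodes the assumption that the cook follows the recipe in order; a keyword-spotting baseline of the kind the abstract mentions would instead score each segment independently.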
