We propose a general class of language models that treat reference as an explicit stochastic latent variable. This architecture allows models to create mentions of entities and their attributes by accessing external databases (required by, e.g., dialogue generation and recipe generation) and internal state (required by, e.g., language models that are aware of coreference). This facilitates the incorporation of information that can be accessed in predictable locations in databases or discourse context, even when the targets of the reference may be rare words. Experiments on three tasks show that our model variants outperform models based on deterministic attention.
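A minimal sketch of the core idea: at each decoding step the model marginalizes over a binary latent choice between generating from the vocabulary and referring to an external entity list. The module names and dimensions below are illustrative assumptions, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReferenceAwareStep(nn.Module):
    """One decoding step with a stochastic reference decision (sketch)."""
    def __init__(self, hidden_dim, vocab_size, entity_dim):
        super().__init__()
        self.vocab_proj = nn.Linear(hidden_dim, vocab_size)   # p(w | z=0, h)
        self.entity_proj = nn.Linear(hidden_dim, entity_dim)  # scores entities for z=1
        self.gate = nn.Linear(hidden_dim, 1)                  # p(z=1 | h)

    def forward(self, h, entity_embs):
        # h: (batch, hidden_dim); entity_embs: (batch, n_entities, entity_dim)
        p_ref = torch.sigmoid(self.gate(h))                    # (batch, 1)
        p_vocab = F.softmax(self.vocab_proj(h), dim=-1)        # (batch, vocab_size)
        scores = torch.bmm(entity_embs,
                           self.entity_proj(h).unsqueeze(2)).squeeze(2)
        p_entity = F.softmax(scores, dim=-1)                   # (batch, n_entities)
        # Because z is discrete and binary, the marginal likelihood is an
        # exact mixture of the two branches.
        return (1 - p_ref) * p_vocab, p_ref * p_entity
```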
Recent work on generative modeling of text has found that variational auto-encoders (VAEs) incorporating LSTM decoders perform worse than simpler LSTM language models (Bowman et al., 2015). This negative result is so far poorly understood, but has been attributed to the propensity of LSTM decoders to ignore conditioning information from the encoder. In this paper, we experiment with a new type of decoder for VAEs: a dilated CNN. By changing the decoder's dilation architecture, we control the effective context from previously generated words. In experiments, we find that there is a trade-off between the contextual capacity of the decoder and the amount of encoding information used. We show that with the right decoder, VAEs can outperform LSTM language models. We demonstrate perplexity gains on two datasets, representing the first positive experimental result on the use of VAEs for generative modeling of text. Further, we conduct an in-depth investigation of the use of VAEs (with our new decoding architecture) for semi-supervised and unsupervised labeling tasks, demonstrating gains over several strong baselines.
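A minimal sketch of the decoder idea: a causal dilated CNN whose dilation schedule (a hyperparameter here) determines how many previous tokens the decoder can condition on, which is the knob the paper trades off against use of the latent code. Layer sizes and the dilation schedule are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalDilatedConv(nn.Module):
    def __init__(self, channels, dilation, kernel_size=2):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # left-pad so the conv stays causal
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):  # x: (batch, channels, seq_len)
        x = F.pad(x, (self.pad, 0))  # pad only on the left: no future leakage
        return self.conv(x).relu()

class DilatedDecoder(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 4, 8)):
        super().__init__()
        # Receptive field = 1 + sum(dilations) for kernel_size=2; a shallower
        # schedule gives the decoder less context, forcing it to rely more on
        # the encoder's latent code.
        self.layers = nn.ModuleList(CausalDilatedConv(channels, d) for d in dilations)

    def forward(self, x):
        for layer in self.layers:
            x = x + layer(x)  # residual connection keeps depth trainable
        return x
```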
Knowing which words have been attended to in previous time steps while generating a translation is a rich source of information for predicting what words will be attended to in the future. We improve upon the attention model of Bahdanau et al. (2014) by explicitly modeling the relationship between previous and subsequent attention levels for each word using one recurrent network per input word. This architecture easily captures informative features, such as fertility and regularities in relative distortion. In experiments, we show our parameterization of attention improves translation quality.
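A minimal sketch of this parameterization: one GRU cell (shared weights, a separate state per source word) consumes the attention each word received at the previous step, and that per-word memory feeds the next step's attention score. All dimensions and the exact score function are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentAttention(nn.Module):
    def __init__(self, enc_dim, dec_dim, mem_dim=32):
        super().__init__()
        self.cell = nn.GRUCell(1, mem_dim)  # input: a word's last attention weight
        self.score = nn.Linear(enc_dim + dec_dim + mem_dim, 1)

    def forward(self, enc, dec_states):
        # enc: (batch, src_len, enc_dim); dec_states: list of (batch, dec_dim)
        B, S, _ = enc.shape
        mem = enc.new_zeros(B * S, self.cell.hidden_size)  # per-word memory
        alpha = enc.new_full((B, S), 1.0 / S)              # uniform at t=0
        contexts = []
        for h_dec in dec_states:
            # Update each word's memory with the attention it just received;
            # accumulated attention lets the model track fertility.
            mem = self.cell(alpha.reshape(B * S, 1), mem)
            feats = torch.cat([enc,
                               h_dec.unsqueeze(1).expand(B, S, -1),
                               mem.view(B, S, -1)], dim=-1)
            alpha = F.softmax(self.score(feats).squeeze(-1), dim=-1)
            contexts.append(torch.bmm(alpha.unsqueeze(1), enc).squeeze(1))
        return torch.stack(contexts, dim=1)  # (batch, tgt_len, enc_dim)
```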
This paper presents stacked attention networks (SANs) that learn to answer natural language questions from images. SANs use the semantic representation of a question as a query to search for the regions in an image that are related to the answer. We argue that image question answering (QA) often requires multiple steps of reasoning. Thus, we develop a multiple-layer SAN that queries an image multiple times to infer the answer progressively. Experiments conducted on four image QA data sets demonstrate that the proposed SANs significantly outperform previous state-of-the-art approaches. Visualizations of the attention layers illustrate how the SAN progressively locates, layer by layer, the relevant visual clues that lead to the answer.
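A minimal sketch of one attention hop in this style: the question vector attends over image region features, and the attended context refines the query passed to the next hop. The layer sizes and the additive refinement are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionHop(nn.Module):
    def __init__(self, dim, att_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(dim, att_dim)
        self.q_proj = nn.Linear(dim, att_dim)
        self.score = nn.Linear(att_dim, 1)

    def forward(self, regions, query):
        # regions: (batch, n_regions, dim); query: (batch, dim)
        h = torch.tanh(self.img_proj(regions) + self.q_proj(query).unsqueeze(1))
        alpha = F.softmax(self.score(h).squeeze(-1), dim=-1)  # (batch, n_regions)
        context = torch.bmm(alpha.unsqueeze(1), regions).squeeze(1)
        return query + context  # refined query for the next hop

# Stacking two hops gives progressive reasoning:
#   q1 = hop1(regions, q0); q2 = hop2(regions, q1); logits = classifier(q2)
```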
The fully connected layers of a deep convolutional neural network typically contain over 90% of the network parameters, and consume the majority of the memory required to store the network parameters. Reducing the number of parameters while preserving essentially the same predictive performance is critically important for operating deep neural networks in memory-constrained environments such as GPUs or embedded devices. In this paper we show how kernel methods, in particular a single Fastfood layer, can be used to replace all fully connected layers in a deep convolutional neural network. This novel Fastfood layer is also end-to-end trainable in conjunction with convolutional layers, allowing us to combine them into a new architecture, named deep fried convolutional networks, which substantially reduces the memory footprint of convolutional networks trained on MNIST and ImageNet with no drop in predictive performance.
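A minimal sketch of why the Fastfood substitution saves memory: the projection Vx = S H G P H B x composes diagonal matrices S, G, B, a permutation P, and the Walsh-Hadamard transform H, so it needs O(d) parameters instead of the d^2 of a dense layer (in the deep-fried setting S, G, B become trainable). This is a simplified NumPy version, not the full construction.

```python
import numpy as np

def hadamard(x):
    """Walsh-Hadamard transform via the butterfly recursion; len(x) must be 2^k."""
    h = 1
    x = x.copy()
    while h < len(x):
        for i in range(0, len(x), h * 2):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x / np.sqrt(len(x))

def fastfood(x, B, P, G, S):
    # B, G, S: diagonal entries (learnable in the deep-fried setting);
    # P: a fixed random permutation of indices.
    return S * hadamard(G * hadamard(B * x)[P])

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=d)
B = rng.choice([-1.0, 1.0], size=d)  # random sign flips
P = rng.permutation(d)
G = rng.normal(size=d)               # Gaussian scaling
S = np.full(d, 1.0)                  # norm correction, simplified here
print(fastfood(x, B, P, G, S))
```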
Kernel methods have great promise for learning rich statistical representations of large modern datasets. However, compared to neural networks, kernel methods have been perceived as lacking in scalability and flexibility. We introduce a family of fast, flexible, lightly parametrized and general purpose kernel learning methods, derived from Fastfood basis function expansions. We provide mechanisms to learn the properties of groups of spectral frequencies in these expansions, which require only O(m log d) time and O(m) memory, for m basis functions and d input dimensions. We show that the proposed methods can learn a wide class of kernels, outperforming the alternatives in accuracy, speed, and memory consumption.
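A minimal sketch of the basis-function view: random Fourier features phi(x) = sqrt(2/m) * cos(Wx + b) approximate a shift-invariant kernel, and making per-group scales of the spectral frequencies W learnable turns the approximation into kernel learning. Here W is drawn densely for clarity; Fastfood would generate it implicitly in O(m log d) time. The group structure and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, groups = 4, 256, 4

W = rng.normal(size=(m, d))             # base spectral frequencies
b = rng.uniform(0, 2 * np.pi, size=m)   # random phases
scales = np.ones(groups)                # one learnable lengthscale per group

def features(x):
    # Broadcast each group's scale to its block of frequency rows.
    s = np.repeat(scales, m // groups)
    return np.sqrt(2.0 / m) * np.cos((s[:, None] * W) @ x + b)

x, y = rng.normal(size=d), rng.normal(size=d)
approx = features(x) @ features(y)            # Monte Carlo estimate of k(x, y)
exact = np.exp(-0.5 * np.sum((x - y) ** 2))   # RBF kernel, unit lengthscale
print(approx, exact)
```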