Chris Dyer

Multitask Learning with CTC and Segmental CRF for Speech Recognition

Jun 05, 2017
Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith

Segmental conditional random fields (SCRFs) and connectionist temporal classification (CTC) are two sequence labeling methods used for end-to-end training of speech recognition models. Both models define a transcription probability by marginalizing decisions about latent segmentation alternatives to derive a sequence probability: the former uses a globally normalized joint model of segment labels and durations, and the latter classifies each frame as either an output symbol or a "continuation" of the previous label. In this paper, we train a recognition model by optimizing an interpolation between the SCRF and CTC losses, where the same recurrent neural network (RNN) encoder is used for feature extraction for both outputs. We find that this multitask objective improves recognition accuracy when decoding with either the SCRF or CTC models. Additionally, we show that CTC can also be used to pretrain the RNN encoder, which improves the convergence rate when learning the joint model.

* 5 pages, 2 figures, camera ready version at Interspeech 2017
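
The interpolated objective described in the abstract can be sketched as a convex combination of the two task losses. This is a minimal illustration, not the authors' code: the function name, the weight `lam`, and the choice to attach `lam` to the SCRF term are assumptions, and the loss values passed in are placeholders.

```python
def interpolated_loss(scrf_loss: float, ctc_loss: float, lam: float) -> float:
    """Convex combination of two task losses.

    lam is a hypothetical interpolation weight: lam = 1 trains on the SCRF
    loss only, lam = 0 on the CTC loss only.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * scrf_loss + (1.0 - lam) * ctc_loss


# Placeholder loss values for a single utterance.
combined = interpolated_loss(scrf_loss=2.0, ctc_loss=4.0, lam=0.5)
```

Because both losses are computed from the same encoder output, the gradient of the combined loss updates the shared encoder with signal from both tasks.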

Generative and Discriminative Text Classification with Recurrent Neural Networks

May 26, 2017
Dani Yogatama, Chris Dyer, Wang Ling, Phil Blunsom

We empirically characterize the performance of discriminative and generative LSTM models for text classification. We find that although RNN-based generative models are more powerful than their bag-of-words ancestors (e.g., they account for conditional dependencies across words in a document), they have higher asymptotic error rates than discriminatively trained RNN models. However, we also find that generative models approach their asymptotic error rate more rapidly than their discriminative counterparts, the same pattern that Ng & Jordan (2001) proved holds for linear classification models that make more naive conditional independence assumptions. Building on this finding, we hypothesize that RNN-based generative classification models will be more robust to shifts in the data distribution. This hypothesis is confirmed in a series of experiments in zero-shot and continual learning settings that show that generative models substantially outperform discriminative models.

On-the-fly Operation Batching in Dynamic Computation Graphs

May 22, 2017
Graham Neubig, Yoav Goldberg, Chris Dyer

Dynamic neural network toolkits such as PyTorch, DyNet, and Chainer offer more flexibility for implementing models that cope with data of varying dimensions and structure, relative to toolkits that operate on statically declared computations (e.g., TensorFlow, CNTK, and Theano). However, existing toolkits, both static and dynamic, require that the developer organize the computations into the batches necessary for exploiting high-performance algorithms and hardware. This batching task is generally difficult, but it becomes a major hurdle as architectures become complex. In this paper, we present an algorithm, and its implementation in the DyNet toolkit, for automatically batching operations. Developers simply write minibatch computations as aggregations of single instance computations, and the batching algorithm seamlessly executes them, on the fly, using computationally efficient batched operations. On a variety of tasks, we obtain throughput similar to that obtained with manual batches, as well as comparable speedups over single-instance learning on architectures that are impractical to batch manually.
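
One ingredient of automatic batching can be illustrated by grouping pending single-instance operations that share a signature, so that each group could run as one batched call. The sketch below is a toy under stated assumptions, not the DyNet implementation: the `(name, dim)` signature, the function name, and the sample operations are hypothetical, and dependencies between operations are ignored for brevity.

```python
from collections import defaultdict


def batch_by_signature(ops):
    """Group operation indices by a (name, dim) signature.

    ops is a list of (name, dim) tuples. Operations in the same group are
    candidates for a single batched execution. Dependency ordering between
    operations is deliberately ignored in this toy version.
    """
    groups = defaultdict(list)
    for index, (name, dim) in enumerate(ops):
        groups[(name, dim)].append(index)
    return dict(groups)


# Hypothetical operations collected from several training instances.
pending = [("tanh", 64), ("affine", 64), ("tanh", 64), ("tanh", 128)]
batches = batch_by_signature(pending)
```

Here the two `tanh` operations of dimension 64 fall into one group, while the `tanh` of dimension 128 stays separate because its signature differs.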

Ontology-Aware Token Embeddings for Prepositional Phrase Attachment

May 08, 2017
Pradeep Dasigi, Waleed Ammar, Chris Dyer, Eduard Hovy

Type-level word embeddings use the same set of parameters to represent all instances of a word regardless of its context, ignoring the inherent lexical ambiguity in language. Instead, we embed semantic concepts (or synsets) as defined in WordNet and represent a word token in a particular context by estimating a distribution over relevant semantic concepts. We use the new, context-sensitive embeddings in a model for predicting prepositional phrase (PP) attachments and jointly learn the concept embeddings and model parameters. We show that using context-sensitive embeddings improves the accuracy of the PP attachment model by 5.4 absolute percentage points, which amounts to a 34.4% relative reduction in errors.

* ACL 2017
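
One way to turn a distribution over concepts into a single token vector is to take the expected concept embedding. The sketch below assumes that reading; the abstract does not spell out the combination rule, and the function name, the two-dimensional vectors, and the probabilities are hypothetical.

```python
def expected_embedding(concept_probs, concept_vectors):
    """Weighted average of concept vectors under a context-dependent distribution.

    concept_probs must sum to 1; concept_vectors holds one vector per concept.
    """
    if len(concept_probs) != len(concept_vectors):
        raise ValueError("one probability per concept vector is required")
    if abs(sum(concept_probs) - 1.0) > 1e-9:
        raise ValueError("concept_probs must sum to 1")
    dim = len(concept_vectors[0])
    return [
        sum(p * vec[i] for p, vec in zip(concept_probs, concept_vectors))
        for i in range(dim)
    ]


# Two hypothetical senses of one word, with toy 2-dimensional embeddings.
sense_vectors = [[1.0, 0.0], [0.0, 1.0]]
token_vector = expected_embedding([0.25, 0.75], sense_vectors)
```

A different context would yield different probabilities and therefore a different vector for the same word type.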

Learning to Create and Reuse Words in Open-Vocabulary Neural Language Modeling

Apr 23, 2017
Kazuya Kawakami, Chris Dyer, Phil Blunsom

Fixed-vocabulary language models fail to account for one of the most characteristic statistical facts of natural language: the frequent creation and reuse of new word types. Although character-level language models offer a partial solution in that they can create word types not attested in the training corpus, they do not capture the "bursty" distribution of such words. In this paper, we augment a hierarchical LSTM language model that generates sequences of word tokens character by character with a caching mechanism that learns to reuse previously generated words. To validate our model we construct a new open-vocabulary language modeling corpus (the Multilingual Wikipedia Corpus, MWC) from comparable Wikipedia articles in 7 typologically diverse languages and demonstrate the effectiveness of our model across this range of languages.

* ACL 2017

Differentiable Scheduled Sampling for Credit Assignment

Apr 23, 2017
Kartik Goyal, Chris Dyer, Taylor Berg-Kirkpatrick

We demonstrate that a continuous relaxation of the argmax operation can be used to create a differentiable approximation to greedy decoding for sequence-to-sequence (seq2seq) models. By incorporating this approximation into the scheduled sampling training procedure (Bengio et al., 2015), a well-known technique for correcting exposure bias, we introduce a new training objective that is continuous and differentiable everywhere and that can provide informative gradients near points where previous decoding decisions change their value. In addition, by using a related approximation, we demonstrate a similar approach to sample-based training. Finally, we show that our approach outperforms cross-entropy training and scheduled sampling procedures in two sequence prediction tasks: named entity recognition and machine translation.

* Accepted at ACL 2017 (http://bit.ly/2oj1muX)
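
A common continuous relaxation of argmax is a softmax whose inputs are scaled by a sharpness coefficient. The sketch below shows that generic construction, assuming it is close in spirit to what the abstract describes; the name `peaked_softmax`, the coefficient `alpha`, and the example scores are assumptions.

```python
import math


def peaked_softmax(scores, alpha):
    """Softmax over alpha * scores.

    As alpha grows, the output approaches a one-hot vector at the argmax,
    while remaining differentiable with respect to the scores.
    """
    scaled = [alpha * s for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [1.0, 3.0, 2.0]
soft = peaked_softmax(scores, alpha=1.0)    # smooth weights over all symbols
sharp = peaked_softmax(scores, alpha=50.0)  # nearly one-hot at index 1
```

Feeding the next decoder step a weighted average of symbol embeddings under these weights, rather than the embedding of a hard argmax, keeps the whole decoding path differentiable.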

The Neural Noisy Channel

Mar 06, 2017
Lei Yu, Phil Blunsom, Chris Dyer, Edward Grefenstette, Tomas Kocisky

We formulate sequence to sequence transduction as a noisy channel decoding problem and use recurrent neural networks to parameterise the source and channel models. Unlike direct models which can suffer from explaining-away effects during training, noisy channel models must produce outputs that explain their inputs, and their component models can be trained with not only paired training samples but also unpaired samples from the marginal output distribution. Using a latent variable to control how much of the conditioning sequence the channel model needs to read in order to generate a subsequent symbol, we obtain a tractable and effective beam search decoder. Experimental results on abstractive sentence summarisation, morphological inflection, and machine translation show that noisy channel models outperform direct models, and that they significantly benefit from increased amounts of unpaired output data that direct models cannot easily use.

* ICLR 2017
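
The noisy channel decision rule picks the output y that maximises log p(y) + log p(x | y), in contrast to a direct model that scores log p(y | x). The sketch below applies that rule to a fixed candidate list; the function name, the candidate strings, and the log-probabilities are made up for illustration, and the latent-variable beam search from the paper is not reproduced.

```python
def noisy_channel_best(candidates):
    """Return the candidate with the highest source + channel log-probability.

    candidates maps each output y to a pair
    (log p(y) from the source model, log p(x | y) from the channel model).
    """
    return max(candidates, key=lambda y: candidates[y][0] + candidates[y][1])


# Hypothetical scores for two candidate outputs of one input x.
toy_candidates = {
    "fluent but unfaithful": (-1.0, -5.0),  # likely output, explains x poorly
    "faithful": (-2.0, -1.0),               # less likely a priori, explains x well
}
best = noisy_channel_best(toy_candidates)
```

The channel term penalises outputs that fail to explain the input, which is why the second candidate wins despite its lower source probability.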

DyNet: The Dynamic Neural Network Toolkit

Jan 15, 2017
Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin

We describe DyNet, a toolkit for implementing neural network models based on dynamic declaration of network structure. In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives. In DyNet's dynamic declaration strategy, computation graph construction is mostly transparent, being implicitly constructed by executing procedural code that computes the network outputs, and the user is free to use different network structures for each input. Dynamic declaration thus facilitates the implementation of more complicated network architectures, and DyNet is specifically designed to allow users to implement their models in a way that is idiomatic in their preferred programming language (C++ or Python). One challenge with dynamic declaration is that because the symbolic computation graph is defined anew for every training example, its construction must have low overhead. To achieve this, DyNet has an optimized C++ backend and lightweight graph representation. Experiments show that DyNet's speeds are faster than or comparable with static declaration toolkits, and significantly faster than Chainer, another dynamic declaration toolkit. DyNet is released open-source under the Apache 2.0 license and available at http://github.com/clab/dynet.

* 33 pages
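
The idea of dynamic declaration can be illustrated without any toolkit: ordinary procedural code builds the graph as a side effect of computing values, so each input gets a graph of its own shape. The following is a toy in plain Python, not DyNet's API; the `Node` class, `encode`, and `count_nodes` are hypothetical names.

```python
class Node:
    """A value plus a record of the operation and parents that produced it."""

    def __init__(self, value, parents=(), op="input"):
        self.value = value
        self.parents = parents
        self.op = op

    def __add__(self, other):
        return Node(self.value + other.value, (self, other), "add")

    def __mul__(self, other):
        return Node(self.value * other.value, (self, other), "mul")


def encode(tokens, weight):
    """Fold a variable-length input; the graph grows with len(tokens)."""
    w = Node(weight)
    state = Node(0.0)
    for t in tokens:
        state = state * w + Node(t)  # each step adds three nodes to the graph
    return state


def count_nodes(root):
    """Count distinct nodes reachable from root."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.parents)
    return len(seen)


short_graph = encode([1.0, 2.0], weight=0.5)
long_graph = encode([1.0, 2.0, 3.0], weight=0.5)
```

The two calls run the same code but produce graphs of different sizes, which is the property that makes per-example structures such as trees or variable-length sequences straightforward to express.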

What Do Recurrent Neural Network Grammars Learn About Syntax?

Jan 10, 2017
Adhiguna Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Graham Neubig, Noah A. Smith

Recurrent neural network grammars (RNNG) are a recently proposed probabilistic generative modeling family for natural language. They show state-of-the-art language modeling and parsing performance. We investigate what information they learn, from a linguistic perspective, through various ablations to the model and the data, and by augmenting the model with an attention mechanism (GA-RNNG) to enable closer inspection. We find that explicit modeling of composition is crucial for achieving the best performance. Through the attention mechanism, we find that headedness plays a central role in phrasal representation (with the model's latent attention largely agreeing with predictions made by hand-crafted head rules, albeit with some important differences). By training grammars without nonterminal labels, we find that phrasal representations depend minimally on nonterminals, providing support for the endocentricity hypothesis.

* 10 pages. To appear in EACL 2017, Valencia, Spain

Learning to Compose Words into Sentences with Reinforcement Learning

Nov 28, 2016
Dani Yogatama, Phil Blunsom, Chris Dyer, Edward Grefenstette, Wang Ling

We use reinforcement learning to learn tree-structured neural networks for computing representations of natural language sentences. In contrast with prior work on tree-structured models in which the trees are either provided as input or predicted using supervision from explicit treebank annotations, the tree structures in this work are optimized to improve performance on a downstream task. Experiments demonstrate the benefit of learning task-specific composition orders, outperforming both sequential encoders and recursive encoders based on treebank annotations. We analyze the induced trees and show that while they discover some linguistically intuitive structures (e.g., noun phrases, simple verb phrases), they are different from conventional English syntactic structures.
