Liang Huang

Member, IEEE

When to Finish? Optimal Beam Search for Neural Text Generation

Aug 31, 2018

Liang Huang, Kai Zhao, Mingbo Ma

Abstract:In neural text generation such as neural machine translation, summarization, and image captioning, beam search is widely used to improve the output text quality. However, in the neural generation setting, hypotheses can finish at different steps, which makes it difficult to decide when to end beam search to ensure optimality. We propose a provably optimal beam search algorithm that will always return the optimal-score complete hypothesis (modulo beam size), and finish as soon as the optimality is established (finishing no later than the baseline). To counter neural generation's tendency for shorter hypotheses, we also introduce a bounded length reward mechanism which allows a modified version of our beam search algorithm to remain optimal. Experiments on neural machine translation demonstrate that our principled beam search algorithm leads to improvements in BLEU score over previously proposed alternatives.

* accepted by EMNLP 2017
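The stopping criterion in the abstract above rests on one observation: when a hypothesis score is a sum of log-probabilities, every extension can only lower it, so once the best finished hypothesis scores at least as high as the best unfinished one still in the beam, no later step can overtake it. A minimal Python sketch under that assumption follows. `optimal_beam_search`, `toy_step` and the `</s>` token are hypothetical names; finished hypotheses are kept outside the beam, and the bounded length reward is not modeled.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"  # hypothetical end-of-sentence token
Hyp = Tuple[float, Tuple[str, ...]]  # (sum of log-probs, tokens)


def optimal_beam_search(
    step_fn: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_size: int,
    max_len: int = 100,
) -> Hyp:
    """Beam search that stops once the best finished hypothesis cannot be beaten.

    step_fn(prefix) returns {next_token: log_prob}; log-probs are <= 0, so a
    hypothesis score never increases when it is extended.
    """
    beam: List[Hyp] = [(0.0, ())]
    best_finished: Hyp = (-math.inf, ())
    for _ in range(max_len):
        candidates: List[Hyp] = []
        for score, toks in beam:
            for tok, logp in step_fn(toks).items():
                cand = (score + logp, toks + (tok,))
                if tok == EOS:
                    best_finished = max(best_finished, cand)
                else:
                    candidates.append(cand)
        # Keep only the top-b unfinished hypotheses.
        beam = sorted(candidates, reverse=True)[:beam_size]
        # Optimality check: nothing left in the beam can overtake best_finished.
        if not beam or best_finished[0] >= beam[0][0]:
            break
    return best_finished


def toy_step(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical toy model used only to exercise the search."""
    if prefix == ():
        return {"a": math.log(0.6), "b": math.log(0.4)}
    if prefix == ("a",):
        return {EOS: math.log(0.1), "c": math.log(0.9)}
    if prefix == ("b",):
        return {EOS: math.log(0.9), "c": math.log(0.1)}
    return {EOS: 0.0}  # every longer prefix ends with probability 1
```

With `beam_size=2`, the finished `b </s>` (probability 0.36) appears at step 2 but still trails the unfinished `a c` (0.54), so the search runs one more step and returns `a c </s>` with probability 0.54.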

Ensemble Sequence Level Training for Multimodal MT: OSU-Baidu WMT18 Multimodal Machine Translation System Report

Aug 31, 2018

Renjie Zheng, Yilin Yang, Mingbo Ma, Liang Huang

Abstract:This paper describes multimodal machine translation systems developed jointly by Oregon State University and Baidu Research for the WMT 2018 Shared Task on multimodal translation. In this paper, we introduce a simple approach to incorporating image information by feeding image features to the decoder side. We also explore different sequence-level training methods, including scheduled sampling and reinforcement learning, which lead to substantial improvements. Our systems ensemble several models using different architectures and training methods and achieve the best performance for three subtasks: En-De and En-Cs in task 1 and (En+De+Fr)-Cs in task 1B.

* Published in WMT 2018
* 5 pages
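Scheduled sampling, one of the sequence-level training methods named above, gradually replaces gold decoder inputs with the model's own predictions as training proceeds. The sketch below uses the inverse-sigmoid decay of Bengio et al. (2015). The function names, the decay constant `k`, and mixing over a precomputed prediction list are illustrative assumptions, not the OSU-Baidu configuration; a real decoder would draw its prediction step by step.

```python
import math
import random
from typing import List, Optional, Sequence


def teacher_forcing_prob(step: int, k: float = 1000.0) -> float:
    """Inverse-sigmoid decay: probability of feeding the gold token at a training step."""
    # Cap the exponent so very large steps do not overflow math.exp.
    return k / (k + math.exp(min(step / k, 700.0)))


def mix_decoder_inputs(
    gold: Sequence[str],
    predicted: Sequence[str],
    step: int,
    k: float = 1000.0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Per position, keep the gold token with probability eps, else use the prediction."""
    rng = rng or random.Random()
    eps = teacher_forcing_prob(step, k)
    return [g if rng.random() < eps else p for g, p in zip(gold, predicted)]
```

Early in training `eps` is close to 1 (pure teacher forcing); as `step` grows it decays toward 0, so the decoder increasingly conditions on its own outputs, as it must at test time.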

Multi-Reference Training with Pseudo-References for Neural Translation and Text Generation

Aug 28, 2018

Renjie Zheng, Mingbo Ma, Liang Huang

Abstract:Neural text generation, including neural machine translation, image captioning, and summarization, has been quite successful recently. However, during training time, typically only one reference is considered for each example, even though there are often multiple references available, e.g., 4 references in NIST MT evaluations, and 5 references in image captioning data. We first investigate several different ways of utilizing multiple human references during training. But more importantly, we then propose an algorithm to generate exponentially many pseudo-references by first compressing existing human references into lattices and then traversing them to generate new pseudo-references. These approaches lead to substantial improvements over strong baselines in both machine translation (+1.5 BLEU) and image captioning (+3.1 BLEU / +11.7 CIDEr).

* Published in EMNLP 2018
* 10 pages
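The lattice idea can be illustrated on two references. This is a deliberately simplified stand-in, not the paper's compression algorithm: `difflib.SequenceMatcher` aligns the references, matching spans become shared lattice segments, mismatched spans become alternative branches, and enumerating every path yields the pseudo-references. `build_lattice` and `pseudo_references` are hypothetical names.

```python
from difflib import SequenceMatcher
from itertools import product
from typing import Iterator, List, Sequence, Tuple

Segment = List[Tuple[str, ...]]  # alternative token spans at one lattice position


def build_lattice(ref_a: Sequence[str], ref_b: Sequence[str]) -> List[Segment]:
    """Align two references; shared spans get one option, differing spans two."""
    sm = SequenceMatcher(a=ref_a, b=ref_b, autojunk=False)
    segments: List[Segment] = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            segments.append([tuple(ref_a[i1:i2])])
        else:
            alts: Segment = []
            # An empty tuple (from an insert/delete) is a valid "skip" branch.
            for alt in (tuple(ref_a[i1:i2]), tuple(ref_b[j1:j2])):
                if alt not in alts:
                    alts.append(alt)
            segments.append(alts)
    return segments


def pseudo_references(segments: List[Segment]) -> Iterator[List[str]]:
    """Traverse every path through the lattice."""
    for choice in product(*segments):
        yield [tok for span in choice for tok in span]
```

For "the cat sat on the mat" and "a cat was sitting on the mat", the lattice has two branching points, so traversal yields four sentences: the two originals plus "the cat was sitting on the mat" and "a cat sat on the mat". Each extra branching point doubles the count, which is the exponential growth the abstract refers to.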

Large Margin Neural Language Model

Aug 27, 2018

Jiaji Huang, Yi Li, Wei Ping, Liang Huang

Abstract:We propose a large margin criterion for training neural language models. Conventionally, neural language models are trained by minimizing perplexity (PPL) on grammatical sentences. However, we demonstrate that PPL may not be the best metric to optimize in some tasks, and further propose a large margin formulation. The proposed method aims to enlarge the margin between the "good" and "bad" sentences in a task-specific sense. It is trained end-to-end and can be widely applied to tasks that involve re-scoring of generated text. Compared with minimum-PPL training, our method gains up to 1.1 WER reduction for speech recognition and 1.0 BLEU increase for machine translation.

* 9 pages. Accepted as a long paper in EMNLP 2018
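The margin between "good" and "bad" sentences can be illustrated with a generic pairwise hinge loss over sentence scores such as language-model log-probabilities. This is a minimal sketch assuming every good sentence should outscore every bad one by at least `margin`; the function name and the averaging over pairs are hypothetical choices, and the paper's exact formulation may differ.

```python
from typing import Sequence


def pairwise_margin_loss(
    good_scores: Sequence[float],
    bad_scores: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Average hinge loss max(0, margin - (s_good - s_bad)) over all good/bad pairs."""
    pairs = [(g, b) for g in good_scores for b in bad_scores]
    if not pairs:
        return 0.0
    return sum(max(0.0, margin - (g - b)) for g, b in pairs) / len(pairs)
```

With one good sentence scored -2.0 and bad sentences scored -5.0 and -2.5, only the second pair violates the margin of 1.0 (by 0.5), so the loss is 0.25. Unlike perplexity minimization, the objective is zero once the ranking is correct by a sufficient margin.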

Linear-Time Constituency Parsing with RNNs and Dynamic Programming

May 21, 2018

Juneki Hong, Liang Huang

Abstract:Recently, span-based constituency parsing has achieved competitive accuracies with extremely simple models by using bidirectional RNNs to model "spans". However, the minimal span parser of Stern et al. (2017a), which holds the current state-of-the-art accuracy, is a chart parser running in cubic time, $O(n^3)$, which is too slow for longer sentences and for applications beyond sentence boundaries, such as end-to-end discourse parsing and joint sentence boundary detection and parsing. We propose a linear-time constituency parser with RNNs and dynamic programming using a graph-structured stack and beam search, which runs in time $O(n b^2)$, where $b$ is the beam size. We further speed this up to $O(n b\log b)$ by integrating cube pruning. Compared with chart parsing baselines, this linear-time parser is substantially faster for long sentences on the Penn Treebank and orders of magnitude faster for discourse parsing, and achieves the highest F1 accuracy on the Penn Treebank among single-model end-to-end systems.

* Needs revisions, especially in the experiments section, discussing which models are WSJ-only, single-model, end-to-end
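The speed-up from $O(n b^2)$ to $O(n b\log b)$ comes from never scoring all $b^2$ combinations: if two score lists are sorted in descending order, a priority queue can enumerate the best combinations lazily. The sketch below shows only that core step, assuming combination scores are plain sums; with non-additive scores the same procedure becomes approximate, which is why cube pruning is a pruning heuristic in general. `top_k_sums` is a hypothetical name, not the paper's code.

```python
import heapq
from typing import List, Tuple


def top_k_sums(a: List[float], b: List[float], k: int) -> List[Tuple[float, int, int]]:
    """Return the k largest a[i] + b[j] with their indices; a and b sorted descending."""
    if not a or not b or k <= 0:
        return []
    heap = [(-(a[0] + b[0]), 0, 0)]  # max-heap via negated scores
    seen = {(0, 0)}
    out: List[Tuple[float, int, int]] = []
    while heap and len(out) < k:
        neg, i, j = heapq.heappop(heap)
        out.append((-neg, i, j))
        # Only the right and lower neighbours of a popped cell can be next best.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(a[ni] + b[nj]), ni, nj))
    return out
```

With `k = b`, each pop pushes at most two cells, so the heap stays $O(b)$ and one beam step costs $O(b \log b)$.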

Event Nugget Detection with Forward-Backward Recurrent Neural Networks

Feb 15, 2018

Reza Ghaeini, Xiaoli Z. Fern, Liang Huang, Prasad Tadepalli

Abstract:Traditional event detection methods rely heavily on manually engineered rich features. Recent deep learning approaches alleviate this problem by automatic feature engineering. But such efforts, like traditional methods, have so far only focused on single-token event mentions, whereas in practice events can also be phrases. We instead use forward-backward recurrent neural networks (FBRNNs) to detect events that can be either words or phrases. To the best of our knowledge, this is one of the first efforts to handle multi-word events and also the first attempt to use RNNs for event detection. Experimental results demonstrate that FBRNN is competitive with the state-of-the-art methods on the ACE 2005 and the Rich ERE 2015 event detection tasks.

* http://www.aclweb.org/anthology/P16-2060
* Published as a short paper at ACL 2016. The main purpose of this submission is to add this paper to arxiv

OSU Multimodal Machine Translation System Report

Dec 14, 2017

Mingbo Ma, Dapeng Li, Kai Zhao, Liang Huang

Abstract:This paper describes Oregon State University's submissions to the WMT'17 shared task "multimodal translation task I". In this task, all the sentence pairs are image captions in different languages. The key difference between this task and conventional machine translation is that we have corresponding images as additional information for each sentence pair. In this paper, we introduce a simple but effective system which takes an image shared between different languages and feeds it into both the encoding and decoding sides. We report our system's performance for English-French and English-German on the Flickr30K (in-domain) and MSCOCO (out-of-domain) datasets. Our system achieves the best TER performance for English-German on the MSCOCO dataset.

* 5, WMT 2017

Group Sparse CNNs for Question Classification with Answer Sets

Oct 07, 2017

Mingbo Ma, Liang Huang, Bing Xiang, Bowen Zhou

Abstract:Question classification is an important task with wide applications. However, traditional techniques treat questions as general sentences, ignoring the corresponding answer data. To incorporate answer information into question modeling, we first introduce novel group sparse autoencoders, which refine question representations by utilizing group information in the answer set. We then propose novel group sparse CNNs, which naturally learn question representations with respect to their answers by implanting group sparse autoencoders into traditional CNNs. The proposed model significantly outperforms strong baselines on four datasets.

* 6, ACL 2017
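Group sparsity of the kind described above is commonly encouraged with a group-lasso penalty, the sum of the L2 norms of each group of hidden units, whose proximal step shrinks or zeroes whole groups at once. A minimal sketch, assuming the groups are given as index lists; in the paper they are derived from the answer set, whereas the groups and function names here are hypothetical, and the full autoencoder objective is not reproduced.

```python
import math
from typing import List, Sequence


def group_lasso_penalty(h: Sequence[float], groups: List[List[int]]) -> float:
    """Sum over groups of the L2 norm of the activations in that group."""
    return sum(math.sqrt(sum(h[i] ** 2 for i in g)) for g in groups)


def group_soft_threshold(
    h: Sequence[float], groups: List[List[int]], lam: float
) -> List[float]:
    """Proximal step: shrink each group's norm by lam; groups with norm <= lam become zero."""
    out = list(h)
    for g in groups:
        norm = math.sqrt(sum(h[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = h[i] * scale
    return out
```

For activations `[3.0, 4.0, 0.1, 0.0]` with groups `[[0, 1], [2, 3]]` and `lam = 1.0`, the first group (norm 5) is scaled by 0.8 while the weak second group is switched off entirely, which is the all-or-nothing behaviour that distinguishes group sparsity from plain L1 sparsity.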

Jointly Trained Sequential Labeling and Classification by Sparse Attention Neural Networks

Sep 28, 2017

Mingbo Ma, Kai Zhao, Liang Huang, Bing Xiang, Bowen Zhou

Abstract:Sentence-level classification and sequential labeling are two fundamental tasks in language understanding. While these two tasks are usually modeled separately, in reality, they are often correlated, for example in intent classification and slot filling, or in topic classification and named-entity recognition. In order to utilize the potential benefits from their correlations, we propose a jointly trained model for learning the two tasks simultaneously via Long Short-Term Memory (LSTM) networks. This model predicts the sentence-level category and the word-level label sequence from the stepwise output hidden representations of LSTM. We also introduce a novel mechanism of "sparse attention" to weigh words differently based on their semantic relevance to sentence-level classification. The proposed method outperforms baseline models on ATIS and TREC datasets.

* Interspeech 2017
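One well-known way to obtain sparse attention weights is sparsemax (Martins and Astudillo, 2016), the Euclidean projection of a score vector onto the probability simplex; unlike softmax, it can assign exactly zero weight to irrelevant words. It is shown here as an illustrative substitute, not as the "sparse attention" mechanism defined in the paper.

```python
from typing import List, Sequence


def sparsemax(z: Sequence[float]) -> List[float]:
    """Project scores z onto the probability simplex; low scores get exactly 0."""
    if not z:
        return []
    zs = sorted(z, reverse=True)
    cumsum = 0.0
    k_max, cum_at_k = 0, 0.0
    # Support size: largest k with 1 + k * z_(k) > sum of the top-k scores.
    for k, zk in enumerate(zs, start=1):
        cumsum += zk
        if 1.0 + k * zk > cumsum:
            k_max, cum_at_k = k, cumsum
    tau = (cum_at_k - 1.0) / k_max  # threshold subtracted from every score
    return [max(zi - tau, 0.0) for zi in z]
```

For word scores `[2.0, 1.5, 0.1]`, sparsemax returns `[0.75, 0.25, 0.0]`, so the third word is ignored outright. Such weights could replace softmax weights when pooling LSTM hidden states into a sentence representation for classification.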

Fast Exact Decoding and Global Training for Transition-Based Dependency Parsing via a Minimal Feature Set

Aug 30, 2017

Tianze Shi, Liang Huang, Lillian Lee

Abstract:We first present a minimal feature set for transition-based dependency parsing, continuing a recent trend started by Kiperwasser and Goldberg (2016a) and Cross and Huang (2016a) of using bi-directional LSTM features. We plug our minimal feature set into the dynamic-programming framework of Huang and Sagae (2010) and Kuhlmann et al. (2011) to produce the first implementation of worst-case O(n^3) exact decoders for arc-hybrid and arc-eager transition systems. With our minimal features, we also present O(n^3) global training methods. Finally, using ensembles including our new parsers, we achieve the best unlabeled attachment score reported (to our knowledge) on the Chinese Treebank and the "second-best-in-class" result on the English Penn Treebank.

* Proceedings of EMNLP, 2017. 12 pages
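For readers unfamiliar with the arc-hybrid transition system mentioned above, the sketch below implements its three transitions and a two-position feature extractor. That the minimal feature set for arc-hybrid is the stack top `s0` plus the buffer front `b0` is assumed here rather than taken from the abstract; the class, the ROOT-at-end-of-buffer convention and the method names are hypothetical, and the $O(n^3)$ dynamic-programming decoder is not shown.

```python
from typing import Dict, List, Optional, Tuple

ROOT = 0  # hypothetical convention: ROOT sits at the end of the buffer


class ArcHybrid:
    """Arc-hybrid transition system over words numbered 1..n."""

    def __init__(self, n_words: int) -> None:
        self.stack: List[int] = []
        self.buffer: List[int] = list(range(1, n_words + 1)) + [ROOT]
        self.heads: Dict[int, int] = {}  # dependent -> head

    def features(self) -> Tuple[Optional[int], Optional[int]]:
        """Positions (s0, b0) whose bi-LSTM vectors would be fed to the scorer."""
        s0 = self.stack[-1] if self.stack else None
        b0 = self.buffer[0] if self.buffer else None
        return s0, b0

    def shift(self) -> None:
        """Move b0 onto the stack (ROOT is never shifted)."""
        if not self.buffer or self.buffer[0] == ROOT:
            raise ValueError("SHIFT not allowed")
        self.stack.append(self.buffer.pop(0))

    def left_arc(self) -> None:
        """Pop s0 and attach it to b0."""
        if not self.stack or not self.buffer:
            raise ValueError("LEFT-ARC not allowed")
        dep = self.stack.pop()
        self.heads[dep] = self.buffer[0]

    def right_arc(self) -> None:
        """Pop s0 and attach it to the new stack top s1."""
        if len(self.stack) < 2:
            raise ValueError("RIGHT-ARC not allowed")
        dep = self.stack.pop()
        self.heads[dep] = self.stack[-1]

    def is_final(self) -> bool:
        return not self.stack and self.buffer == [ROOT]
```

For the three-word sentence "I saw her", the sequence SHIFT, LEFT-ARC, SHIFT, SHIFT, RIGHT-ARC, LEFT-ARC attaches words 1 and 3 to word 2 and word 2 to ROOT. Because every decision looks only at `(s0, b0)`, states that agree on these two positions can be merged, which is what makes exact dynamic programming over this system tractable.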
