Changhan Wang

Jack

CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus

Feb 04, 2020

Changhan Wang, Juan Pino, Anne Wu, Jiatao Gu

Abstract: Spoken language translation has recently witnessed a resurgence in popularity, thanks to the development of end-to-end models and the creation of new corpora, such as Augmented LibriSpeech and MuST-C. Existing datasets involve language pairs with English as a source language, involve very specific domains, or are low resource. We introduce CoVoST, a multilingual speech-to-text translation corpus from 11 languages into English, diversified with over 11,000 speakers and over 60 accents. We describe the dataset creation methodology and provide empirical evidence of the quality of the data. We also provide initial benchmarks, including, to our knowledge, the first end-to-end many-to-one multilingual models for spoken language translation. CoVoST is released under a CC0 license and is free to use. We also provide additional evaluation data derived from Tatoeba under CC licenses.

* Submitted to LREC 2020

VizSeq: A Visual Analysis Toolkit for Text Generation Tasks

Sep 12, 2019

Changhan Wang, Anirudh Jain, Danlu Chen, Jiatao Gu

Abstract: Automatic evaluation of text generation tasks (e.g. machine translation, text summarization, image captioning and video description) usually relies heavily on task-specific metrics, such as BLEU and ROUGE. These metrics, however, are abstract numbers and are not perfectly aligned with human assessment. This suggests inspecting detailed examples as a complement to identify system error patterns. In this paper, we present VizSeq, a visual analysis toolkit for instance-level and corpus-level system evaluation on a wide variety of text generation tasks. It supports multimodal sources and multiple text references, providing visualization in a Jupyter notebook or a web app interface. It can be used locally or deployed onto public servers for centralized data hosting and benchmarking. It covers the most common n-gram based metrics, accelerated with multiprocessing, and also provides the latest embedding-based metrics such as BERTScore.

Neural Machine Translation with Byte-Level Subwords

Sep 07, 2019

Changhan Wang, Kyunghyun Cho, Jiatao Gu

Abstract: Almost all existing machine translation models are built on top of character-based vocabularies: characters, subwords or words. Rare characters from noisy text or character-rich languages such as Japanese and Chinese, however, can unnecessarily take up vocabulary slots and limit the vocabulary's compactness. Representing text at the level of bytes and using the 256-byte set as the vocabulary is a potential solution to this issue. High computational cost has, however, prevented it from being widely deployed or used in practice. In this paper, we investigate byte-level subwords, specifically byte-level BPE (BBPE), which is more compact than a character vocabulary and has no out-of-vocabulary tokens, while being more efficient than using pure bytes alone. We claim that contextualizing BBPE embeddings is necessary, which can be implemented by a convolutional or recurrent layer. Our experiments show that BBPE has comparable performance to BPE while its size is only 1/8 of that for BPE. In the multilingual setting, BBPE maximizes vocabulary sharing across many languages and achieves better translation quality. Moreover, we show that BBPE enables transferring models between languages with non-overlapping character sets.
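
To make the idea of byte-level subwords concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes a plain greedy BPE merge loop applied to UTF-8 bytes, and the function names (to_byte_symbols, learn_bbpe, and so on) are hypothetical, chosen for illustration only. It shows why no out-of-vocabulary tokens can occur: every string reduces to symbols drawn from at most 256 byte values, and merges only concatenate those symbols.

```python
from collections import Counter


def to_byte_symbols(text):
    """Split text into single-byte symbols (at most 256 distinct values)."""
    return [bytes([b]) for b in text.encode("utf-8")]


def most_frequent_pair(sequences):
    """Return the most frequent adjacent symbol pair, or None if there is none."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0][0] if counts else None


def merge_pair(seq, pair):
    """Replace each non-overlapping occurrence of `pair` with the merged symbol."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_bbpe(corpus, num_merges):
    """Learn up to `num_merges` merges over byte symbols (illustrative only)."""
    sequences = [to_byte_symbols(text) for text in corpus]
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(sequences)
        if pair is None:
            break
        merges.append(pair)
        sequences = [merge_pair(seq, pair) for seq in sequences]
    return merges, sequences
```

A single CJK character such as "日" occupies three UTF-8 bytes, so it starts as three byte symbols and can be merged back into larger units if it is frequent. Concatenating the symbols of any segmented sentence and decoding as UTF-8 recovers the original text.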

Does Object Recognition Work for Everyone?

Jun 18, 2019

Terrance DeVries, Ishan Misra, Changhan Wang, Laurens van der Maaten

Abstract: The paper analyzes the accuracy of publicly available object-recognition systems on a geographically diverse dataset. This dataset contains household items and was designed to have a more representative geographical coverage than commonly used image datasets in object recognition. We find that the systems perform relatively poorly on household items that commonly occur in countries with a low household income. Qualitative analyses suggest the drop in performance is primarily due to appearance differences within an object class (e.g., dish soap) and due to items appearing in a different context (e.g., toothbrushes appearing outside of bathrooms). The results of our study suggest that further work is needed to make object-recognition systems work equally well for people across different countries and income levels.

Levenshtein Transformer

May 27, 2019

Jiatao Gu, Changhan Wang, Jake Zhao

Abstract: Modern neural sequence generation models are built to either generate tokens step-by-step from scratch or (iteratively) modify a sequence of tokens bounded by a fixed length. In this work, we develop Levenshtein Transformer, a new partially autoregressive model devised for more flexible and amenable sequence generation. Unlike previous approaches, the atomic operations of our model are insertion and deletion. Their combination facilitates not only generation but also sequence refinement, allowing dynamic length changes. We also propose a set of new training techniques dedicated to them, effectively exploiting one as the other's learning signal thanks to their complementary nature. Experiments applying the proposed model achieve comparable performance but much-improved efficiency on both generation (e.g. machine translation, text summarization) and refinement tasks (e.g. automatic post-editing). We further confirm the flexibility of our model by showing that a Levenshtein Transformer trained on machine translation can straightforwardly be used for automatic post-editing.
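
As a rough illustration of refinement with only insertion and deletion, the sketch below edits one token sequence into another in two passes: first delete the tokens that are not part of a longest common subsequence, then insert the missing ones. This is a deterministic toy under stated assumptions, not the model from the paper, which predicts such operations with a neural network; the helper names lcs_keep_mask and refine are hypothetical.

```python
def lcs_keep_mask(source, target):
    """Mark the source tokens that belong to one longest common subsequence."""
    n, m = len(source), len(target)
    # dp[i][j] holds the LCS length of source[i:] and target[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if source[i] == target[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    keep, i, j = [False] * n, 0, 0
    while i < n and j < m:
        if source[i] == target[j]:
            keep[i] = True
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1  # dropping this source token loses nothing
        else:
            j += 1  # this target token has to be inserted later
    return keep


def refine(source, target):
    """Edit source into target with one deletion pass and one insertion pass."""
    keep = lcs_keep_mask(source, target)
    after_deletion = [tok for tok, kept in zip(source, keep) if kept]
    result, k, num_inserted = [], 0, 0
    for tok in target:
        if k < len(after_deletion) and after_deletion[k] == tok:
            k += 1  # token survived the deletion pass
        else:
            num_inserted += 1  # token has to be inserted
        result.append(tok)
    num_deleted = len(source) - len(after_deletion)
    return after_deletion, result, num_deleted, num_inserted
```

For example, calling refine on the token lists for "a b x c" and "a b c d" deletes "x" and inserts "d", so the output length is not tied to the input length.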

* 16 pages (6 pages appendix). Work in progress

Dynamic Meta-Embeddings for Improved Sentence Representations

Sep 05, 2018

Douwe Kiela, Changhan Wang, Kyunghyun Cho

Abstract: While one of the first steps in many NLP systems is selecting what pre-trained word embeddings to use, we argue that such a step is better left for neural networks to figure out by themselves. To that end, we introduce dynamic meta-embeddings, a simple yet effective method for the supervised learning of embedding ensembles, which leads to state-of-the-art performance within the same model class on a variety of tasks. We subsequently show how the technique can be used to shed new light on the usage of word embeddings in NLP systems.

* EMNLP 2018
