
Aditya Siddhant

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

Mar 24, 2020
Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, Melvin Johnson

Much recent progress in applications of machine learning models to NLP has been driven by benchmarks that evaluate models across a wide variety of tasks. However, these broad-coverage benchmarks have been mostly limited to English, and despite an increasing interest in multilingual models, a benchmark that enables the comprehensive evaluation of such methods on a diverse range of languages and tasks is still missing. To this end, we introduce the Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks. We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models, particularly on syntactic and sentence retrieval tasks. There is also a wide spread of results across languages. We release the benchmark to encourage research on cross-lingual learning methods that transfer linguistic knowledge across a diverse and representative set of languages and tasks.
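
The abstract makes two observations that can be computed per task: a gap between English and cross-lingually transferred performance, and a wide spread of results across languages. The sketch below assumes the gap is the English score minus the mean over all other languages. That is one natural reading, and the paper's exact metric definition may differ. It also assumes the spread is the best minus the worst non-English score. `transfer_summary` is a hypothetical helper, and the scores in the usage lines are invented for illustration.

```python
from typing import Dict, Tuple


def transfer_summary(scores: Dict[str, float], source: str = "en") -> Tuple[float, float]:
    """Return (gap, spread) for one task's per-language scores (hypothetical helper).

    gap:    source-language score minus the mean over all other languages
    spread: best minus worst score among the non-source languages
    """
    targets = [v for lang, v in scores.items() if lang != source]
    if source not in scores or not targets:
        raise ValueError("need the source language and at least one other language")
    gap = scores[source] - sum(targets) / len(targets)
    spread = max(targets) - min(targets)
    return gap, spread


# Made-up scores for illustration only; these are not XTREME results.
example = {"en": 80.0, "de": 72.0, "sw": 48.0}
print(transfer_summary(example))  # (20.0, 24.0)
```

A large gap combined with a large spread means that English-only evaluation overstates how well a multilingual encoder serves its weakest languages.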

Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation

Sep 01, 2019
Aditya Siddhant, Melvin Johnson, Henry Tsai, Naveen Arivazhagan, Jason Riesa, Ankur Bapna, Orhan Firat, Karthik Raman

The recently proposed massively multilingual neural machine translation (NMT) system has been shown to be capable of translating over 100 languages to and from English within a single model. Its improved translation performance on low-resource languages hints at potential cross-lingual transfer capability for downstream tasks. In this paper, we evaluate the cross-lingual effectiveness of representations from the encoder of a massively multilingual NMT model on 5 downstream classification and sequence labeling tasks covering a diverse set of over 50 languages. We compare against a strong baseline, multilingual BERT (mBERT), in different cross-lingual transfer learning scenarios and show gains in zero-shot transfer in 4 out of these 5 tasks.

Supervised Contextual Embeddings for Transfer Learning in Natural Language Processing Tasks

Jun 28, 2019
Mihir Kale, Aditya Siddhant, Sreyashi Nag, Radhika Parik, Matthias Grabmair, Anthony Tomasic

Pre-trained word embeddings are the primary method for transfer learning in several Natural Language Processing (NLP) tasks. Recent work has focused on using unsupervised techniques such as language modeling to obtain these embeddings. In contrast, this work focuses on extracting representations from multiple pre-trained supervised models, which enriches word embeddings with task- and domain-specific knowledge. Experiments performed in cross-task, cross-domain, and cross-lingual settings indicate that such supervised embeddings are helpful, especially in the low-resource setting, but the extent of the gains depends on the nature of the task and domain. We make our code publicly available.

* Appeared in the 2nd Learning from Limited Labeled Data (LLD) Workshop, ICLR 2019

Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents

Nov 13, 2018
Aditya Siddhant, Anuj Goyal, Angeliki Metallinou

User interaction with voice-powered agents generates large amounts of unlabeled utterances. In this paper, we explore techniques to efficiently transfer the knowledge from these unlabeled utterances to improve model performance on Spoken Language Understanding (SLU) tasks. We use Embeddings from Language Models (ELMo) to take advantage of unlabeled data by learning contextualized word representations. We also propose ELMo-Light (ELMoL), a faster and simpler unsupervised pre-training method for SLU. Our findings suggest that unsupervised pre-training on a large corpus of unlabeled utterances leads to significantly better SLU performance than training from scratch, and that it can even outperform conventional supervised transfer. Additionally, we show that the gains from unsupervised transfer techniques can be further improved by supervised transfer. The improvements are more pronounced in low-resource settings: when using only 1000 labeled in-domain samples, our techniques match the performance of training from scratch on 10-15x more labeled in-domain data.

* To appear at AAAI 2019

Deep Bayesian Active Learning for Natural Language Processing: Results of a Large-Scale Empirical Study

Sep 24, 2018
Aditya Siddhant, Zachary C. Lipton

Several recent papers investigate Active Learning (AL) for mitigating the data dependence of deep learning for natural language processing. However, the applicability of AL to real-world problems remains an open question. In supervised learning, practitioners can try many different methods, evaluating each against a validation set before selecting a model; AL affords no such luxury. Over the course of one AL run, an agent annotates its dataset, exhausting its labeling budget. Thus, given a new task, an active learner has no opportunity to compare models and acquisition functions. This paper provides a large-scale empirical study of deep active learning, addressing multiple tasks and, for each, multiple datasets, multiple models, and a full suite of acquisition functions. We find that across all settings, Bayesian active learning by disagreement, using uncertainty estimates provided by either Dropout or Bayes-by-Backprop, significantly improves over i.i.d. baselines and usually outperforms classic uncertainty sampling.

* To be presented at EMNLP 2018
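
The acquisition rule named in the abstract above is Bayesian active learning by disagreement (BALD). It scores an unlabeled example by the mutual information between its predicted label and the model parameters, estimated as the entropy of the averaged prediction minus the average entropy of the individual stochastic predictions. The sketch below is a minimal, framework-free illustration. It assumes the T stochastic forward passes (for example, with dropout left active at prediction time) have already been run and their class probabilities stored as nested lists. The names `bald_scores` and `select_batch` are hypothetical and are not taken from the paper's code.

```python
import math
from typing import List

# Hypothetical layout: probs[t][n][c] is the probability of class c for
# unlabeled example n on the t-th stochastic forward pass (e.g. MC dropout).
Probs = List[List[List[float]]]


def entropy(p: List[float]) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def bald_scores(probs: Probs) -> List[float]:
    """Estimated mutual information between label and parameters, per example."""
    num_passes = len(probs)
    scores = []
    for n in range(len(probs[0])):
        num_classes = len(probs[0][n])
        # Predictive distribution: average the class probabilities over passes.
        mean = [sum(probs[t][n][c] for t in range(num_passes)) / num_passes
                for c in range(num_classes)]
        # Expected entropy of the individual passes.
        expected = sum(entropy(probs[t][n]) for t in range(num_passes)) / num_passes
        # High when passes are individually confident but disagree with each other.
        scores.append(entropy(mean) - expected)
    return scores


def select_batch(probs: Probs, k: int) -> List[int]:
    """Indices of the k examples with the highest BALD score."""
    scores = bald_scores(probs)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Two passes over two examples. Both examples have the same averaged
# prediction [0.5, 0.5], but only example 1 shows disagreement between passes.
probs = [
    [[0.5, 0.5], [1.0, 0.0]],  # pass t = 0
    [[0.5, 0.5], [0.0, 1.0]],  # pass t = 1
]
print(bald_scores(probs))      # example 0 scores 0.0, example 1 about ln 2 (0.693)
print(select_batch(probs, 1))  # [1]
```

Ranking these two examples by the entropy of the averaged prediction alone, a common form of uncertainty sampling, would produce a tie. The disagreement term is what separates them.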

Leveraging Native Language Speech for Accent Identification using Deep Siamese Networks

Jun 18, 2018
Aditya Siddhant, Preethi Jyothi, Sriram Ganapathy

The problem of automatic accent identification is important for several applications, such as speaker profiling and recognition, as well as for improving speech recognition systems. The accented nature of speech can be primarily attributed to the influence of the speaker's native language on the given speech recording. In this paper, we propose a novel accent identification system whose training exploits speech in native languages along with the accented speech. Specifically, we develop a deep Siamese network-based model which learns the association between accented speech recordings and the native language speech recordings. The Siamese networks are trained with i-vector features extracted from the speech recordings using either an unsupervised Gaussian mixture model (GMM) or a supervised deep neural network (DNN) model. We perform several accent identification experiments using the CSLU Foreign Accented English (FAE) corpus. In these experiments, our proposed approach using deep Siamese networks yields a significant relative performance improvement of 15.4 percent on a 10-class accent identification task over a baseline DNN-based classification system that uses GMM i-vectors. Furthermore, we present a detailed error analysis of the proposed accent identification system.

* Published in ASRU 2017
