Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

Apr 10, 2020

Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, Melvin Johnson

Figure 1 for XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

Figure 2 for XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

Figure 3 for XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

Figure 4 for XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

Share this with someone who'll enjoy it:

Abstract:Much recent progress in applications of machine learning models to NLP has been driven by benchmarks that evaluate models across a wide variety of tasks. However, these broad-coverage benchmarks have been mostly limited to English, and despite an increasing interest in multilingual models, a benchmark that enables the comprehensive evaluation of such methods on a diverse range of languages and tasks is still missing. To this end, we introduce the Cross-lingual TRansfer Evaluation of Multilingual Encoders XTREME benchmark, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks. We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models, particularly on syntactic and sentence retrieval tasks. There is also a wide spread of results across languages. We release the benchmark to encourage research on cross-lingual learning methods that transfer linguistic knowledge across a diverse and representative set of languages and tasks.

View paper on

Share this with someone who'll enjoy it:

Title:XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

Paper and Code