Title: Bayesian multilingual topic model for zero-shot cross-lingual topic identification

Jul 02, 2020

Santosh Kesiraju, Sangeet Sagar, Ondřej Glembek, Lukáš Burget, Suryakanth V Gangashetty

Abstract: This paper presents a Bayesian multilingual topic model for learning language-independent document embeddings. Our model learns to represent documents as Gaussian distributions, thereby encoding the uncertainty in their covariances. We propagate the learned uncertainties through linear classifiers for zero-shot cross-lingual topic identification. Our experiments on the 5-language Europarl and Reuters (MLDoc) corpora show that the proposed model outperforms multilingual word embedding and BiLSTM sentence encoder based systems by significant margins in the majority of the transfer directions. Moreover, our system, trained in under a single day on a single GPU with much less data, performs competitively with the state-of-the-art universal BiLSTM sentence encoder trained on 93 languages. Our experimental analysis shows that increasing the amount of parallel data improves the overall performance of the embeddings. Nonetheless, exploiting the uncertainties is always beneficial.
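To make the idea of propagating embedding uncertainty through a linear classifier concrete, here is a minimal sketch. It is not the authors' implementation: the diagonal covariance, the Monte Carlo averaging of softmax outputs, and all names (`expected_class_probs`, `weights`, `bias`) and numbers are assumptions chosen for illustration only. The paper may propagate uncertainty in a different way.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_class_probs(mean, var, weights, bias, n_samples=2000, seed=0):
    """Monte Carlo estimate of E[softmax(W x + b)] for x ~ N(mean, diag(var)).

    Assumption: the document embedding has a diagonal covariance, given
    here as a list of per-dimension variances.
    """
    rng = random.Random(seed)
    acc = [0.0] * len(weights)
    for _ in range(n_samples):
        # Draw one embedding sample from the document's Gaussian.
        x = [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]
        # Linear classifier: one weight row and one bias per class.
        logits = [
            sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            for w, b in zip(weights, bias)
        ]
        probs = softmax(logits)
        acc = [a + p for a, p in zip(acc, probs)]
    return [a / n_samples for a in acc]


# Hypothetical two-class classifier over a 2-dimensional embedding.
weights = [[2.0, 0.0], [-2.0, 0.0]]
bias = [0.0, 0.0]
mean = [1.0, -0.5]

# Same mean, different uncertainty in the embedding.
confident = expected_class_probs(mean, [0.01, 0.01], weights, bias)
uncertain = expected_class_probs(mean, [4.0, 4.0], weights, bias)
print(confident, uncertain)
```

In this toy setup both documents share the same mean embedding, but the one with larger variance yields a less peaked class posterior. A classifier that uses only the mean would treat the two documents identically.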

* 10 pages, 5 figures
