Title: Code-switched inspired losses for generic spoken dialog representations

Sep 09, 2021

Emile Chapuis, Pierre Colombo, Matthieu Labeau, Chloe Clavel

Abstract: Spoken dialog systems need to handle both multiple languages and multilinguality within a single conversation (e.g., in the case of code-switching). In this work, we introduce new pretraining losses tailored to learning multilingual spoken dialog representations. The goal of these losses is to expose the model to code-switched language. To scale up training, we automatically build a pretraining corpus of multilingual conversations in five languages (French, Italian, English, German and Spanish) from OpenSubtitles, a large multilingual corpus of 24.3G tokens. We test the generic representations on MIAM, a new benchmark composed of five dialog act corpora in the same five languages, as well as on two novel multilingual downstream tasks (i.e., multilingual mask utterance retrieval and multilingual inconsistency identification). Our experiments show that the new code-switched-inspired losses achieve better performance in both monolingual and multilingual settings.

* EMNLP 2021