Daniil Moskovskiy

Exploring Cross-lingual Textual Style Transfer with Large Multilingual Language Models

Jun 05, 2022
Daniil Moskovskiy, Daryna Dementieva, Alexander Panchenko

Detoxification is the task of rewriting a toxic text in a polite style while preserving its meaning and fluency. Existing detoxification methods are designed to work in a single language. This work investigates multilingual and cross-lingual detoxification and the behavior of large multilingual models in this setting. Unlike previous work, we aim to make large language models perform detoxification without direct fine-tuning in a given language. Experiments show that multilingual models are capable of multilingual style transfer. However, they are not able to perform cross-lingual detoxification, so direct fine-tuning on the target language remains unavoidable.
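For illustration, here is a minimal sketch of the cross-lingual setup the abstract describes, using the Hugging Face Transformers API: a multilingual seq2seq model is assumed to have been fine-tuned for detoxification in one language and is then applied to toxic input in another. The mT5 checkpoint name is a placeholder, not the authors' released model.

```python
# Minimal sketch of cross-lingual detoxification inference.
# Assumption: the checkpoint below stands in for a multilingual seq2seq
# model already fine-tuned for detoxification in one source language
# (e.g. English); it is NOT an actual released detoxification model.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/mt5-base"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Toxic input in a language the model was NOT fine-tuned on, probing
# whether detoxification transfers across languages.
toxic_text = "Заткнись, это полная чушь!"  # "Shut up, this is complete nonsense!"
inputs = tokenizer(toxic_text, return_tensors="pt")
output_ids = model.generate(**inputs, num_beams=5, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

As the abstract notes, in practice this zero-shot transfer fails, and fine-tuning on the target language is still required.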

Methods for Detoxification of Texts for the Russian Language

May 19, 2021
Daryna Dementieva, Daniil Moskovskiy, Varvara Logacheva, David Dale, Olga Kozlova, Nikita Semenov, Alexander Panchenko

We introduce the first study of automatic detoxification of Russian texts to combat offensive language. This kind of textual style transfer can be used, for instance, to process toxic content in social media. While much work has been done on this task for English, it has not previously been addressed for Russian. We test two types of models, an unsupervised approach based on the BERT architecture that performs local corrections and a supervised approach based on the pretrained GPT-2 language model, and compare them with several baselines. In addition, we describe the evaluation setup, providing training datasets and metrics for automatic evaluation. The results show that the tested approaches can be successfully used for detoxification, although there is room for improvement.
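As a rough illustration of the unsupervised "local corrections" idea, the sketch below masks a toxic word and asks a Russian BERT to propose replacements via the standard fill-mask pipeline. The manual masking and the checkpoint choice are simplifying assumptions; the paper's actual unsupervised method is more involved (e.g. locating toxic spans and filtering candidates automatically).

```python
# Rough sketch of BERT-style local corrections.
# Assumption: the toxic word has already been located and replaced by the
# mask token by hand; a full system would detect such spans and rerank
# the proposed substitutes for politeness automatically.
from transformers import pipeline

fill = pipeline("fill-mask", model="DeepPavlov/rubert-base-cased")

# "This is a completely [MASK] idea." with the offensive word masked out.
sentence = f"Это совершенно {fill.tokenizer.mask_token} идея."
for candidate in fill(sentence, top_k=5):
    print(f"{candidate['token_str']}\t{candidate['score']:.3f}")
```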
