Ekaterina Artemova

Shaking Syntactic Trees on the Sesame Street: Multilingual Probing with Controllable Perturbations

Sep 28, 2021

Ekaterina Taktasheva, Vladislav Mikhailov, Ekaterina Artemova

Abstract: Recent research has adopted a new experimental field centered around the concept of text perturbations, which has revealed that shuffled word order has little to no impact on the downstream performance of Transformer-based language models across many NLP tasks. These findings contradict the common understanding of how the models encode hierarchical and structural information and even question if the word order is modeled with position embeddings. To this end, this paper proposes nine probing datasets organized by the type of controllable text perturbation for three Indo-European languages with a varying degree of word order flexibility: English, Swedish and Russian. Based on the probing analysis of the M-BERT and M-BART models, we report that the syntactic sensitivity depends on the language and model pre-training objectives. We also find that the sensitivity grows across layers together with the increase of the perturbation granularity. Last but not least, we show that the models barely use the positional information to induce syntactic trees from their intermediate self-attention and contextualized representations.

* accepted to MRL @ EMNLP 2021
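
The idea of a controllable word-order perturbation mentioned in the abstract above can be illustrated with a small sketch. It is not the paper's perturbation procedure: the function name `shuffle_ngrams` and the chunk-based scheme are assumptions made for illustration. The parameter `n` controls granularity, so `n=1` shuffles single words while larger values keep longer spans intact.

```python
import random
from typing import List, Optional


def shuffle_ngrams(tokens: List[str], n: int, seed: Optional[int] = None) -> List[str]:
    """Split tokens into consecutive chunks of n tokens and shuffle the chunk order.

    Hypothetical helper for illustration; word order inside each chunk is preserved.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    chunks = [tokens[i:i + n] for i in range(0, len(tokens), n)]
    rng = random.Random(seed)  # local generator so a seed gives a reproducible shuffle
    rng.shuffle(chunks)
    return [tok for chunk in chunks for tok in chunk]


sentence = "the quick brown fox jumps over the lazy dog".split()
for n in (1, 2, 4):
    print(n, " ".join(shuffle_ngrams(sentence, n, seed=0)))
```

A probing classifier could then be trained to tell original sentences from perturbed ones at each granularity, which is one way to read "sensitivity" in this setting.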

Artificial Text Detection via Examining the Topology of Attention Maps

Sep 10, 2021

Laida Kushnareva, Daniil Cherniavskii, Vladislav Mikhailov, Ekaterina Artemova, Serguei Barannikov, Alexander Bernstein, Irina Piontkovskaya, Dmitri Piontkovski, Evgeny Burnaev

Abstract: The impressive capabilities of recent generative models to create texts that are challenging to distinguish from the human-written ones can be misused for generating fake news, product reviews, and even abusive content. Despite the prominent performance of existing methods for artificial text detection, they still lack interpretability and robustness towards unseen models. To this end, we propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA), which is currently understudied in the field of NLP. We empirically show that the features derived from the BERT model outperform count- and neural-based baselines by up to 10% on three common datasets, and tend to be the most robust towards unseen GPT-style generation models as opposed to existing methods. The probing analysis of the features reveals their sensitivity to the surface and syntactic properties. The results demonstrate that TDA is a promising line with respect to NLP tasks, specifically the ones that incorporate surface and structural information.

* Accepted to EMNLP 2021
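
As a rough illustration of what a topological feature of an attention map can look like, the sketch below thresholds an attention matrix into an undirected graph and counts its edges, connected components, and independent cycles. This is an assumed, simplified construction and not the feature set defined in the paper; the function name and the toy matrix are hypothetical.

```python
from typing import Dict, List


def attention_graph_features(attn: List[List[float]], threshold: float) -> Dict[str, int]:
    """Count edges, connected components and independent cycles of the graph
    obtained by keeping token pairs whose attention weight (in either
    direction) is at least `threshold`. Illustrative only."""
    n = len(attn)
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = 0
    for i in range(n):
        for j in range(i + 1, n):
            if max(attn[i][j], attn[j][i]) >= threshold:
                edges += 1
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    components = len({find(i) for i in range(n)})
    cycles = edges - n + components  # first Betti number of a graph
    return {"edges": edges, "components": components, "cycles": cycles}


# Toy 4-token attention map (each row sums to 1).
toy = [
    [0.70, 0.20, 0.05, 0.05],
    [0.30, 0.40, 0.25, 0.05],
    [0.10, 0.30, 0.50, 0.10],
    [0.05, 0.05, 0.10, 0.80],
]
for t in (0.1, 0.2):
    print(t, attention_graph_features(toy, t))
```

Sweeping the threshold and recording how these counts change gives a small feature vector per attention head, which could then be fed to a simple classifier.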

NEREL: A Russian Dataset with Nested Named Entities, Relations and Events

Sep 03, 2021

Natalia Loukachevitch, Ekaterina Artemova, Tatiana Batura, Pavel Braslavski, Ilia Denisov, Vladimir Ivanov, Suresh Manandhar, Alexander Pugachev, Elena Tutubalina

Abstract: In this paper, we present NEREL, a Russian dataset for named entity recognition and relation extraction. NEREL is significantly larger than existing Russian datasets: to date it contains 56K annotated named entities and 39K annotated relations. Its important difference from previous datasets is annotation of nested named entities, as well as relations within nested entities and at the discourse level. NEREL can facilitate development of novel models that can extract relations between nested named entities, as well as relations on both sentence and document levels. NEREL also contains the annotation of events involving named entities and their roles in the events. The NEREL collection is available via https://github.com/nerel-ds/NEREL.

* accepted to RANLP

A Single Example Can Improve Zero-Shot Data Generation

Aug 16, 2021

Pavel Burnyshev, Valentin Malykh, Andrey Bout, Ekaterina Artemova, Irina Piontkovskaya

Abstract: Sub-tasks of intent classification, such as robustness to distribution shift, adaptation to specific user groups and personalization, and out-of-domain detection, require extensive and flexible datasets for experiments and evaluation. As collecting such datasets is time- and labor-consuming, we propose to use text generation methods to gather datasets. The generator should be trained to generate utterances that belong to the given intent. We explore two approaches to generating task-oriented utterances. In the zero-shot approach, the model is trained to generate utterances from seen intents and is further used to generate utterances for intents unseen during training. In the one-shot approach, the model is presented with a single utterance from a test intent. We perform a thorough automatic and human evaluation of the dataset generated using the two proposed approaches. Our results reveal that the attributes of the generated data are close to original test sets, collected via crowd-sourcing.

* To appear in INLG2021 proceedings

A Differentiable Language Model Adversarial Attack on Text Classifiers

Jul 23, 2021

Ivan Fursov, Alexey Zaytsev, Pavel Burnyshev, Ekaterina Dmitrieva, Nikita Klyuchnikov, Andrey Kravchenko, Ekaterina Artemova, Evgeny Burnaev

Abstract: Robustness of huge Transformer-based models for natural language processing is an important issue due to their capabilities and wide adoption. One way to understand and improve robustness of these models is an exploration of an adversarial attack scenario: check if a small perturbation of an input can fool a model. Due to the discrete nature of textual data, gradient-based adversarial methods, widely used in computer vision, are not applicable per se. The standard strategy to overcome this issue is to develop token-level transformations, which do not take the whole sentence into account. In this paper, we propose a new black-box sentence-level attack. Our method fine-tunes a pre-trained language model to generate adversarial examples. A proposed differentiable loss function depends on a substitute classifier score and an approximate edit distance computed via a deep learning model. We show that the proposed attack outperforms competitors on a diverse set of NLP problems for both computed metrics and human evaluation. Moreover, due to the usage of the fine-tuned language model, the generated adversarial examples are hard to detect, thus current models are not robust. Hence, it is difficult to defend against the proposed attack, which is not the case for other attacks.

* arXiv admin note: substantial text overlap with arXiv:2006.11078

Teaching a Massive Open Online Course on Natural Language Processing

May 04, 2021

Ekaterina Artemova, Murat Apishev, Veronika Sarkisyan, Sergey Aksenov, Denis Kirjanov, Oleg Serikov

Abstract: This paper presents a new Massive Open Online Course on Natural Language Processing, targeted at non-English speaking students. The course lasts 12 weeks; every week consists of lectures, practical sessions, and quiz assignments. Three weeks out of 12 are followed by Kaggle-style coding assignments. Our course intends to serve multiple purposes: (i) familiarize students with the core concepts and methods in NLP, such as language modeling or word or sentence representations; (ii) show that recent advances, including pre-trained Transformer-based models, are built upon these concepts; (iii) introduce architectures for most demanded real-life applications; (iv) develop practical skills to process texts in multiple languages. The course was prepared and recorded during 2020, launched by the end of the year, and in early 2021 has received positive feedback.

* To appear in the Proceedings of the Fifth Workshop on Teaching NLP @ NAACL

Morph Call: Probing Morphosyntactic Content of Multilingual Transformers

May 04, 2021

Vladislav Mikhailov, Oleg Serikov, Ekaterina Artemova

Abstract: The outstanding performance of transformer-based language models on a great variety of NLP and NLU tasks has stimulated interest in exploring their inner workings. Recent research has focused primarily on higher-level and complex linguistic phenomena such as syntax, semantics, world knowledge, and common sense. The majority of the studies are anglocentric, and little remains known regarding other languages, precisely their morphosyntactic properties. To this end, our work presents Morph Call, a suite of 46 probing tasks for four Indo-European languages of different morphology: English, French, German and Russian. We propose a new type of probing task based on the detection of guided sentence perturbations. We use a combination of neuron-, layer- and representation-level introspection techniques to analyze the morphosyntactic content of four multilingual transformers, including their less explored distilled versions. Besides, we examine how fine-tuning for POS-tagging affects the model knowledge. The results show that fine-tuning can both improve and decrease the probing performance and change how morphosyntactic knowledge is distributed across the model. The code and data are publicly available, and we hope to fill the gaps in the less studied aspect of transformers.

* To appear in the Proceedings of the 3rd Workshop on Research in Computational Typology and Multilingual NLP (SIGTYP, NAACL)

MOROCCO: Model Resource Comparison Framework

Apr 29, 2021

Valentin Malykh, Alexander Kukushkin, Ekaterina Artemova, Vladislav Mikhailov, Maria Tikhonova, Tatiana Shavrina

Abstract: The new generation of pre-trained NLP models pushes the SOTA to new limits, but at the cost of computational resources, to the point that their use in real production environments is often prohibitively expensive. We tackle this problem by evaluating not only the standard quality metrics on downstream tasks but also the memory footprint and inference time. We present MOROCCO, a framework to compare language models compatible with the jiant environment, which supports over 50 NLU tasks, including the SuperGLUE benchmark and multiple probing suites. We demonstrate its applicability for two GLUE-like suites in different languages.
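
To make the notion of measuring inference time and memory footprint alongside quality concrete, here is a minimal sketch using only the Python standard library. It is not MOROCCO's measurement procedure: the function `measure`, its parameters, and the toy `predict` callable are hypothetical, and `tracemalloc` only sees allocations made through Python's allocator, so memory used by native or GPU code would not be counted.

```python
import time
import tracemalloc
from typing import Any, Callable, Dict, Iterable


def measure(predict: Callable[[Any], Any], inputs: Iterable[Any], repeats: int = 3) -> Dict[str, float]:
    """Return best-of-`repeats` seconds per example and peak traced memory in bytes."""
    data = list(inputs)
    if not data:
        raise ValueError("inputs must not be empty")

    # Timing pass, run without tracemalloc so tracing overhead does not skew it.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for x in data:
            predict(x)
        best = min(best, time.perf_counter() - start)

    # Separate memory pass with allocation tracing enabled.
    tracemalloc.start()
    for x in data:
        predict(x)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {"seconds_per_example": best / len(data), "peak_bytes": float(peak)}


# Stand-in for a model: any callable that maps one input to one output.
print(measure(lambda text: text.lower().split(), ["A toy input sentence"] * 100))
```

Reporting these two numbers next to the task metric makes it possible to compare models on a quality-versus-cost basis rather than on quality alone.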

RuSentEval: Linguistic Source, Encoder Force!

Mar 02, 2021

Vladislav Mikhailov, Ekaterina Taktasheva, Elina Sigdel, Ekaterina Artemova

Abstract: The success of pre-trained transformer language models has brought a great deal of interest in how these models work, and what they learn about language. However, prior research in the field is mainly devoted to English, and little is known regarding other languages. To this end, we introduce RuSentEval, an enhanced set of 14 probing tasks for Russian, including ones that have not been explored yet. We apply a combination of complementary probing methods to explore the distribution of various linguistic properties in five multilingual transformers for two typologically contrasting languages -- Russian and English. Our results provide intriguing findings that contradict the common understanding of how linguistic knowledge is represented, and demonstrate that some properties are learned in a similar manner despite the language differences.

* The paper is accepted to BSNLP workshop at EACL 2021. The title follows Power Rangers Mystic Force series (Roll Call Team-Morph: "Magical Source, Mystic Force!")

Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates

Feb 18, 2021

Artem Shelmanov, Dmitri Puzyrev, Lyubov Kupriyanova, Denis Belyakov, Daniil Larionov, Nikita Khromov, Olga Kozlova, Ekaterina Artemova, Dmitry V. Dylov, Alexander Panchenko

Abstract: Annotating training data for sequence tagging of texts is usually very time-consuming. Recent advances in transfer learning for natural language processing in conjunction with active learning open the possibility of significantly reducing the necessary annotation budget. We are the first to thoroughly investigate this powerful combination for the sequence tagging task. We conduct an extensive empirical study of various Bayesian uncertainty estimation methods and Monte Carlo dropout options for deep pre-trained models in the active learning framework and find the best combinations for different types of models. Besides, we also demonstrate that to acquire instances during active learning, a full-size Transformer can be substituted with a distilled version, which yields better computational performance and reduces obstacles for applying deep active learning in practice.

* In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL-2021)
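
One way uncertainty can be estimated from Monte Carlo dropout, as mentioned in the abstract above, is to run several stochastic forward passes and score how much they disagree. The sketch below computes a BALD-style score (entropy of the mean prediction minus the mean entropy of the individual predictions) for a single token. Whether this matches the exact scores studied in the paper is an assumption; the function names are hypothetical and the probability vectors are made-up inputs.

```python
import math
from typing import List


def entropy(p: List[float]) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)


def bald_score(samples: List[List[float]]) -> float:
    """`samples` holds T class-probability vectors for one token, one per
    stochastic (dropout-enabled) forward pass. Higher means more disagreement."""
    t = len(samples)
    k = len(samples[0])
    mean = [sum(s[c] for s in samples) / t for c in range(k)]
    return entropy(mean) - sum(entropy(s) for s in samples) / t


# Passes that agree give a score near zero; passes that disagree give a higher score.
agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
disagree = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
print(bald_score(agree), bald_score(disagree))
```

In an active learning loop, token scores like this would typically be aggregated per sentence, and the highest-scoring sentences sent for annotation.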
