Arthur Malajyan

A Simple and Effective Method of Cross-Lingual Plagiarism Detection

Apr 05, 2023

Karen Avetisyan, Arthur Malajyan, Tsolak Ghukasyan, Arutyun Avetisyan

Abstract: We present a simple cross-lingual plagiarism detection method applicable to a large number of languages. The approach leverages open multilingual thesauri for the candidate retrieval task and pre-trained multilingual BERT-based language models for detailed analysis. The method does not rely on machine translation or word sense disambiguation at inference time, and is therefore suitable for a large number of languages, including under-resourced ones. The effectiveness of the proposed approach is demonstrated on several existing and new benchmarks, achieving state-of-the-art results for French, Russian, and Armenian.
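
The abstract does not spell out how the thesaurus-based candidate retrieval works, so the following is only a minimal sketch of one plausible reading: words in any language are mapped to language-independent concept IDs, and source documents are ranked by concept overlap with the suspicious text. The toy thesaurus format, the Jaccard scoring, and all function names are assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical thesaurus format: word (any language) -> set of concept IDs.
Thesaurus = Dict[str, Set[str]]


def to_concepts(tokens: List[str], thesaurus: Thesaurus) -> Set[str]:
    """Collect every concept ID of every token (no sense disambiguation)."""
    concepts: Set[str] = set()
    for tok in tokens:
        concepts |= thesaurus.get(tok.lower(), set())
    return concepts


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two concept sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_candidates(
    query_tokens: List[str],
    corpus: Dict[str, List[str]],
    thesaurus: Thesaurus,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Rank source documents by concept overlap with the query text."""
    query = to_concepts(query_tokens, thesaurus)
    scored = [
        (doc_id, jaccard(query, to_concepts(tokens, thesaurus)))
        for doc_id, tokens in corpus.items()
    ]
    scored = [pair for pair in scored if pair[1] > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

In this sketch a French query such as `["chat", "noir"]` can match an English document containing `["black", "cat"]` without any translation step, provided the thesaurus links both word pairs to shared concepts. The retrieved candidates would then be passed to a multilingual BERT-based model for the detailed analysis stage.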

ARPA: Armenian Paraphrase Detection Corpus and Models

Sep 26, 2020

Arthur Malajyan, Karen Avetisyan, Tsolak Ghukasyan

Abstract: In this work, we employ a semi-automatic method based on back translation to generate a sentential paraphrase corpus for the Armenian language. The initial collection of sentences is translated from Armenian to English and back twice, resulting in pairs of lexically distant but semantically similar sentences. The generated paraphrases are then manually reviewed and annotated. Using this method, train and test datasets are created, containing 2360 paraphrases in total. In addition, the datasets are used to train and evaluate BERT-based models for detecting paraphrases in Armenian, achieving results comparable to the state of the art for other languages.

* To be published in the proceedings of Ivannikov Memorial Workshop 2020
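
The back-translation step described in the abstract above can be sketched roughly as follows. The `translate` callable is a hypothetical stand-in for whatever machine translation system is used (the abstract does not name one), and the language codes, function names, and the filter that drops unchanged sentences are assumptions for illustration. The manual review and annotation stage is not modelled.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: translate(text, source_lang, target_lang) -> text.
Translate = Callable[[str, str, str], str]


def back_translate(
    sentence: str,
    translate: Translate,
    rounds: int = 2,
    src: str = "hy",
    pivot: str = "en",
) -> str:
    """Translate src -> pivot -> src, repeated `rounds` times."""
    text = sentence
    for _ in range(rounds):
        text = translate(text, src, pivot)
        text = translate(text, pivot, src)
    return text


def build_candidates(
    sentences: List[str],
    translate: Translate,
    rounds: int = 2,
) -> List[Tuple[str, str]]:
    """Pair each sentence with its back-translation, skipping unchanged ones."""
    pairs: List[Tuple[str, str]] = []
    for sentence in sentences:
        paraphrase = back_translate(sentence, translate, rounds)
        if paraphrase != sentence:
            pairs.append((sentence, paraphrase))
    return pairs
```

Passing the translator in as a parameter keeps the sketch independent of any particular translation service. The resulting pairs are only candidates: as the abstract notes, each pair still needs to be reviewed and labelled by a human annotator.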
