Benjamin Murauer

On the Influence of Machine Translation on Language Origin Obfuscation

Jun 24, 2021

Benjamin Murauer, Michael Tschuggnall, Günther Specht

Abstract: In the last decade, machine translation has become a popular means of dealing with multilingual digital content. As translation quality improves, using machine translation to obfuscate the source language of a text becomes more attractive. In this paper, we analyze how well the source language can be detected from the translated output of two widely used commercial machine translation systems, using machine-learning algorithms with basic textual features such as n-grams. Evaluations show that the source language can be reconstructed with high accuracy for documents that contain a sufficient amount of translated text. In addition, we analyze how document size influences prediction performance, and how limiting the set of possible source languages improves classification accuracy.

* This paper was peer-reviewed, accepted, and presented at https://www.cicling.org/2018/, but the organizers did not publish the proceedings.
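
As a rough illustration of the kind of n-gram feature the abstract mentions, the following minimal sketch counts character n-grams and assigns a text to the most similar labeled profile. It is not the authors' pipeline: the function names, the choice of character trigrams, and the cosine nearest-profile rule are all assumptions made for this example, and the abstract does not say which classifier or n-gram type was used.

```python
from collections import Counter
from typing import Dict
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_source(text: str, profiles: Dict[str, Counter], n: int = 3) -> str:
    """Return the label whose n-gram profile is most similar to the text.

    `profiles` maps a hypothetical source-language label to n-gram counts
    built from translated text known to come from that language.
    """
    grams = char_ngrams(text, n)
    return max(profiles, key=lambda label: cosine(grams, profiles[label]))
```

In practice each profile would be built from many translated documents per source language, and a trained classifier would likely replace the nearest-profile rule used here.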

DT-grams: Structured Dependency Grammar Stylometry for Cross-Language Authorship Attribution

Jun 10, 2021

Benjamin Murauer, Günther Specht

Abstract: Cross-language authorship attribution problems rely on either translation to enable the use of single-language features, or language-independent feature extraction methods. Until recently, the lack of datasets for this problem hindered the development of the latter, and single-language solutions were applied to machine-translated corpora. In this paper, we present a novel language-independent feature for authorship analysis based on dependency graphs and universal part-of-speech tags, called DT-grams (dependency tree grams), which are constructed by selecting specific sub-parts of the dependency graph of sentences. We evaluate DT-grams by performing cross-language authorship attribution on untranslated datasets of bilingual authors, showing that they achieve a macro-averaged F1 score that is, on average, 0.081 higher than previous methods across five different language pairs. Additionally, by reporting results for a diverse set of features for comparison, we provide a baseline for the previously undocumented task of untranslated cross-language authorship attribution.

* To be published in: "32. GI-Workshop Grundlagen von Datenbanken"
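
For readers unfamiliar with the metric reported in the abstract, the sketch below computes a macro-averaged F1 score under its standard definition: F1 is computed per class and then averaged with equal weight per class. The function name and the toy labels are hypothetical and are not taken from the paper's evaluation code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


# Toy example with two hypothetical authors "a" and "b".
score = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

Because every class contributes equally, an author with few documents influences the score as much as a prolific one, which is why macro averaging is a common choice for imbalanced attribution datasets.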
