
"Text": models, code, and papers

WVOQ at SemEval-2021 Task 6: BART for Span Detection and Classification

Jun 27, 2021
Cees Roele

A novel solution to span detection and classification is presented in which a BART encoder-decoder model transforms textual input into a version with XML-like marked-up spans. This markup is subsequently translated into the beginning and end positions of fragments and their classes. We discuss how the pre-training methodology explains both the relative success of this method and its limitations. This paper reports on participation in Task 6 of SemEval-2021: Detection of Persuasion Techniques in Texts and Images.

* SemEval-2021 
* 5 pages, 1 figure, accepted at SemEval-2021 co-located with ACL-IJCNLP 2021 
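The post-processing step the abstract describes, translating XML-like markup back into character spans and classes, might look like the following sketch (the function name and the persuasion-technique tag are illustrative, not the authors' code):

```python
import re

def markup_to_spans(marked):
    """Convert XML-like marked-up text into plain text plus
    (start, end, label) character spans."""
    pattern = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)
    plain, spans = [], []
    pos, last = 0, 0
    for m in pattern.finditer(marked):
        # copy the unmarked text before this span
        plain.append(marked[last:m.start()])
        pos += m.start() - last
        start = pos
        # copy the span contents without the tags
        plain.append(m.group(2))
        pos += len(m.group(2))
        spans.append((start, pos, m.group(1)))
        last = m.end()
    plain.append(marked[last:])
    return "".join(plain), spans
```

For example, `markup_to_spans("We <name_calling>crooked</name_calling> win")` yields the plain string `"We crooked win"` together with the span `(3, 10, "name_calling")`.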


Sprachsynthese -- State-of-the-Art in englischer und deutscher Sprache (Speech Synthesis: State of the Art in English and German)

Jun 11, 2021
René Peinl

Reading text aloud is an important feature for modern computer applications. It not only facilitates access to information for visually impaired people, but is also a pleasant convenience for non-impaired users. In this article, the state of the art of speech synthesis is presented separately for mel-spectrogram generation and vocoders. The article concludes with an overview of available datasets for English and German, and a discussion of whether the good speech-synthesis results for English transfer to German.

* in German 


Neural document expansion for ad-hoc information retrieval

Dec 27, 2020
Cheng Tang, Andrew Arnold

Recently, Nogueira et al. [2019] proposed a new approach to document expansion based on a neural Seq2Seq model, showing significant improvement on short-text retrieval tasks. However, this approach needs a large amount of in-domain training data. In this paper, we show that this neural document expansion approach can be effectively adapted to standard IR tasks, where labels are scarce and many long documents are present.
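The Nogueira et al.-style expansion amounts to appending model-generated queries to a document before indexing, so that lexical matchers can hit vocabulary the original text lacks. A minimal sketch (the term-overlap scorer below is a toy stand-in for BM25, and the function names are hypothetical):

```python
from collections import Counter

def expand(doc, generated_queries):
    # doc2query-style expansion: append predicted queries to the document text
    return doc + " " + " ".join(generated_queries)

def score(query, doc):
    # toy term-overlap score standing in for a real ranking function like BM25
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    return sum(min(q[t], d[t]) for t in q)
```

A query such as "where cat sit" scores higher against `expand("the cat sat on the mat", ["where did the cat sit"])` than against the unexpanded document, which is the effect expansion aims for.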


Meta-Learning for Natural Language Understanding under Continual Learning Framework

Nov 03, 2020
Jiacheng Wang, Yong Fan, Duo Jiang, Shiqing Li

Neural networks have been recognized for their accomplishments in tackling various natural language understanding (NLU) tasks. Methods have been developed to train a robust model that handles multiple tasks and learns a general representation of text. In this paper, we implement the model-agnostic meta-learning (MAML) and Online-aware Meta-learning (OML) meta-objectives under the continual learning framework for NLU tasks. We validate our methods on selected SuperGLUE and GLUE benchmark tasks.
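The MAML meta-objective the abstract names follows an inner adaptation loop and an outer meta-update. A first-order MAML sketch on a one-dimensional toy loss (the quadratic loss and learning rates are illustrative, not the paper's setup):

```python
def loss(w, target):
    # toy per-task loss: squared distance from a task-specific target
    return (w - target) ** 2

def grad(w, target):
    # gradient of the toy loss
    return 2 * (w - target)

def maml_step(w, tasks, inner_lr=0.1, outer_lr=0.05):
    """One first-order MAML meta-update over a batch of tasks."""
    meta_grad = 0.0
    for t in tasks:
        # inner loop: one gradient step of task-specific adaptation
        w_adapted = w - inner_lr * grad(w, t)
        # first-order approximation: evaluate the gradient at the adapted params
        meta_grad += grad(w_adapted, t)
    return w - outer_lr * meta_grad / len(tasks)
```

Iterating `maml_step` over tasks with targets 1.0 and 3.0 drives the meta-parameter toward 2.0, the point from which one adaptation step does best on average across tasks.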


Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview

Oct 14, 2020
Alena Butryna, Shan-Hui Cathy Chu, Isin Demirsahin, Alexander Gutkin, Linne Ha, Fei He, Martin Jansche, Cibu Johny, Anna Katanova, Oddur Kjartansson, Chenfang Li, Tatiana Merkulova, Yin May Oo, Knot Pipatsrisawat, Clara Rivera, Supheakmungkol Sarin, Pasindu de Silva, Keshan Sodimana, Richard Sproat, Theeraphol Wattanavekin, Jaka Aris Eko Wibawa

This paper presents an overview of a program designed to address the growing need for developing freely available speech resources for under-represented languages. At present we have released 38 datasets for building text-to-speech and automatic speech recognition applications for languages and dialects of South and Southeast Asia, Africa, Europe and South America. The paper describes the methodology used for developing such corpora and presents some of our findings that could benefit under-represented language communities.

* Appeared in 2019 UNESCO International Conference Language Technologies for All (LT4All): Enabling Linguistic Diversity and Multilingualism Worldwide, 4-6 December, Paris, France 


SciBERT-based Semantification of Bioassays in the Open Research Knowledge Graph

Sep 16, 2020
Marco Anteghini, Jennifer D'Souza, Vitor A. P. Martins dos Santos, Sören Auer

As a novel contribution to the problem of semantifying biological assays, in this paper we propose a neural-network-based approach to automatically semantify, and thereby structure, unstructured bioassay text descriptions. Experimental evaluations show promise: the neural-based semantification significantly outperforms a naive frequency-based baseline approach. Specifically, the neural method attains 72% F1 versus 47% F1 for the frequency-based method.

* In Proceedings of the 22nd International Conference on Knowledge Engineering and Knowledge Management (Demo and Poster section) 


Investigating the Effect of Emoji in Opinion Classification of Uzbek Movie Review Comments

Aug 02, 2020
Ilyos Rabbimov, Iosif Mporas, Vasiliki Simaki, Sami Kobilov

Opinion mining on social media posts has become more and more popular. Users often express their opinion on a topic not only with words but also with image symbols such as emoticons and emoji. In this paper, we investigate the effect of emoji-based features in opinion classification of Uzbek texts, and more specifically movie review comments from YouTube. Several classification algorithms are tested, and feature ranking is performed to evaluate the discriminative ability of the emoji-based features.

* 10 pages, 1 figure, 3 tables 
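Emoji-based features of the kind the abstract describes can be as simple as per-emoji counts over a fixed vocabulary, which then feed a standard classifier. A minimal sketch (the vocabulary and function name are hypothetical, not the paper's feature set):

```python
def emoji_features(text, emoji_vocab=("😀", "😢", "👍", "👎")):
    # one count feature per emoji in a fixed, illustrative vocabulary
    return [text.count(e) for e in emoji_vocab]
```

For example, `emoji_features("great 👍👍😀")` returns `[1, 0, 2, 0]`; feature ranking would then assess how much each of these counts helps separate positive from negative comments.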


On-The-Fly Information Retrieval Augmentation for Language Models

Jul 03, 2020
Hai Wang, David McAllester

Here we experiment with the use of information retrieval as an augmentation for pre-trained language models. The text corpus used in information retrieval can be viewed as a form of episodic memory which grows over time. By augmenting GPT 2.0 with information retrieval we achieve a zero-shot 15% relative reduction in perplexity on the Gigaword corpus without any re-training. We also validate our IR augmentation on an event co-reference task.

* ACL 2020 NUSE Workshop 
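On-the-fly IR augmentation amounts to retrieving text relevant to the current context and prepending it before the language model conditions on it. A sketch with a toy term-overlap retriever (a stand-in for a real IR system; function names are illustrative):

```python
def retrieve(query, corpus):
    # toy retriever: return the passage with the largest term overlap with the query
    q_terms = set(query.lower().split())
    return max(corpus, key=lambda p: len(q_terms & set(p.lower().split())))

def augment_context(context, corpus):
    # prepend the retrieved passage to the LM context, no re-training required
    return retrieve(context, corpus) + " " + context
```

The language model then scores or generates from the augmented string instead of the bare context, which is what yields the zero-shot perplexity gain the abstract reports.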


Semantic Noise Matters for Neural Natural Language Generation

Nov 10, 2019
Ondřej Dušek, David M. Howcroft, Verena Rieser

Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e. generating text which is unrelated to the input specification. In this paper, we show the impact of semantic noise on state-of-the-art NNLG models which implement different semantic control mechanisms. We find that cleaned data can improve semantic correctness by up to 97%, while maintaining fluency. We also find that the most common error is omitting information, rather than hallucination.

* In Proceedings of INLG 2019, Tokyo, Japan 
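The omission errors the abstract identifies can be detected by checking which input-specification values never surface in the generated text. A minimal sketch (the exact-substring match and function name are illustrative simplifications of real slot-error scoring):

```python
def omitted_slots(mr_values, generated_text):
    """Return the meaning-representation values missing from the output,
    using naive case-insensitive substring matching."""
    text = generated_text.lower()
    return [v for v in mr_values if v.lower() not in text]
```

For instance, given MR values `["Aromi", "riverside"]` and the output "Aromi is a coffee shop in the city centre", the check flags `"riverside"` as omitted.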


ICDM 2019 Knowledge Graph Contest: Team UWA

Sep 04, 2019
Michael Stewart, Majigsuren Enkhsaikhan, Wei Liu

We present an overview of our triple extraction system for the ICDM 2019 Knowledge Graph Contest. Our system uses a pipeline-based approach to extract a set of triples from a given document. It offers a simple and effective solution to the challenge of knowledge graph construction from domain-specific text. It also provides the facility to visualise useful information about each triple such as the degree, betweenness, structured relation type(s), and named entity types.
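The per-triple graph statistics the abstract mentions, such as node degree, fall out directly from the extracted (subject, relation, object) triples. A small sketch (the function name is hypothetical; betweenness would need a full graph library):

```python
from collections import Counter

def triple_degrees(triples):
    # degree of each entity = number of extracted triples it participates in
    deg = Counter()
    for subj, _relation, obj in triples:
        deg[subj] += 1
        deg[obj] += 1
    return deg
```

Given `[("A", "manages", "B"), ("A", "owns", "C")]`, entity "A" has degree 2 while "B" and "C" each have degree 1, the kind of information the system visualises alongside relation and entity types.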
