"Text": models, code, and papers

Restoring Hebrew Diacritics Without a Dictionary

May 11, 2021
Elazar Gershuni, Yuval Pinter

We demonstrate that it is feasible to diacritize Hebrew script without any human-curated resources other than plain diacritized text. We present NAKDIMON, a two-layer character level LSTM, that performs on par with much more complicated curation-dependent systems, across a diverse array of modern Hebrew sources.

* 6 pages, 1 figure 

Generación automática de frases literarias en español

Jan 17, 2020
Luis-Gil Moreno-Jiménez, Juan-Manuel Torres-Moreno, Roseli S. Wedemann

In this work we present a state of the art in the area of Computational Creativity (CC). In particular, we address the automatic generation of literary sentences in Spanish. We propose three models of text generation based mainly on statistical algorithms and shallow parsing analysis. We also present some rather encouraging preliminary results.

* 13 pages, in Spanish, 6 figures, 3 tables 

Un systeme de lemmatisation pour les applications de TALN

Nov 16, 2019
Sadik Bessou, Mohamed Louail, Allaoua Refoufi, Zehour Kadem, Mohamed Touahria

This paper presents a stemming method for Arabic texts based on linguistic techniques from natural language processing. The method relies on the notion of scheme (one of the strengths of Arabic morphology). Its advantage is that it does not use a dictionary of inflections but instead a smart dynamic recognition of the different words of the language.

* in French 

Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop

Feb 14, 2018
Odette Scharenborg, Laurent Besacier, Alan Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stueker, Pierre Godard, Markus Mueller, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux

We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding the discovery of linguistic units (subwords and words) in a language without orthography. We study the replacement of orthographic transcriptions by images and/or translated text in a well-resourced language to help unsupervised discovery from raw speech.

* Accepted to ICASSP 2018 

The Character Thinks Ahead: creative writing with deep learning nets and its stylistic assessment

Dec 21, 2017
Roger T. Dean, Hazel Smith

We discuss how to control outputs from deep learning models of text corpora so as to create contemporary poetic works. We assess whether these controls are successful in the immediate sense of creating stylometric distinctiveness. The specific context is our piece The Character Thinks Ahead (2016/17); the potential applications are broad.

* A 2 page paper in press in Leonardo Vol 51, 2018. Yet to be copy-edited 

A Hackathon for Classical Tibetan

Sep 27, 2016
Orna Almogi, Lena Dankin, Nachum Dershowitz, Lior Wolf

We describe the course of a hackathon dedicated to the development of linguistic tools for Tibetan Buddhist studies. Over a period of five days, a group of seventeen scholars, scientists, and students developed and compared algorithms for intertextual alignment and text classification, along with some basic language tools, including a stemmer and word segmenter.

The GF Mathematics Library

Feb 22, 2012
Jordi Saludes, Sebastian Xambó

This paper presents the Mathematics Grammar Library, a system for multilingual mathematical text processing. We explain the context in which it originated, its current design and functionality, and the current development goals. We also present two prototype services and comment on possible future applications in the area of artificial mathematics assistants.

* EPTCS 79, 2012, pp. 102-110 
* In Proceedings THedu'11, arXiv:1202.4535 

ICE-Talk: an Interface for a Controllable Expressive Talking Machine

Aug 25, 2020
Noé Tits, Kevin El Haddad, Thierry Dutoit

ICE-Talk is an open-source web-based GUI that allows the use of a TTS system with controllable parameters via a text field and a clickable 2D plot. It enables the study of latent spaces for controllable TTS. Moreover, it is implemented as a module that can be used as part of a human-agent interaction.

A Novel Feature Selection and Extraction Technique for Classification

Dec 26, 2014
Kratarth Goel, Raunaq Vohra, Ainesh Bakshi

This paper presents a versatile technique for the purpose of feature selection and extraction - Class Dependent Features (CDFs). We use CDFs to improve the accuracy of classification and at the same time control computational expense by tackling the curse of dimensionality. In order to demonstrate the generality of this technique, it is applied to handwritten digit recognition and text categorization.

* IEEE Xplore, Proceedings of IEEE SMC 2014, pages 4033 - 4034 
* 2 pages, 2 tables, published at IEEE SMC 2014 
