"Text": models, code, and papers

Applying Phonological Features in Multilingual Text-To-Speech

Oct 10, 2021
Cong Zhang, Huinan Zeng, Huang Liu, Jiewen Zheng

This study investigates whether phonological features can be applied in text-to-speech systems to generate native and non-native speech in English and Mandarin. We present a mapping of ARPABET/pinyin to SAMPA/SAMPA-SC and then to phonological features, and test whether this mapping can lead to the successful generation of native, non-native, and code-switched speech in the two languages. We ran two experiments, one with a small dataset and one with a larger dataset. The results show that phonological features are a feasible input representation, although further investigation is needed to improve model performance. The accented output generated by the TTS models also helps with understanding human second-language acquisition.
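
As an illustration of this kind of mapping (with a hypothetical feature inventory, not the paper's actual one), a few ARPABET symbols can be converted to SAMPA and then to binary phonological feature vectors:

    # Minimal sketch of a phoneme-to-phonological-feature mapping.
    # Both tables below are illustrative assumptions, not the paper's inventory.
    ARPABET_TO_SAMPA = {"AA": "A", "IY": "i", "S": "s", "M": "m"}

    # Hypothetical feature vector per SAMPA symbol:
    # (syllabic, voiced, nasal, continuant)
    SAMPA_TO_FEATURES = {
        "A": (1, 1, 0, 1),
        "i": (1, 1, 0, 1),
        "s": (0, 0, 0, 1),
        "m": (0, 1, 1, 0),
    }

    def phones_to_features(arpabet_phones):
        """Map a sequence of ARPABET symbols to phonological feature vectors."""
        sampa = [ARPABET_TO_SAMPA[p] for p in arpabet_phones]
        return [SAMPA_TO_FEATURES[s] for s in sampa]

    if __name__ == "__main__":
        # "seem" is roughly S IY M in ARPABET
        print(phones_to_features(["S", "IY", "M"]))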

* demo webpage: https://congzhang365.github.io/feature_tts/ 


Evaluating Transformer-Based Multilingual Text Classification

Apr 30, 2020
Sophie Groenwold, Samhita Honnavalli, Lily Ou, Aesha Parekh, Sharon Levy, Diba Mirza, William Yang Wang

As NLP tools become ubiquitous in today's technological landscape, they are increasingly applied to languages with a variety of typological structures. However, NLP research does not focus primarily on typological differences in its analysis of state-of-the-art language models. As a result, NLP tools perform unequally across languages with different syntactic and morphological structures. Through a detailed discussion of word order typology, morphological typology, and comparative linguistics, we identify which variables most affect language modeling efficacy; in addition, we calculate word order and morphological similarity indices to aid our empirical study. We then use this background to support our analysis of a multi-class text classification experiment spanning eight languages and eight models.
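
For a rough sense of what a typological similarity index can look like (the binary features below are invented for illustration and are not the paper's actual indices), one could compare languages by the fraction of typological features they share:

    # Sketch: a simple typological similarity index between languages,
    # computed as the fraction of shared binary features.
    # The feature values below are invented for illustration only.
    FEATURES = {            # (SVO order, prepositions, agglutinative, rich case marking)
        "English":  (1, 1, 0, 0),
        "Turkish":  (0, 0, 1, 1),
        "Mandarin": (1, 1, 0, 0),
    }

    def similarity(lang_a, lang_b):
        a, b = FEATURES[lang_a], FEATURES[lang_b]
        return sum(x == y for x, y in zip(a, b)) / len(a)

    if __name__ == "__main__":
        print(similarity("English", "Mandarin"))  # 1.0 with these toy features
        print(similarity("English", "Turkish"))   # 0.0 with these toy features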

* Total of 15 pages (9 pages for paper, 2 pages for references, 4 pages for appendix). Changed title 


Improving text classification with vectors of reduced precision

Jun 20, 2017
Krzysztof Wróbel, Maciej Wielgosz, Marcin Pietroń, Michał Karwatowski, Aleksander Smywiński-Pohl

This paper analyzes the impact of reducing floating-point precision on the quality of text classification. Reducing the precision of the vectors representing the data (the TF-IDF representation in our case) decreases computing time and memory footprint on dedicated hardware platforms. The impact of precision reduction on classification quality was evaluated on 5 corpora, using 4 different classifiers; dimensionality reduction was also taken into account. Results indicate that precision reduction improves classification accuracy in most cases (up to a 25% reduction in error). In general, reducing from 64 to 4 bits gives the best scores and ensures that the results are no worse than with the full floating-point representation.
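
For a rough illustration of this kind of precision reduction (a generic uniform fixed-point quantizer, which may differ from the exact scheme evaluated in the paper), the sketch below quantizes a toy TF-IDF-like vector to a given bit width and reports the resulting error:

    # Sketch: quantize a non-negative feature vector to 2**bits levels,
    # then dequantize so the error against the full-precision vector is visible.
    import numpy as np

    def quantize(vec, bits):
        levels = 2 ** bits - 1
        scale = vec.max() or 1.0
        q = np.round(vec / scale * levels)   # integer codes in [0, levels]
        return q / levels * scale            # back to float for comparison

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # sparse-ish toy TF-IDF row (not one of the paper's corpora)
        tfidf = rng.random(1000) * (rng.random(1000) < 0.05)
        for bits in (64, 8, 4, 2):
            approx = tfidf if bits >= 64 else quantize(tfidf, bits)
            print(bits, "bits, max abs error:", np.abs(tfidf - approx).max())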



Parsing of part-of-speech tagged Assamese Texts

Dec 09, 2009
Mirzanur Rahman, Sufal Das, Utpal Sharma

A natural language (or ordinary language) is a language that is spoken, written, or signed by humans for general-purpose communication, as distinguished from formal languages (such as computer-programming languages or the "languages" used in the study of formal logic). The computational activities required to enable a computer to carry out information processing using natural language are called natural language processing. We work with the Assamese language and check the grammaticality of input sentences. Our aim is to produce a technique for checking the grammatical structure of sentences in Assamese text. We have constructed grammar rules by analyzing the structure of Assamese sentences. Our parsing program finds the grammatical errors, if any, in an Assamese sentence; if there is no error, the program generates the parse tree for the sentence.
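
Since the paper's Assamese grammar rules are not reproduced here, the following sketch illustrates the general idea with a toy context-free grammar over POS tags, using NLTK's chart parser; the grammar and tag sequences are invented for illustration:

    # Sketch of grammar checking over a POS-tagged sentence with a toy CFG.
    # The grammar and tag sequences are invented; they are not the paper's rules.
    import nltk

    grammar = nltk.CFG.fromstring("""
        S  -> NP VP
        NP -> 'NN' | 'PRP' | 'JJ' 'NN'
        VP -> 'VB' | 'VB' NP
    """)
    parser = nltk.ChartParser(grammar)

    def check(tag_sequence):
        trees = list(parser.parse(tag_sequence))
        if trees:
            trees[0].pretty_print()                    # show one parse tree
            return True
        print("No parse found: possible grammatical error")
        return False

    if __name__ == "__main__":
        check(["PRP", "VB", "JJ", "NN"])   # pronoun verb adjective noun
        check(["VB", "VB"])                # rejected by the toy grammar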

* M. Rahman, S. Das and U. Sharma, "Parsing of part-of-speech tagged Assamese Texts", International Journal of Computer Science Issues, IJCSI, Volume 6, Issue 1, pp28-34, November 2009 


Integrating question answering and text-to-SQL in Portuguese

Feb 08, 2022
Marcos Menon José, Marcelo Archanjo José, Denis Deratani Mauá, Fábio Gagliardi Cozman

Deep learning transformers have drastically improved systems that automatically answer questions in natural language. However, different questions demand different answering techniques; here we propose, build, and validate an architecture that integrates different modules to answer two distinct kinds of queries. Our architecture takes free-form natural language text, classifies it, and sends it either to a neural question answering reasoner or to a natural-language-to-SQL parser. We implemented a complete system for the Portuguese language, using some of the main tools available for the language and translating training and testing datasets. Experiments show that our system selects the appropriate answering method with high accuracy (over 99%), thus validating a modular question answering strategy.
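
The routing idea can be sketched as follows; the keyword-based classifier and the two handler stubs are placeholders for illustration, not the system's trained components:

    # Sketch of the routing architecture: classify an incoming question and
    # dispatch it to a QA module or a text-to-SQL module.
    def classify(question: str) -> str:
        sql_cues = ("how many", "average", "list all", "total")
        return "sql" if any(cue in question.lower() for cue in sql_cues) else "qa"

    def answer_with_qa(question: str) -> str:
        return f"[QA reader would answer: {question!r}]"

    def answer_with_sql(question: str) -> str:
        return f"[text-to-SQL parser would translate: {question!r}]"

    def route(question: str) -> str:
        handler = answer_with_sql if classify(question) == "sql" else answer_with_qa
        return handler(question)

    if __name__ == "__main__":
        print(route("How many matches did the team win in 2020?"))
        print(route("Who wrote Dom Casmurro?"))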

* Accepted at International Conference on the Computational Processing of Portuguese (PROPOR 2022), but not yet published 


Multi-speaker Emotional Text-to-speech Synthesizer

Dec 07, 2021
Sungjae Cho, Soo-Young Lee

We present a methodology for training a multi-speaker emotional text-to-speech synthesizer that can express 7 different emotions for each of 10 speakers. All silences are removed from the audio samples prior to training, which speeds up learning. Curriculum learning is applied to train the model efficiently: it is first trained on a large single-speaker neutral dataset, then on neutral speech from all speakers, and finally on emotional speech from all speakers. In each stage, training samples of every speaker-emotion pair have an equal probability of appearing in mini-batches. Through this procedure, our model can synthesize speech for all targeted speakers and emotions. Our synthesized audio sets are available on our web page.
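
The balanced sampling step can be sketched as follows; the toy dataset layout is an assumption for illustration:

    # Sketch: sample mini-batches so that every (speaker, emotion) pair is
    # equally likely, regardless of how many clips each pair has.
    import random

    dataset = {            # (speaker, emotion) -> list of utterance ids
        ("spk1", "neutral"): ["a1", "a2", "a3", "a4"],
        ("spk1", "happy"):   ["b1"],
        ("spk2", "neutral"): ["c1", "c2"],
        ("spk2", "sad"):     ["d1", "d2", "d3"],
    }

    def sample_batch(batch_size):
        pairs = list(dataset)
        batch = []
        for _ in range(batch_size):
            pair = random.choice(pairs)            # uniform over pairs, not over clips
            batch.append(random.choice(dataset[pair]))
        return batch

    if __name__ == "__main__":
        print(sample_batch(8))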

* Proceedings of Interspeech 2021 
* 2 pages; Published in the Proceedings of Interspeech 2021; Presented in Show and Tell; For the published paper, see https://www.isca-speech.org/archive/interspeech_2021/cho21_interspeech.html 


A Framework for Neural Topic Modeling of Text Corpora

Aug 19, 2021
Shayan Fazeli, Majid Sarrafzadeh

Topic modeling refers to the problem of discovering the main topics that occur in corpora of textual data, and its solutions find crucial applications in numerous fields. In this work, inspired by recent advancements in natural language processing, we introduce FAME, an open-source framework providing an efficient mechanism for extracting textual features and using them to discover topics and cluster semantically similar text documents in a corpus. These features range from traditional approaches (e.g., frequency-based) to the most recent auto-encoding embeddings from transformer-based language models such as the BERT model family. To demonstrate the effectiveness of the library, we conducted experiments on the well-known News-Group dataset. The library is available online.
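
As a generic sketch of the feature-extraction-plus-clustering pipeline (not the FAME API itself), one could combine TF-IDF features with k-means in scikit-learn; transformer embeddings could be swapped in for the features:

    # Generic sketch: extract document features, then cluster semantically
    # similar documents.  The corpus below is a toy, not News-Group.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    docs = [
        "the rocket launch was delayed by weather",
        "the new gpu accelerates deep learning training",
        "the spacecraft reached orbit successfully",
        "transformer models dominate language benchmarks",
    ]

    features = TfidfVectorizer(stop_words="english").fit_transform(docs)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

    for doc, label in zip(docs, labels):
        print(label, doc)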



Saliency Maps Generation for Automatic Text Summarization

Jul 12, 2019
David Tuckey, Krysia Broda, Alessandra Russo

Saliency map generation techniques are at the forefront of the explainable AI literature for a broad range of machine learning applications. Our goal is to question the limits of these approaches on more complex tasks. In this paper we apply Layer-Wise Relevance Propagation (LRP) to a sequence-to-sequence attention model trained on a text summarization dataset. We obtain unexpected saliency maps and discuss the validity of these "explanations". We argue that a quantitative way of testing the counterfactual case is needed to judge the truthfulness of the saliency maps. We suggest a protocol to check the validity of the importance attributed to the input, and show that the saliency maps obtained sometimes capture the real use of the input features by the network, and sometimes do not. We use this example to discuss how careful we need to be when accepting saliency maps as explanations.
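
LRP redistributes an output score back onto the inputs layer by layer. As a minimal, self-contained illustration of the epsilon rule on a single linear layer (not the paper's sequence-to-sequence setup):

    # Minimal illustration of Layer-wise Relevance Propagation (epsilon rule)
    # through one linear layer y = x @ W.  This toy only shows the
    # redistribution step, not the full seq2seq attention model.
    import numpy as np

    def lrp_linear(x, W, relevance_out, eps=1e-6):
        z = x @ W                                   # pre-activations, shape (out,)
        denom = z + eps * np.sign(z)                # stabilised denominator
        contrib = x[:, None] * W                    # contribution of x_i to z_j
        return (contrib / denom) @ relevance_out    # relevance per input feature

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        x = rng.normal(size=4)
        W = rng.normal(size=(4, 3))
        R_out = x @ W                               # start from the output scores
        R_in = lrp_linear(x, W, R_out)
        print(R_in, "sum:", R_in.sum(), "~", R_out.sum())  # relevance is conserved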



Data-to-Text Generation with Content Selection and Planning

Sep 03, 2018
Ratish Puduppully, Li Dong, Mirella Lapata

Recent advances in data-to-text generation have led to the use of large-scale datasets and neural network models that are trained end-to-end, without explicitly modeling what to say and in what order. In this work, we present a neural network architecture that incorporates content selection and planning without sacrificing end-to-end training. We decompose the generation task into two stages: given a corpus of data records (paired with descriptive documents), we first generate a content plan highlighting which information should be mentioned and in which order, and then generate the document while taking the content plan into account. Automatic and human evaluation experiments show that our model outperforms strong baselines, improving the state of the art on the recently released RotoWire dataset.
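
The two-stage decomposition can be illustrated with a toy example in which simple Python functions stand in for the neural content-planning and realization models:

    # Toy illustration of plan-then-realize: first select and order the records
    # to mention (the content plan), then realize text from the plan.
    # The paper learns both stages; templates and toy records stand in here.
    records = [
        {"team": "Hawks", "points": 102, "wins": 30},
        {"team": "Bulls", "points": 95,  "wins": 25},
    ]

    def plan(records):
        """Content selection and planning: keep team scores, order by points."""
        selected = [(r["team"], r["points"]) for r in records]
        return sorted(selected, key=lambda item: -item[1])

    def realize(content_plan):
        """Surface realization from the ordered content plan."""
        (w_team, w_pts), (l_team, l_pts) = content_plan
        return f"The {w_team} defeated the {l_team} {w_pts}-{l_pts}."

    if __name__ == "__main__":
        print(realize(plan(records)))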



iCap: Interactive Image Captioning with Predictive Text

Feb 22, 2020
Zhengxiong Jia, Xirong Li

In this paper we study the new task of interactive image captioning with a human in the loop. Unlike automated image captioning, where a given test image is the sole input at inference time, in the interactive scenario we have access to both the test image and a sequence of (incomplete) user-input sentences. We formulate the problem as Visually Conditioned Sentence Completion (VCSC) and propose asynchronous bidirectional decoding for image caption completion (ABD-Cap). With ABD-Cap as the core module, we build iCap, a web-based interactive image captioning system capable of predicting new text in response to live input from a user. A number of experiments covering both automated evaluations and real user studies show the viability of our proposals.
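
The interactive loop can be sketched as follows; the stub "model" that ranks a fixed candidate list is a placeholder for the trained ABD-Cap captioner:

    # Toy sketch of the interactive loop: the user types a partial caption and
    # the system proposes a completion conditioned on the image.  The candidate
    # table below is a stub standing in for a trained captioning model.
    CANDIDATES = {                 # image id -> candidate captions
        "img1": ["a dog running on the beach", "a dog playing with a ball"],
    }

    def complete(image_id, prefix):
        matches = [c for c in CANDIDATES[image_id] if c.startswith(prefix)]
        return matches[0][len(prefix):] if matches else ""

    if __name__ == "__main__":
        prefix = "a dog running"
        print(prefix + complete("img1", prefix))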


