Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Text": models, code, and papers

Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features

Nov 21, 2019
Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto

This paper presents a simple yet effective method to achieve prosody transfer from a reference speech signal to synthesized speech. The main idea is to incorporate well-known acoustic correlates of prosody such as pitch and loudness contours of the reference speech into a modern neural text-to-speech (TTS) synthesizer such as Tacotron2 (TC2). More specifically, a small set of acoustic features are extracted from the reference audio and then used to condition a TC2 synthesizer. The trained model is evaluated using subjective listening tests and novel objective evaluations of prosody transfer are proposed. Listening tests show that the synthesized speech is rated as highly natural and that prosody is successfully transferred from the reference speech signal to the synthesized signal.

* 6 pages, in review for conference publication 

  Access Paper or Ask Questions

Extension of TSVM to Multi-Class and Hierarchical Text Classification Problems With General Losses

Nov 01, 2012
Sathiya Keerthi Selvaraj, Sundararajan Sellamanickam, Shirish Shevade

Transductive SVM (TSVM) is a well known semi-supervised large margin learning method for binary text classification. In this paper we extend this method to multi-class and hierarchical classification problems. We point out that the determination of labels of unlabeled examples with fixed classifier weights is a linear programming problem. We devise an efficient technique for solving it. The method is applicable to general loss functions. We demonstrate the value of the new method using large margin loss on a number of multi-class and hierarchical classification datasets. For maxent loss we show empirically that our method is better than expectation regularization/constraint and posterior regularization methods, and competitive with the version of entropy regularization method which uses label constraints.

  Access Paper or Ask Questions

Generating Plausible Counterfactual Explanations for Deep Transformers in Financial Text Classification

Oct 23, 2020
Linyi Yang, Eoin M. Kenny, Tin Lok James Ng, Yi Yang, Barry Smyth, Ruihai Dong

Corporate mergers and acquisitions (M&A) account for billions of dollars of investment globally every year, and offer an interesting and challenging domain for artificial intelligence. However, in these highly sensitive domains, it is crucial to not only have a highly robust and accurate model, but be able to generate useful explanations to garner a user's trust in the automated system. Regrettably, the recent research regarding eXplainable AI (XAI) in financial text classification has received little to no attention, and many current methods for generating textual-based explanations result in highly implausible explanations, which damage a user's trust in the system. To address these issues, this paper proposes a novel methodology for producing plausible counterfactual explanations, whilst exploring the regularization benefits of adversarial training on language models in the domain of FinTech. Exhaustive quantitative experiments demonstrate that not only does this approach improve the model accuracy when compared to the current state-of-the-art and human performance, but it also generates counterfactual explanations which are significantly more plausible based on human trials.

* Accepted by COLING-20 (Oral) 

  Access Paper or Ask Questions

Forget me not: A Gentle Reminder to Mind the Simple Multi-Layer Perceptron Baseline for Text Classification

Sep 08, 2021
Lukas Galke, Ansgar Scherp

Graph neural networks have triggered a resurgence of graph-based text classification. We show that already a simple MLP baseline achieves comparable performance on benchmark datasets, questioning the importance of synthetic graph structures. When considering an inductive scenario, i. e., when adding new documents to a corpus, a simple MLP even outperforms most graph-based models. We further fine-tune DistilBERT for comparison and find that it outperforms all state-of-the-art models. We suggest that future studies use at least an MLP baseline to contextualize the results. We provide recommendations for the design and training of such a baseline.

* 5 pages 

  Access Paper or Ask Questions

LeafNATS: An Open-Source Toolkit and Live Demo System for Neural Abstractive Text Summarization

May 28, 2019
Tian Shi, Ping Wang, Chandan K. Reddy

Neural abstractive text summarization (NATS) has received a lot of attention in the past few years from both industry and academia. In this paper, we introduce an open-source toolkit, namely LeafNATS, for training and evaluation of different sequence-to-sequence based models for the NATS task, and for deploying the pre-trained models to real-world applications. The toolkit is modularized and extensible in addition to maintaining competitive performance in the NATS task. A live news blogging system has also been implemented to demonstrate how these models can aid blog/news editors by providing them suggestions of headlines and summaries of their articles.

* Accepted by NAACL-HLT 2019 demo track 

  Access Paper or Ask Questions

Decoupling Visual-Semantic Feature Learning for Robust Scene Text Recognition

Nov 24, 2021
Changxu Cheng, Bohan Li, Qi Zheng, Yongpan Wang, Wenyu Liu

Semantic information has been proved effective in scene text recognition. Most existing methods tend to couple both visual and semantic information in an attention-based decoder. As a result, the learning of semantic features is prone to have a bias on the limited vocabulary of the training set, which is called vocabulary reliance. In this paper, we propose a novel Visual-Semantic Decoupling Network (VSDN) to address the problem. Our VSDN contains a Visual Decoder (VD) and a Semantic Decoder (SD) to learn purer visual and semantic feature representation respectively. Besides, a Semantic Encoder (SE) is designed to match SD, which can be pre-trained together by additional inexpensive large vocabulary via a simple word correction task. Thus the semantic feature is more unbiased and precise to guide the visual feature alignment and enrich the final character representation. Experiments show that our method achieves state-of-the-art or competitive results on the standard benchmarks, and outperforms the popular baseline by a large margin under circumstances where the training set has a small size of vocabulary.

  Access Paper or Ask Questions

Redditors in Recovery: Text Mining Reddit to Investigate Transitions into Drug Addiction

Mar 11, 2019
John Lu, Sumati Sridhar, Ritika Pandey, Mohammad Al Hasan, George Mohler

Increasing rates of opioid drug abuse and heightened prevalence of online support communities underscore the necessity of employing data mining techniques to better understand drug addiction using these rapidly developing online resources. In this work, we obtain data from Reddit, an online collection of forums, to gather insight into drug use/misuse using text data from users themselves. Specifically, using user posts, we trained 1) a binary classifier which predicts transitions from casual drug discussion forums to drug recovery forums and 2) a Cox regression model that outputs likelihoods of such transitions. In doing so, we found that utterances of select drugs and certain linguistic features contained in one's posts can help predict these transitions. Using unfiltered drug-related posts, our research delineates drugs that are associated with higher rates of transitions from recreational drug discussion to support/recovery discussion, offers insight into modern drug culture, and provides tools with potential applications in combating the opioid crisis.

* 2018 IEEE International Conference on Big Data 

  Access Paper or Ask Questions

Text-Independent Speaker Verification Using 3D Convolutional Neural Networks

Jun 06, 2018
Amirsina Torfi, Jeremy Dawson, Nasser M. Nasrabadi

In this paper, a novel method using 3D Convolutional Neural Network (3D-CNN) architecture has been proposed for speaker verification in the text-independent setting. One of the main challenges is the creation of the speaker models. Most of the previously-reported approaches create speaker models based on averaging the extracted features from utterances of the speaker, which is known as the d-vector system. In our paper, we propose an adaptive feature learning by utilizing the 3D-CNNs for direct speaker model creation in which, for both development and enrollment phases, an identical number of spoken utterances per speaker is fed to the network for representing the speakers' utterances and creation of the speaker model. This leads to simultaneously capturing the speaker-related information and building a more robust system to cope with within-speaker variation. We demonstrate that the proposed method significantly outperforms the traditional d-vector verification system. Moreover, the proposed system can also be an alternative to the traditional d-vector system which is a one-shot speaker modeling system by utilizing 3D-CNNs.

* Accepted to be published in IEEE International Conference on Multimedia and Expo (ICME) 2018 

  Access Paper or Ask Questions

Variational Learning for the Inverted Beta-Liouville Mixture Model and Its Application to Text Categorization

Dec 29, 2021
Yongfa Ling, Wenbo Guan, Qiang Ruan, Heping Song, Yuping Lai

The finite invert Beta-Liouville mixture model (IBLMM) has recently gained some attention due to its positive data modeling capability. Under the conventional variational inference (VI) framework, the analytically tractable solution to the optimization of the variational posterior distribution cannot be obtained, since the variational object function involves evaluation of intractable moments. With the recently proposed extended variational inference (EVI) framework, a new function is proposed to replace the original variational object function in order to avoid intractable moment computation, so that the analytically tractable solution of the IBLMM can be derived in an elegant way. The good performance of the proposed approach is demonstrated by experiments with both synthesized data and a real-world application namely text categorization.

  Access Paper or Ask Questions

Building a Knowledge Graph from Natural Language Definitions for Interpretable Text Entailment Recognition

Jun 20, 2018
Vivian S. Silva, André Freitas, Siegfried Handschuh

Natural language definitions of terms can serve as a rich source of knowledge, but structuring them into a comprehensible semantic model is essential to enable them to be used in semantic interpretation tasks. We propose a method and provide a set of tools for automatically building a graph world knowledge base from natural language definitions. Adopting a conceptual model composed of a set of semantic roles for dictionary definitions, we trained a classifier for automatically labeling definitions, preparing the data to be later converted to a graph representation. WordNetGraph, a knowledge graph built out of noun and verb WordNet definitions according to this methodology, was successfully used in an interpretable text entailment recognition approach which uses paths in this graph to provide clear justifications for entailment decisions.

* Proceedings of the Eleventh International Conference on Language Resources and Evaluation, Miyazaki, Japan, 2018 
* 5 pages, 5 figures, presented at LREC 2018 

  Access Paper or Ask Questions