"Text": models, code, and papers

"I'm sorry Dave, I'm afraid I can't do that" Deep Q-learning from forbidden action

Oct 04, 2019
Mathieu Seurin, Philippe Preux, Olivier Pietquin

The use of Reinforcement Learning (RL) is still restricted to simulation or to enhancing human-operated systems through recommendations. Real-world environments (e.g. industrial robots or power grids) are generally designed with safety constraints in mind, implemented in the form of valid-action masks or contingency controllers. For example, the range of motion and the angles of the motors of a robot can be limited to physical boundaries. Violating constraints thus results in rejected actions or in entering a safe mode driven by an external controller, making RL agents incapable of learning from their mistakes. In this paper, we propose a simple modification of a state-of-the-art deep RL algorithm (DQN), enabling learning from forbidden actions. To do so, the standard Q-learning update is enhanced with an extra safety loss inspired by structured classification. We empirically show that it reduces the number of hit constraints during the learning phase and accelerates convergence to near-optimal policies compared to using standard DQN. Experiments are done on a Visual Grid World environment and a Text-World domain.

* Accepted at NeurIPS-2019 Workshop on Safety and Robustness in Decision Making 
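
The abstract above describes adding a safety loss to the standard Q-learning update so that forbidden actions still produce a learning signal. The following is a minimal sketch of that idea, not the authors' implementation: it assumes a hinge-style margin penalty that pushes the Q-value of each forbidden action below the lowest Q-value among valid actions. The function names, the `margin` and `safety_weight` parameters, and the exact form of the penalty are all hypothetical.

```python
def td_loss(q_sa, reward, next_q_values, gamma=0.99, done=False):
    """Squared one-step Q-learning error for a single transition."""
    target = reward if done else reward + gamma * max(next_q_values)
    return (q_sa - target) ** 2


def forbidden_margin_loss(q_values, forbidden, margin=1.0):
    """Hinge penalty (an assumed form, not taken from the paper).

    Each forbidden action's Q-value should sit at least `margin`
    below the lowest Q-value among the valid actions.
    """
    valid = [q for q, f in zip(q_values, forbidden) if not f]
    if not valid or not any(forbidden):
        return 0.0
    floor = min(valid)
    return sum(
        max(0.0, q + margin - floor)
        for q, f in zip(q_values, forbidden)
        if f
    )


def total_loss(q_values, action, reward, next_q_values, forbidden,
               gamma=0.99, safety_weight=1.0, margin=1.0):
    """TD loss plus the weighted safety penalty for one transition."""
    td = td_loss(q_values[action], reward, next_q_values, gamma)
    safety = forbidden_margin_loss(q_values, forbidden, margin)
    return td + safety_weight * safety
```

In a real DQN the Q-values would come from a neural network and the loss would be averaged over a replay batch; plain Python lists are used here only to keep the arithmetic visible.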


Detecting Deception in Political Debates Using Acoustic and Textual Features

Oct 04, 2019
Daniel Kopev, Ahmed Ali, Ivan Koychev, Preslav Nakov

We present work on deception detection, where, given a spoken claim, we aim to predict its factuality. While previous work in the speech community has relied on recordings from staged setups where people were asked to tell the truth or to lie and their statements were recorded, here we use real-world political debates. Thanks to the efforts of fact-checking organizations, it is possible to obtain annotations for statements in the context of a political discourse as true, half-true, or false. Starting with such data from the CLEF-2018 CheckThat! Lab, which was limited to text, we performed alignment to the corresponding videos, thus producing a multimodal dataset. We further developed a multimodal deep-learning architecture for the task of deception detection, which yielded sizable improvements over the state of the art for the CLEF-2018 Lab task 2. Our experiments show that the use of the acoustic signal consistently helped to improve the performance compared to using textual and metadata features only, based on several different evaluation measures. We release the new dataset to the research community, hoping to help advance the overall field of multimodal deception detection.

* ASRU-2019 


Maximizing Mutual Information for Tacotron

Aug 30, 2019
Peng Liu, Xixin Wu, Shiyin Kang, Guangzhi Li, Dan Su, Dong Yu

End-to-end speech synthesis methods such as Tacotron, Tacotron2 and Transformer-TTS already achieve close-to-human quality. However, compared to HMM-based methods or NN-based frame-to-frame regression methods, they are prone to some bad cases, such as missing words, repeated words and incomplete synthesis. More seriously, we cannot know whether such errors exist in a synthesized waveform unless we listen to it. We attribute the comparatively high sentence error rate to the local information preference of conditional autoregressive models. Inspired by the success of InfoGAN in learning interpretable representations through mutual information regularization, in this paper we propose to maximize the mutual information between the predicted acoustic features and the input text for end-to-end speech synthesis methods, to address the local information preference problem and avoid such bad cases. Moreover, we provide an indicator to detect errors in the predicted acoustic features as a byproduct. Experimental results show that our method can reduce the rate of bad cases and provide a reliable indicator to detect bad cases automatically.


Raw-to-End Name Entity Recognition in Social Media

Aug 14, 2019
Liyuan Liu, Zihan Wang, Jingbo Shang, Dandong Yin, Heng Ji, Xiang Ren, Shaowen Wang, Jiawei Han

Taking word sequences as the input, typical named entity recognition (NER) models neglect errors from pre-processing (e.g., tokenization). However, these errors can influence the model performance greatly, especially for noisy texts like tweets. Here, we introduce Neural-Char-CRF, a raw-to-end framework that is more robust to pre-processing errors. It takes raw character sequences as inputs and makes end-to-end predictions. Word embedding and contextualized representation models are further tailored to capture textual signals for each character instead of each word. Our model neither requires the conversion from character sequences to word sequences, nor assumes that the tokenizer can correctly detect all word boundaries. Moreover, we observe that our model's performance remains unchanged after replacing tokenization with string matching, which demonstrates its potential to be tokenization-free. Extensive experimental results on two public datasets demonstrate the superiority of our proposed method over the state of the art. The implementations and datasets are made available at:


Attention Guided Graph Convolutional Networks for Relation Extraction

Aug 09, 2019
Zhijiang Guo, Yan Zhang, Wei Lu

Dependency trees convey rich structural information that has proven useful for extracting relations among entities in text. However, how to effectively make use of relevant information while ignoring irrelevant information from the dependency trees remains a challenging research question. Existing approaches employing rule-based hard-pruning strategies for selecting relevant partial dependency structures may not always yield optimal results. In this work, we propose Attention Guided Graph Convolutional Networks (AGGCNs), a novel model which directly takes full dependency trees as inputs. Our model can be understood as a soft-pruning approach that automatically learns how to selectively attend to the relevant sub-structures useful for the relation extraction task. Extensive results on various tasks including cross-sentence n-ary relation extraction and large-scale sentence-level relation extraction show that our model is able to better leverage the structural information of the full dependency trees, giving significantly better results than previous approaches.

* Accepted to ACL 2019, 11 pages, 4 figures, 5 tables 
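
The soft-pruning idea in the abstract above can be illustrated with a small sketch: instead of deleting edges from the dependency tree, every pair of nodes gets a weight from self-attention, and a graph-convolution step aggregates neighbours using those weights. This is a simplified illustration under stated assumptions, not the AGGCN architecture: it uses a single attention head with no learned projections, and the function names `soft_adjacency` and `aggregate` are hypothetical.

```python
import math


def soft_adjacency(nodes):
    """Row-normalised attention weights between all node pairs.

    Uses scaled dot-product scores followed by a softmax per row, so
    each row is a probability distribution over the other nodes.
    """
    dim = len(nodes[0])
    rows = []
    for query in nodes:
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in nodes
        ]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def aggregate(adjacency, nodes):
    """One weighted neighbourhood-averaging step over the soft graph."""
    dim = len(nodes[0])
    return [
        [sum(w * node[k] for w, node in zip(row, nodes)) for k in range(dim)]
        for row in adjacency
    ]


# Toy node features standing in for token representations.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
weights = soft_adjacency(tokens)
updated = aggregate(weights, tokens)
```

Because the weights are continuous, weakly related nodes are down-weighted rather than removed, which is the contrast with rule-based hard pruning that the abstract draws.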


Word Sense Disambiguation using Diffusion Kernel PCA

Jul 21, 2019
Bilge Sipal, Ozcan Sari, Asena Teke, Nurullah Demirci

One of the major problems in natural language processing (NLP) is the word sense disambiguation (WSD) problem. It is the task of computationally identifying the right sense of a polysemous word based on its context. Resolving the WSD problem boosts the accuracy of many NLP-focused algorithms such as text classification and machine translation. In this paper, we introduce a new supervised algorithm for WSD, based on Kernel PCA and the Semantic Diffusion Kernel, which is called Diffusion Kernel PCA (DKPCA). DKPCA captures the semantic similarities between terms, and it is based on PCA. These properties enable us to perform feature extraction and dimension reduction guided by semantic similarities within the algorithm. Our empirical results on SensEval data demonstrate that DKPCA achieves higher or very close accuracy compared to SVM and KPCA with various well-known kernels when the labeled data ratio is meager. Considering the scarcity of labeled data, whereas large quantities of unlabeled textual data are easily accessible, these are highly encouraging first results for developing DKPCA further.


Hierarchical Sequence to Sequence Voice Conversion with Limited Data

Jul 15, 2019
Praveen Narayanan, Punarjay Chakravarty, Francois Charette, Gint Puskorius

We present a voice conversion solution using recurrent sequence to sequence modeling for DNNs. Our solution takes advantage of recent advances in attention based modeling in the fields of Neural Machine Translation (NMT), Text-to-Speech (TTS) and Automatic Speech Recognition (ASR). The problem consists of converting between voices in a parallel setting when <source, target> audio pairs are available. Our seq2seq architecture makes use of a hierarchical encoder to summarize input audio frames. On the decoder side, we use an attention based architecture used in recent TTS works. Since there is a dearth of large multispeaker voice conversion databases needed for training DNNs, we resort to training the network with a large single speaker dataset as an autoencoder. This is then adapted for the smaller multispeaker voice conversion datasets available for voice conversion. In contrast with other voice conversion works that use F0, duration and linguistic features, our system uses mel spectrograms as the audio representation. Output mel frames are converted back to audio using a wavenet vocoder.


User-Oriented Summaries Using a PSO Based Scoring Optimization Method

Jun 26, 2019
Augusto Villa-Monte, Laura Lanzarini, Aurelio F. Bariviera, José A. Olivas

Automatic text summarization tools have a great impact on many fields, such as medicine, law, and scientific research in general. As information overload increases, automatic summaries allow handling the growing volume of documents, usually by assigning weights to the extracted phrases based on their significance in the expected summary. Obtaining the main contents of any given document in less time than it would take to do that manually is still an issue of interest. In this article, a new method is presented that allows automatically generating extractive summaries from documents by adequately weighting sentence scoring features using Particle Swarm Optimization. The key feature of the proposed method is the identification of those features that are closest to the criterion used by the individual when summarizing. The proposed method combines a binary representation and a continuous one, using an original variation of the technique developed by the authors of this paper. Our paper shows that using user labeled information in the training set helps to find better metrics and weights. The empirical results yield an improved accuracy compared to previous methods used in this field.

* Entropy. 2019; 21(6):617 
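
To make the weighting step in the abstract above concrete, here is a minimal sketch of continuous Particle Swarm Optimization searching for sentence-feature weights that reproduce user-assigned sentence scores. It is an illustration only: the paper combines a binary and a continuous representation, whereas this sketch covers just a plain continuous swarm. The toy feature matrix, the labels, the squared-error objective, and all parameter values are assumptions made for the example.

```python
import random


def pso_minimize(objective, dim, n_particles=30, iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Plain continuous PSO over the box [0, 1]^dim."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_cost = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g][:], best_cost[g]

    for _ in range(iters):
        for i in range(n_particles):
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][k] = (inertia * vel[i][k]
                             + c_personal * r1 * (best_pos[i][k] - pos[i][k])
                             + c_global * r2 * (g_pos[k] - pos[i][k]))
                # Keep each weight inside [0, 1].
                pos[i][k] = min(1.0, max(0.0, pos[i][k] + vel[i][k]))
            cost = objective(pos[i])
            if cost < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i][:], cost
                if cost < g_cost:
                    g_pos, g_cost = pos[i][:], cost
    return g_pos, g_cost


# Hypothetical data: one row of scoring features per sentence, and the
# relevance score a user assigned to that sentence.
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
user_scores = [0.8, 0.1, 0.3, 0.9]


def weighting_error(w):
    """Squared error between weighted feature scores and user scores."""
    return sum(
        (sum(wi * fi for wi, fi in zip(w, row)) - label) ** 2
        for row, label in zip(features, user_scores)
    )


weights, cost = pso_minimize(weighting_error, dim=3)
```

The returned `weights` can then be used to rank sentences by their weighted feature score, with the top-ranked sentences forming the extractive summary.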


What do Language Representations Really Represent?

Jan 09, 2019
Johannes Bjerva, Robert Östling, Maria Han Veiga, Jörg Tiedemann, Isabelle Augenstein

A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words end up with similar representations. If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations. We show that this holds even when the multilingual corpus has been translated into English, by picking up the faint signal left by the source languages. However, just as it is a thorny problem to separate semantic from syntactic similarity in word representations, it is not obvious what type of similarity is captured by language representations. We investigate correlations and causal relationships between language representations learned from translations on one hand, and genetic, geographical, and several levels of structural similarity between languages on the other. Of these, structural similarity is found to correlate most strongly with language representation similarity, while genetic relationships (a convenient benchmark used for evaluation in previous work) appear to be a confounding factor. Apart from implications about translation effects, we see this more generally as a case where NLP and linguistic typology can interact and benefit one another.

* 8 pages, accepted for publication in Computational Linguistics (squib) 
