"Text": models, code, and papers

Recurrent Deconvolutional Generative Adversarial Networks with Application to Text Guided Video Generation

Aug 13, 2020
Hongyuan Yu, Yan Huang, Lihong Pi, Liang Wang

This paper proposes a novel model for video generation and, in particular, addresses the problem of video generation from text descriptions, i.e., synthesizing realistic videos conditioned on given texts. Existing video generation methods cannot easily be adapted to this task because of frame-discontinuity issues and their text-free generation schemes. To address these problems, we propose a recurrent deconvolutional generative adversarial network (RD-GAN), which includes a recurrent deconvolutional network (RDN) as the generator and a 3D convolutional neural network (3D-CNN) as the discriminator. The RDN is a deconvolutional version of a conventional recurrent neural network, which can model the long-range temporal dependency of generated video frames and make good use of conditional information. The proposed model can be jointly trained by pushing the RDN to generate videos realistic enough that the 3D-CNN cannot distinguish them from real ones. We apply the proposed RD-GAN to a series of tasks, including conventional video generation, conditional video generation, video prediction, and video classification, and demonstrate its effectiveness by achieving good performance.
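The recurrent generation scheme described in the abstract can be illustrated with a toy sketch: a hidden state is updated at each time step from the text condition and then decoded into a frame, so consecutive frames stay coherent. This is a pure-Python stand-in (nearest-neighbour upsampling in place of learned deconvolutions; all sizes and update rules are hypothetical), not the authors' actual model:

```python
def deconv_upsample(frame, scale=2):
    """Stand-in for a learned deconvolution: nearest-neighbour upsampling.
    `frame` is a 2-D list of floats."""
    out = []
    for row in frame:
        up_row = [v for v in row for _ in range(scale)]
        out.extend([up_row[:] for _ in range(scale)])
    return out

def rdn_generate(text_embedding, num_frames, base=4):
    """Recurrent deconvolutional generation sketch: a hidden state is updated
    once per time step and decoded into a frame, so every frame is conditioned
    on both the text and the previous state."""
    hidden = [0.0] * len(text_embedding)
    frames = []
    for _ in range(num_frames):
        # toy recurrent update mixing previous state and the text condition
        hidden = [0.5 * h + 0.5 * e for h, e in zip(hidden, text_embedding)]
        seed = [[hidden[(i + j) % len(hidden)] for j in range(base)]
                for i in range(base)]
        frames.append(deconv_upsample(deconv_upsample(seed)))  # base -> 4*base
    return frames

frames = rdn_generate([0.1, 0.9, 0.4, 0.6], num_frames=3)
print(len(frames), len(frames[0]), len(frames[0][0]))  # 3 16 16
```

In the real model the per-step decoder is a stack of learned deconvolutions and the discriminator is a 3D-CNN scoring whole clips; the sketch only shows the frame-by-frame recurrent structure.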

Efficient Path Prediction for Semi-Supervised and Weakly Supervised Hierarchical Text Classification

Feb 25, 2019
Huiru Xiao, Xin Liu, Yangqiu Song

Hierarchical text classification has many real-world applications. However, labeling a large number of documents is costly. In practice, we can use semi-supervised learning or weakly supervised learning (e.g., dataless classification) to reduce the labeling cost. In this paper, we propose a path cost-sensitive learning algorithm to utilize the structural information and further make use of unlabeled and weakly labeled data. We use a generative model to leverage the large amount of unlabeled data and introduce path constraints into the learning algorithm to incorporate the structural information of the class hierarchy. The posterior probabilities of both unlabeled and weakly labeled data can be incorporated with path-dependent scores. Since we add a structure-sensitive cost to the learning algorithm to constrain the classification to be consistent with the class hierarchy, and do not need to reconstruct the feature vectors for different structures, we can significantly reduce the computational cost compared to structural output learning. Experimental results on two hierarchical text classification benchmarks show that our approach is not only effective but also efficient at handling semi-supervised and weakly supervised hierarchical text classification.

* Accepted by 2019 World Wide Web Conference (WWW19) 
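The path-dependent scoring idea can be sketched in a few lines: a label's score is accumulated along its root-to-leaf path, so a prediction whose ancestors are improbable is penalised and stays consistent with the hierarchy. The hierarchy, labels, and probabilities below are invented for illustration; this is not the paper's full cost-sensitive algorithm:

```python
import math

# class hierarchy as child -> parent (None marks a root)
hierarchy = {"sports": None, "soccer": "sports", "tennis": "sports",
             "politics": None, "elections": "politics"}

def path(label):
    """Root-to-label path through the class hierarchy."""
    nodes = []
    while label is not None:
        nodes.append(label)
        label = hierarchy[label]
    return list(reversed(nodes))

def path_score(node_probs, label):
    """Path-dependent score: sum of log-probabilities of every node on the
    root-to-label path, penalising hierarchy-inconsistent predictions."""
    return sum(math.log(node_probs[n]) for n in path(label))

def predict(node_probs, leaves=("soccer", "tennis", "elections")):
    return max(leaves, key=lambda leaf: path_score(node_probs, leaf))

# toy per-node posteriors for one document
probs = {"sports": 0.7, "soccer": 0.6, "tennis": 0.1,
         "politics": 0.3, "elections": 0.9}
pred = predict(probs)
print(pred)  # soccer
```

Note how "elections" has the highest leaf probability (0.9) but loses to "soccer" because its ancestor "politics" is unlikely: the path constraint dominates.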

Relation extraction from clinical texts using domain invariant convolutional neural network

Jun 30, 2016
Sunil Kumar Sahu, Ashish Anand, Krishnadev Oruganty, Mahanandeeshwar Gattu

In recent years, extracting relevant information from biomedical and clinical texts such as research articles, discharge summaries, or electronic health records has been the subject of many research efforts and shared challenges. Relation extraction is the process of detecting and classifying the semantic relations among entities in a given piece of text. Existing models for this task in the biomedical domain use either manually engineered features or kernel methods to create the feature vector. These features are then fed to a classifier for prediction of the correct class. It turns out that the results of these methods are highly dependent on the quality of user-designed features and also suffer from the curse of dimensionality. In this work, we focus on extracting relations from clinical discharge summaries. Our main objective is to exploit the power of convolutional neural networks (CNNs) to learn features automatically and thus reduce the dependency on manual feature engineering. We evaluate the performance of the proposed model on the i2b2-2010 clinical relation extraction challenge dataset. Our results indicate that a convolutional neural network can be a good model for relation extraction in clinical text without depending on expert knowledge for defining quality features.

* This paper has been accepted at the ACL BioNLP 2016 Workshop 
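The core operation that replaces manual feature engineering here is a convolution over word windows followed by max-over-time pooling. A minimal pure-Python sketch (random toy embeddings and a single filter; the real model uses learned embeddings, position features, and many filters):

```python
import math
import random

random.seed(0)

EMB = 8  # toy embedding size per word
vocab = {w: [random.uniform(-1, 1) for _ in range(EMB)]
         for w in "the patient was given aspirin for chest pain".split()}

def conv1d_maxpool(sent, kernel, width=3):
    """One convolution filter slid over word windows, then max-over-time
    pooling: the network keeps the strongest n-gram response as a feature,
    with no hand-designed features involved."""
    vecs = [vocab[w] for w in sent]
    scores = []
    for i in range(len(vecs) - width + 1):
        window = [v for vec in vecs[i:i + width] for v in vec]
        scores.append(math.tanh(sum(a * b for a, b in zip(window, kernel))))
    return max(scores)  # max pooling over all window positions

sent = "the patient was given aspirin for chest pain".split()
kernel = [random.uniform(-1, 1) for _ in range(EMB * 3)]
feature = conv1d_maxpool(sent, kernel)
print(-1.0 <= feature <= 1.0)  # True (tanh keeps scores in [-1, 1])
```

In the paper's setting, a bank of such filters produces the feature vector that is fed to a softmax classifier over relation classes.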

DeepHelp: Deep Learning for Shout Crisis Text Conversations

Oct 25, 2021
Daniel Cahn

The Shout Crisis Text Line provides individuals undergoing mental health crises an opportunity to have an anonymous text message conversation with a trained Crisis Volunteer (CV). This project partners with Shout and its parent organisation, Mental Health Innovations, to explore the applications of machine learning in understanding Shout's conversations and improving its service. The overarching aim of this project is to develop a proof-of-concept model to demonstrate the potential of applying deep learning to crisis text messages. Specifically, this project aims to use deep learning to (1) predict an individual's risk of suicide or self-harm, (2) assess conversation success and CV skill using robust metrics, and (3) extrapolate demographic information from a texter survey to conversations where the texter did not complete the survey. To these ends, contributions to deep learning include a modified Transformer-over-BERT model; a framework for multitask learning to improve generalisation in the presence of sparse labels; and a mathematical model for using imperfect machine learning models to estimate population parameters from a biased training set. Key results include a deep learning model that likely outperforms trained CVs at predicting suicide risk, and the ability to predict whether a texter is 21 or under with 88.4% accuracy. We produce three metrics for conversation success and evaluate the validity and usefulness of each. Finally, reversal of participation bias provides evidence that women, who make up 80.3% of conversations with an associated texter survey, make up closer to 73.5%-74.8% of all conversations; and that if, after every conversation, the texter had shared whether they found their conversation helpful, affirmative answers would fall from 85.1% to 45.45%-46.51%.

* 81 pages 
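The participation-bias reversal can be made concrete with Bayes' rule: if a group completes the post-conversation survey at a different rate than everyone else, its share of all conversations can be recovered from its share of surveyed conversations. The response rates below are invented for illustration (the paper estimates the correction from its own models), so only the algebra, not the numbers, is from the source:

```python
def correct_participation_bias(observed_share, resp_rate_group, resp_rate_other):
    """Recover a group's share of *all* conversations from its share of
    *surveyed* conversations.  With p the true share, r_g and r_o the survey
    response rates of the group and of everyone else:
        observed = p*r_g / (p*r_g + (1-p)*r_o)
    Solving for p gives the correction below."""
    o, rg, ro = observed_share, resp_rate_group, resp_rate_other
    return o * ro / (o * ro + (1 - o) * rg)

# hypothetical response rates chosen only to illustrate the direction of the
# correction: if women respond more often, their true share is lower than 80.3%
corrected = correct_participation_bias(0.803, 0.25, 0.17)
print(round(corrected, 3))  # 0.735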

LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations

Jun 10, 2021
Ruisheng Cao, Lu Chen, Zhi Chen, Yanbin Zhao, Su Zhu, Kai Yu

This work aims to tackle the challenging heterogeneous graph encoding problem in the text-to-SQL task. Previous methods are typically node-centric and merely utilize different weight matrices to parameterize edge types, which 1) ignore the rich semantics embedded in the topological structure of edges, and 2) fail to distinguish local and non-local relations for each node. To this end, we propose a Line Graph Enhanced Text-to-SQL (LGESQL) model to mine the underlying relational features without constructing meta-paths. By virtue of the line graph, messages propagate more efficiently through not only connections between nodes, but also the topology of directed edges. Furthermore, both local and non-local relations are integrated distinctively during the graph iteration. We also design an auxiliary task called graph pruning to improve the discriminative capability of the encoder. Our framework achieves state-of-the-art results (62.8% with Glove, 72.0% with Electra) on the cross-domain text-to-SQL benchmark Spider at the time of writing.

* 15 pages, 8 figures, accepted to ACL 2021 main conference 
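The line-graph construction at the heart of LGESQL is easy to sketch: every directed edge of the original graph becomes a node, and two edge-nodes are linked when the head of one edge is the tail of the other, which is what lets messages propagate along edge topology rather than only between nodes. A minimal stdlib version with an invented toy schema graph:

```python
def line_graph(edges):
    """Build the line graph of a directed graph: each original edge becomes a
    node, and edge (a, b) is linked to edge (c, d) whenever b == c, so edge
    features can exchange messages along directed paths."""
    lg = {e: [] for e in edges}
    for (a, b) in edges:
        for (c, d) in edges:
            if b == c and (a, b) != (c, d):
                lg[(a, b)].append((c, d))
    return lg

# toy heterogeneous schema graph: question token -> column -> table
edges = [("q0", "col_name"), ("col_name", "tab_singer"), ("q0", "tab_singer")]
lg = line_graph(edges)
print(lg[("q0", "col_name")])  # [('col_name', 'tab_singer')]
```

In the full model, node-centric and edge-centric (line-graph) message passing run jointly, with local one-hop relations and non-local multi-hop relations handled distinctly during iteration.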

From text saliency to linguistic objects: learning linguistic interpretable markers with a multi-channels convolutional architecture

Apr 07, 2020
Laurent Vanni, Marco Corneli, Damon Mayaffre, Frédéric Precioso

Considerable effort is currently devoted to methods for analyzing and understanding the impressive performance of deep neural networks on tasks such as image or text classification. These methods are mainly based on visualizing the important input features that the network takes into account to reach a decision. However, these techniques, such as LIME, SHAP, Grad-CAM, or TDS, require extra effort to interpret the visualization with respect to expert knowledge. In this paper, we propose a novel approach to inspect the hidden layers of a fitted CNN in order to extract interpretable linguistic objects from texts by exploiting the classification process. In particular, we detail a weighted extension of the Text Deconvolution Saliency (wTDS) measure, which can be used to highlight the relevant features used by the CNN to perform the classification task. We empirically demonstrate the efficiency of our approach on corpora from two different languages: English and French. On all datasets, wTDS automatically encodes complex linguistic objects based on co-occurrences and possibly on grammatical and syntactic analysis.

* 7 pages, 22 figures 
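The weighting idea behind wTDS can be sketched as follows: a word's saliency for a target class is its convolutional activations re-weighted by that class's classifier weights, so n-grams that are discriminative for the class stand out. This is a simplified, hypothetical reading of the measure with invented numbers, not the paper's exact formula:

```python
def wtds_sketch(activations, class_weights):
    """Class-weighted saliency sketch: for each word, weight its filter
    activations by the target class's weights and sum, so filters that argue
    for the class contribute positively and opposing filters negatively."""
    return [sum(a * w for a, w in zip(word_acts, class_weights))
            for word_acts in activations]

# 4 words x 3 convolution filters (toy activations)
acts = [[0.1, 0.0, 0.2],
        [0.9, 0.8, 0.1],   # word 1 strongly activates class-relevant filters
        [0.0, 0.1, 0.0],
        [0.4, 0.4, 0.3]]
weights = [1.0, 2.0, -0.5]  # hypothetical class-specific classifier weights
saliency = wtds_sketch(acts, weights)
top_word = max(range(len(acts)), key=lambda i: saliency[i])
print(top_word)  # 1
```

Highlighting the top-scoring words per class is what surfaces the interpretable linguistic objects the paper describes.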

SMPOST: Parts of Speech Tagger for Code-Mixed Indic Social Media Text

Feb 02, 2017
Deepak Gupta, Shubham Tripathi, Asif Ekbal, Pushpak Bhattacharyya

Use of social media has grown dramatically during the last few years. Users follow informal languages in communicating through social media. The language of communication is often mixed in nature, where people transcribe their regional language with English, and this practice is extremely popular. Natural language processing (NLP) aims to infer information from these texts, where Part-of-Speech (PoS) tagging plays an important role in capturing the prosody of the written text. For the task of PoS tagging on code-mixed Indian social media text, we develop a supervised system based on a Conditional Random Field classifier. To tackle the problem effectively, we have focused on extracting rich linguistic features. We participate in three different language pairs, i.e., English-Hindi, English-Bengali and English-Telugu, on three different social media platforms: Twitter, Facebook and WhatsApp. The proposed system successfully assigns both coarse and fine-grained PoS tag labels for a given code-mixed sentence. Experiments show that our system is quite generic and shows encouraging performance on all three language pairs in all domains.

* 5 pages, ICON 2016 
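The kind of "rich linguistic features" a CRF tagger consumes can be illustrated with a per-token feature function: affixes, casing, digit cues, and context words for a code-mixed sentence. The feature set below is an illustrative subset, not the paper's exact feature template:

```python
def token_features(tokens, i):
    """Illustrative CRF feature dictionary for token i of a code-mixed
    sentence: surface form, affixes, orthographic cues, and context words."""
    w = tokens[i]
    return {
        "word.lower": w.lower(),
        "prefix3": w[:3],              # affixes help with unseen words
        "suffix3": w[-3:],
        "is_title": w.istitle(),       # orthography: proper-noun cue
        "has_digit": any(c.isdigit() for c in w),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

# Hindi-English code-mixed example ("I like this movie a lot")
toks = "Mujhe yeh movie bahut pasand hai".split()
feats = token_features(toks, 2)
print(feats["prev"], feats["next"])  # yeh bahut
```

A CRF then learns tag-transition and feature weights jointly, so the context features let the same surface word receive different tags in Hindi versus English stretches.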

Deep multi-metric learning for text-independent speaker verification

Jul 17, 2020
Jiwei Xu, Xinggang Wang, Bin Feng, Wenyu Liu

Text-independent speaker verification is an important artificial intelligence problem with a wide spectrum of applications, such as criminal investigation, payment certification, and interest-based customer services. The purpose of text-independent speaker verification is to determine whether two given uncontrolled utterances originate from the same speaker. Extracting speech features for each speaker using deep neural networks is a promising direction to explore, and a straightforward solution is to train the discriminative feature extraction network using a metric learning loss function. However, a single loss function often has certain limitations. Thus, we use deep multi-metric learning to address the problem and introduce three different losses, i.e., triplet loss, n-pair loss and angular loss. The three loss functions work in a cooperative way to train a feature extraction network equipped with residual connections and squeeze-and-excitation attention. We conduct experiments on the large-scale VoxCeleb2 dataset, which contains over a million utterances from over 6,000 speakers, and the proposed deep neural network obtains an equal error rate of 3.48%, which is a very competitive result. Code for both training and testing, along with pretrained models, is available at \url{}, the first publicly available code repository for large-scale text-independent speaker verification with performance on par with state-of-the-art systems.

* Neurocomputing, Volume 410, 14 October 2020, Pages 394-400 
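Two of the three metric learning losses are easy to state in a few lines: the triplet loss pulls an anchor's positive closer than a negative by a margin, while the n-pair loss compares the anchor against several negatives at once. These are standard textbook formulations on toy 2-D embeddings, not the paper's exact implementation (the angular loss is omitted for brevity):

```python
import math

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, pos, neg, margin=0.2):
    """Hinge on distances: zero once the positive is closer than the
    negative by at least `margin`."""
    return max(0.0, sq_dist(anchor, pos) - sq_dist(anchor, neg) + margin)

def n_pair_loss(anchor, pos, negatives):
    """Softmax-style loss comparing the anchor's similarity to one positive
    against many negatives simultaneously."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    return math.log(1.0 + sum(math.exp(dot(anchor, n) - dot(anchor, pos))
                              for n in negatives))

a, p = [1.0, 0.0], [0.9, 0.1]          # same-speaker embeddings (toy)
negs = [[0.0, 1.0], [-1.0, 0.2]]       # other-speaker embeddings (toy)
tl = triplet_loss(a, p, negs[0])
nl = n_pair_loss(a, p, negs)
print(tl)  # 0.0 (this triplet is already well separated)
```

In the paper the three losses are combined to train one feature extractor, the idea being that each loss compensates for the blind spots of the others.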
