Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Text": models, code, and papers

Distraction-Based Neural Networks for Document Summarization

Oct 26, 2016
Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, Hui Jiang

Distributed representation learned with neural networks has recently shown to be effective in modeling natural languages at fine granularities such as words, phrases, and even sentences. Whether and how such an approach can be extended to help model larger spans of text, e.g., documents, is intriguing, and further investigation would still be desirable. This paper aims to enhance neural network models for such a purpose. A typical problem of document-level modeling is automatic summarization, which aims to model documents in order to generate summaries. In this paper, we propose neural models to train computers not just to pay attention to specific regions and content of input documents with attention models, but also distract them to traverse between different content of a document so as to better grasp the overall meaning for summarization. Without engineering any features, we train the models on two large datasets. The models achieve the state-of-the-art performance, and they significantly benefit from the distraction modeling, particularly when input documents are long.

* IJCAI, 2016 
* Published in IJCAI-2016: the 25th International Joint Conference on Artificial Intelligence 

  Access Paper or Ask Questions

Fast Sampling for Bayesian Max-Margin Models

Oct 18, 2016
Wenbo Hu, Jun Zhu, Bo Zhang

Bayesian max-margin models have shown superiority in various practical applications, such as text categorization, collaborative prediction, social network link prediction and crowdsourcing, and they conjoin the flexibility of Bayesian modeling and predictive strengths of max-margin learning. However, Monte Carlo sampling for these models still remains challenging, especially for applications that involve large-scale datasets. In this paper, we present the stochastic subgradient Hamiltonian Monte Carlo (HMC) methods, which are easy to implement and computationally efficient. We show the approximate detailed balance property of subgradient HMC which reveals a natural and validated generalization of the ordinary HMC. Furthermore, we investigate the variants that use stochastic subsampling and thermostats for better scalability and mixing. Using stochastic subgradient Markov Chain Monte Carlo (MCMC), we efficiently solve the posterior inference task of various Bayesian max-margin models and extensive experimental results demonstrate the effectiveness of our approach.

  Access Paper or Ask Questions

Interpreting Neural Networks to Improve Politeness Comprehension

Oct 09, 2016
Malika Aubakirova, Mohit Bansal

We present an interpretable neural network approach to predicting and understanding politeness in natural language requests. Our models are based on simple convolutional neural networks directly on raw text, avoiding any manual identification of complex sentiment or syntactic features, while performing better than such feature-based models from previous work. More importantly, we use the challenging task of politeness prediction as a testbed to next present a much-needed understanding of what these successful networks are actually learning. For this, we present several network visualizations based on activation clusters, first derivative saliency, and embedding space transformations, helping us automatically identify several subtle linguistics markers of politeness theories. Further, this analysis reveals multiple novel, high-scoring politeness strategies which, when added back as new features, reduce the accuracy gap between the original featurized system and the neural model, thus providing a clear quantitative interpretation of the success of these neural networks.

* To appear at EMNLP 2016 

  Access Paper or Ask Questions

Unsupervised Learning from Narrated Instruction Videos

Jun 28, 2016
Jean-Baptiste Alayrac, Piotr Bojanowski, Nishant Agrawal, Josef Sivic, Ivan Laptev, Simon Lacoste-Julien

We address the problem of automatically learning the main steps to complete a certain task, such as changing a car tire, from a set of narrated instruction videos. The contributions of this paper are three-fold. First, we develop a new unsupervised learning approach that takes advantage of the complementary nature of the input video and the associated narration. The method solves two clustering problems, one in text and one in video, applied one after each other and linked by joint constraints to obtain a single coherent sequence of steps in both modalities. Second, we collect and annotate a new challenging dataset of real-world instruction videos from the Internet. The dataset contains about 800,000 frames for five different tasks that include complex interactions between people and objects, and are captured in a variety of indoor and outdoor settings. Third, we experimentally demonstrate that the proposed method can automatically discover, in an unsupervised manner, the main steps to achieve the task and locate the steps in the input videos.

* Appears in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016). 21 pages 

  Access Paper or Ask Questions

Towards Meaningful Maps of Polish Case Law

Mar 01, 2016
Michal Jungiewicz, Michał Łopuszyński

In this work, we analyze the utility of two dimensional document maps for exploratory analysis of Polish case law. We start by comparing two methods of generating such visualizations. First is based on linear principal component analysis (PCA). Second makes use of the modern nonlinear t-Distributed Stochastic Neighbor Embedding method (t-SNE). We apply both PCA and t-SNE to a corpus of judgments from different courts in Poland. It emerges that t-SNE provides better, more interpretable results than PCA. As a next test, we apply t-SNE to randomly selected sample of common court judgments corresponding to different keywords. We show that t-SNE, in this case, reveals hidden topical structure of the documents related to keyword,,pension". In conclusion, we find that the t-SNE method could be a promising tool to facilitate the exploitative analysis of legal texts, e.g., by complementing search or browse functionality in legal databases.

  Access Paper or Ask Questions

How essential are unstructured clinical narratives and information fusion to clinical trial recruitment?

Feb 13, 2015
Preethi Raghavan, James L. Chen, Eric Fosler-Lussier, Albert M. Lai

Electronic health records capture patient information using structured controlled vocabularies and unstructured narrative text. While structured data typically encodes lab values, encounters and medication lists, unstructured data captures the physician's interpretation of the patient's condition, prognosis, and response to therapeutic intervention. In this paper, we demonstrate that information extraction from unstructured clinical narratives is essential to most clinical applications. We perform an empirical study to validate the argument and show that structured data alone is insufficient in resolving eligibility criteria for recruiting patients onto clinical trials for chronic lymphocytic leukemia (CLL) and prostate cancer. Unstructured data is essential to solving 59% of the CLL trial criteria and 77% of the prostate cancer trial criteria. More specifically, for resolving eligibility criteria with temporal constraints, we show the need for temporal reasoning and information integration with medical events within and across unstructured clinical narratives and structured data.

* AMIA TBI 2014, 6 pages 

  Access Paper or Ask Questions

Modelling the Lexicon in Unsupervised Part of Speech Induction

Feb 26, 2014
Greg Dubbin, Phil Blunsom

Automatically inducing the syntactic part-of-speech categories for words in text is a fundamental task in Computational Linguistics. While the performance of unsupervised tagging models has been slowly improving, current state-of-the-art systems make the obviously incorrect assumption that all tokens of a given word type must share a single part-of-speech tag. This one-tag-per-type heuristic counters the tendency of Hidden Markov Model based taggers to over generate tags for a given word type. However, it is clearly incompatible with basic syntactic theory. In this paper we extend a state-of-the-art Pitman-Yor Hidden Markov Model tagger with an explicit model of the lexicon. In doing so we are able to incorporate a soft bias towards inducing few tags per type. We develop a particle filter for drawing samples from the posterior of our model and present empirical results that show that our model is competitive with and faster than the state-of-the-art without making any unrealistic restrictions.

* To be presented at the 14th Conference of the European Chapter of the Association for Computational Linguistics 

  Access Paper or Ask Questions

Japanese-Spanish Thesaurus Construction Using English as a Pivot

Mar 06, 2013
Jessica Ramírez, Masayuki Asahara, Yuji Matsumoto

We present the results of research with the goal of automatically creating a multilingual thesaurus based on the freely available resources of Wikipedia and WordNet. Our goal is to increase resources for natural language processing tasks such as machine translation targeting the Japanese-Spanish language pair. Given the scarcity of resources, we use existing English resources as a pivot for creating a trilingual Japanese-Spanish-English thesaurus. Our approach consists of extracting the translation tuples from Wikipedia, disambiguating them by mapping them to WordNet word senses. We present results comparing two methods of disambiguation, the first using VSM on Wikipedia article texts and WordNet definitions, and the second using categorical information extracted from Wikipedia, We find that mixing the two methods produces favorable results. Using the proposed method, we have constructed a multilingual Spanish-Japanese-English thesaurus consisting of 25,375 entries. The same method can be applied to any pair of languages that are linked to English in Wikipedia.

* In Proceeding of The Third International Joint Conference on Natural Language Processing (IJCNLP-08), Hyderabad, India. pages 473-480, 2008 

  Access Paper or Ask Questions

Marginalized Denoising Autoencoders for Domain Adaptation

Jun 18, 2012
Minmin Chen, Zhixiang Xu, Kilian Weinberger, Fei Sha

Stacked denoising autoencoders (SDAs) have been successfully used to learn new representations for domain adaptation. Recently, they have attained record accuracy on standard benchmark tasks of sentiment analysis across different text domains. SDAs learn robust data representations by reconstruction, recovering original features from data that are artificially corrupted with noise. In this paper, we propose marginalized SDA (mSDA) that addresses two crucial limitations of SDAs: high computational cost and lack of scalability to high-dimensional features. In contrast to SDAs, our approach of mSDA marginalizes noise and thus does not require stochastic gradient descent or other optimization algorithms to learn parameters ? in fact, they are computed in closed-form. Consequently, mSDA, which can be implemented in only 20 lines of MATLAB^{TM}, significantly speeds up SDAs by two orders of magnitude. Furthermore, the representations learnt by mSDA are as effective as the traditional SDAs, attaining almost identical accuracies in benchmark tasks.

* ICML2012 

  Access Paper or Ask Questions

Mostly-Unsupervised Statistical Segmentation of Japanese Kanji Sequences

May 10, 2002
Rie Kubota Ando, Lillian Lee

Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation algorithms rely either on a lexicon and syntactic analysis or on pre-segmented data; but these are labor-intensive, and the lexico-syntactic techniques are vulnerable to the unknown word problem. In contrast, we introduce a novel, more robust statistical method utilizing unsegmented training data. Despite its simplicity, the algorithm yields performance on long kanji sequences comparable to and sometimes surpassing that of state-of-the-art morphological analyzers over a variety of error metrics. The algorithm also outperforms another mostly-unsupervised statistical algorithm previously proposed for Chinese. Additionally, we present a two-level annotation scheme for Japanese to incorporate multiple segmentation granularities, and introduce two novel evaluation metrics, both based on the notion of a compatible bracket, that can account for multiple granularities simultaneously.

* Natural Language Engineering 9 (2), pp. 127--149, 2003 
* 22 pages. To appear in Natural Language Engineering 

  Access Paper or Ask Questions