Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Text": models, code, and papers

LRMM: Learning to Recommend with Missing Modalities

Aug 30, 2018
Cheng Wang, Mathias Niepert, Hui Li

Multimodal learning has shown promising performance in content-based recommendation due to the auxiliary user and item information of multiple modalities such as text and images. However, the problem of incomplete and missing modality is rarely explored and most existing methods fail in learning a recommendation model with missing or corrupted modalities. In this paper, we propose LRMM, a novel framework that mitigates not only the problem of missing modalities but also more generally the cold-start problem of recommender systems. We propose modality dropout (m-drop) and a multimodal sequential autoencoder (m-auto) to learn multimodal representations for complementing and imputing missing modalities. Extensive experiments on real-world Amazon data show that LRMM achieves state-of-the-art performance on rating prediction tasks. More importantly, LRMM is more robust to previous methods in alleviating data-sparsity and the cold-start problem.

* 11 pages, EMNLP 2018 

  Access Paper or Ask Questions

An Ensemble Model for Sentiment Analysis of Hindi-English Code-Mixed Data

Jun 12, 2018
Madan Gopal Jhanwar, Arpita Das

In multilingual societies like India, code-mixed social media texts comprise the majority of the Internet. Detecting the sentiment of the code-mixed user opinions plays a crucial role in understanding social, economic and political trends. In this paper, we propose an ensemble of character-trigrams based LSTM model and word-ngrams based Multinomial Naive Bayes (MNB) model to identify the sentiments of Hindi-English (Hi-En) code-mixed data. The ensemble model combines the strengths of rich sequential patterns from the LSTM model and polarity of keywords from the probabilistic ngram model to identify sentiments in sparse and inconsistent code-mixed data. Experiments on reallife user code-mixed data reveals that our approach yields state-of-the-art results as compared to several baselines and other deep learning based proposed methods.

  Access Paper or Ask Questions

A Deep Learning Model with Hierarchical LSTMs and Supervised Attention for Anti-Phishing

May 03, 2018
Minh Nguyen, Toan Nguyen, Thien Huu Nguyen

Anti-phishing aims to detect phishing content/documents in a pool of textual data. This is an important problem in cybersecurity that can help to guard users from fraudulent information. Natural language processing (NLP) offers a natural solution for this problem as it is capable of analyzing the textual content to perform intelligent recognition. In this work, we investigate state-of-the-art techniques for text categorization in NLP to address the problem of anti-phishing for emails (i.e, predicting if an email is phishing or not). These techniques are based on deep learning models that have attracted much attention from the community recently. In particular, we present a framework with hierarchical long short-term memory networks (H-LSTMs) and attention mechanisms to model the emails simultaneously at the word and the sentence level. Our expectation is to produce an effective model for anti-phishing and demonstrate the effectiveness of deep learning for problems in cybersecurity.

* In: R. Verma, A. Das. (eds.): Proceedings of the 1st Anti-Phishing Shared Pilot at 4th ACM International Workshop on Security and Privacy Analytics (IWSPA 2018), Tempe, Arizona, USA, 21-03-2018 

  Access Paper or Ask Questions

Online Multi-Label Classification: A Label Compression Method

Apr 04, 2018
Zahra Ahmadi, Stefan Kramer

Many modern applications deal with multi-label data, such as functional categorizations of genes, image labeling and text categorization. Classification of such data with a large number of labels and latent dependencies among them is a challenging task, and it becomes even more challenging when the data is received online and in chunks. Many of the current multi-label classification methods require a lot of time and memory, which make them infeasible for practical real-world applications. In this paper, we propose a fast linear label space dimension reduction method that transforms the labels into a reduced encoded space and trains models on the obtained pseudo labels. Additionally, it provides an analytical method to update the decoding matrix which maps the labels into the original space and is used during the test phase. Experimental results show the effectiveness of this approach in terms of running times and the prediction performance over different measures.

  Access Paper or Ask Questions

An Implementation of Back-Propagation Learning on GF11, a Large SIMD Parallel Computer

Jan 04, 2018
Michael Witbrock, Marco Zagha

Current connectionist simulations require huge computational resources. We describe a neural network simulator for the IBM GF11, an experimental SIMD machine with 566 processors and a peak arithmetic performance of 11 Gigaflops. We present our parallel implementation of the backpropagation learning algorithm, techniques for increasing efficiency, performance measurements on the NetTalk text-to-speech benchmark, and a performance model for the simulator. Our simulator currently runs the back-propagation learning algorithm at 900 million connections per second, where each "connection per second" includes both a forward and backward pass. This figure was obtained on the machine when only 356 processors were working; with all 566 processors operational, our simulation will run at over one billion connections per second. We conclude that the GF11 is well-suited to neural network simulation, and we analyze our use of the machine to determine which features are the most important for high performance.

* Witbrock, M., and Zagha, M. (1989). "An Implementation of Back-Propagation Learning on GF11, a Large SIMD Parallel Computer." School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, Technical Report CMU-CS-89-208 

  Access Paper or Ask Questions

AWE-CM Vectors: Augmenting Word Embeddings with a Clinical Metathesaurus

Dec 05, 2017
Willie Boag, Hassan Kané

In recent years, word embeddings have been surprisingly effective at capturing intuitive characteristics of the words they represent. These vectors achieve the best results when training corpora are extremely large, sometimes billions of words. Clinical natural language processing datasets, however, tend to be much smaller. Even the largest publicly-available dataset of medical notes is three orders of magnitude smaller than the dataset of the oft-used "Google News" word vectors. In order to make up for limited training data sizes, we encode expert domain knowledge into our embeddings. Building on a previous extension of word2vec, we show that generalizing the notion of a word's "context" to include arbitrary features creates an avenue for encoding domain knowledge into word embeddings. We show that the word vectors produced by this method outperform their text-only counterparts across the board in correlation with clinical experts.

  Access Paper or Ask Questions

Automatic Summarization of Online Debates

Aug 15, 2017
Nattapong Sanchan, Ahmet Aker, Kalina Bontcheva

Debate summarization is one of the novel and challenging research areas in automatic text summarization which has been largely unexplored. In this paper, we develop a debate summarization pipeline to summarize key topics which are discussed or argued in the two opposing sides of online debates. We view that the generation of debate summaries can be achieved by clustering, cluster labeling, and visualization. In our work, we investigate two different clustering approaches for the generation of the summaries. In the first approach, we generate the summaries by applying purely term-based clustering and cluster labeling. The second approach makes use of X-means for clustering and Mutual Information for labeling the clusters. Both approaches are driven by ontologies. We visualize the results using bar charts. We think that our results are a smooth entry for users aiming to receive the first impression about what is discussed within a debate topic containing waste number of argumentations.

* Accepted and to be published in Natural Language Processing and Information Retrieval workshop, Recent Advances in Natural Language Processing 2017 (RANLP 2017) 

  Access Paper or Ask Questions

Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation

Jul 21, 2017
Alexander Panchenko, Fide Marten, Eugen Ruppert, Stefano Faralli, Dmitry Ustalov, Simone Paolo Ponzetto, Chris Biemann

Interpretability of a predictive model is a powerful feature that gains the trust of users in the correctness of the predictions. In word sense disambiguation (WSD), knowledge-based systems tend to be much more interpretable than knowledge-free counterparts as they rely on the wealth of manually-encoded elements representing word senses, such as hypernyms, usage examples, and images. We present a WSD system that bridges the gap between these two so far disconnected groups of methods. Namely, our system, providing access to several state-of-the-art WSD models, aims to be interpretable as a knowledge-based system while it remains completely unsupervised and knowledge-free. The presented tool features a Web interface for all-word disambiguation of texts that makes the sense predictions human readable by providing interpretable word sense inventories, sense representations, and disambiguation results. We provide a public API, enabling seamless integration.

* In Proceedings of the the Conference on Empirical Methods on Natural Language Processing (EMNLP 2017). 2017. Copenhagen, Denmark. Association for Computational Linguistics 

  Access Paper or Ask Questions

Dual Supervised Learning

Jul 03, 2017
Yingce Xia, Tao Qin, Wei Chen, Jiang Bian, Nenghai Yu, Tie-Yan Liu

Many supervised learning tasks are emerged in dual forms, e.g., English-to-French translation vs. French-to-English translation, speech recognition vs. text to speech, and image classification vs. image generation. Two dual tasks have intrinsic connections with each other due to the probabilistic correlation between their models. This connection is, however, not effectively utilized today, since people usually train the models of two dual tasks separately and independently. In this work, we propose training the models of two dual tasks simultaneously, and explicitly exploiting the probabilistic correlation between them to regularize the training process. For ease of reference, we call the proposed approach \emph{dual supervised learning}. We demonstrate that dual supervised learning can improve the practical performances of both tasks, for various applications including machine translation, image processing, and sentiment analysis.

* ICML 2017 

  Access Paper or Ask Questions

A Lightweight Regression Method to Infer Psycholinguistic Properties for Brazilian Portuguese

May 19, 2017
Leandro B. dos Santos, Magali S. Duran, Nathan S. Hartmann, Arnaldo Candido Jr., Gustavo H. Paetzold, Sandra M. Aluisio

Psycholinguistic properties of words have been used in various approaches to Natural Language Processing tasks, such as text simplification and readability assessment. Most of these properties are subjective, involving costly and time-consuming surveys to be gathered. Recent approaches use the limited datasets of psycholinguistic properties to extend them automatically to large lexicons. However, some of the resources used by such approaches are not available to most languages. This study presents a method to infer psycholinguistic properties for Brazilian Portuguese (BP) using regressors built with a light set of features usually available for less resourced languages: word length, frequency lists, lexical databases composed of school dictionaries and word embedding models. The correlations between the properties inferred are close to those obtained by related works. The resulting resource contains 26,874 words in BP annotated with concreteness, age of acquisition, imageability and subjective frequency.

* Paper accepted for TSD2017 

  Access Paper or Ask Questions