Title: Words are not Equal: Graded Weighting Model for building Composite Document Vectors

Dec 11, 2015

Pranjal Singh, Amitabha Mukerjee

Abstract: Despite the success of distributional semantics, composing phrases from word vectors remains an important challenge. Several methods have been tried for benchmark tasks such as sentiment classification, including word vector averaging, matrix-vector approaches based on parsing, and on-the-fly learning of paragraph vectors. Most models omit stop words from the composition. Instead of such a yes-no decision, we consider several graded schemes in which words are weighted according to their discriminatory relevance with respect to their use in the document (e.g., idf). Some of these methods (particularly tf-idf) are seen to result in a significant improvement in performance over the prior state of the art. Further, combining such approaches into an ensemble based on alternate classifiers such as the RNN model results in a 1.6% performance improvement on the standard IMDB movie review dataset, and a 7.01% improvement on Amazon product reviews. Since these are language-free models and can be obtained in an unsupervised manner, they are also of interest for under-resourced languages such as Hindi. We demonstrate the language-free aspects by showing a gain of 12% for two review datasets over earlier results, and also release a new, larger dataset for future testing (Singh, 2015).
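The graded weighting idea in the abstract can be illustrated with a minimal sketch: rather than dropping stop words outright, each word vector is scaled by a tf-idf weight before averaging. This is not the authors' implementation. The smoothed idf formula, the function names `idf_weights` and `doc_vector`, and the toy two-dimensional word vectors are all assumptions made for illustration; the abstract does not specify the exact weighting formula.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Smoothed idf per token: log((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}


def doc_vector(doc, word_vecs, idf, dim):
    """tf-idf weighted average of word vectors for one tokenized document."""
    # Term frequency, restricted to tokens we have both a vector and an idf for.
    tf = Counter(tok for tok in doc if tok in word_vecs and tok in idf)
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in tf.items():
        w = count * idf[tok]
        weight_sum += w
        for i, x in enumerate(word_vecs[tok]):
            total[i] += w * x
    if weight_sum == 0.0:
        return total  # no known tokens: fall back to the zero vector
    return [x / weight_sum for x in total]


# Toy corpus and hypothetical 2-d word vectors (not learned embeddings).
docs = [
    ["the", "movie", "was", "great"],
    ["the", "movie", "was", "dull"],
    ["the", "plot", "was", "great"],
]
word_vecs = {
    "the": [0.0, 0.0],
    "was": [0.0, 0.0],
    "movie": [1.0, 0.0],
    "plot": [1.0, 0.0],
    "great": [0.0, 1.0],
    "dull": [0.0, -1.0],
}

idf = idf_weights(docs)
vec = doc_vector(docs[0], word_vecs, idf, dim=2)
```

In this toy corpus, words that appear in every document ("the", "was") receive the lowest weight but still contribute, while rarer words such as "dull" are weighted more heavily. In practice the word vectors would come from a trained embedding model, and the resulting document vectors would be fed to a classifier.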

* 10 Pages, 2 Figures, 11 Tables
