"Topic": models, code, and papers

Exploring text datasets by visualizing relevant words

Jul 17, 2017
Franziska Horn, Leila Arras, Grégoire Montavon, Klaus-Robert Müller, Wojciech Samek

When working with a new dataset, it is important to first explore and familiarize oneself with it, before applying any advanced machine learning algorithms. However, to the best of our knowledge, no tools exist that quickly and reliably give insight into the contents of a selection of documents with respect to what distinguishes them from other documents belonging to different categories. In this paper we propose to extract 'relevant words' from a collection of texts, which summarize the contents of documents belonging to a certain class (or discovered cluster in the case of unlabeled datasets), and visualize them in word clouds to allow for a survey of salient features at a glance. We compare three methods for extracting relevant words and demonstrate the usefulness of the resulting word clouds by providing an overview of the classes contained in a dataset of scientific publications as well as by discovering trending topics from recent New York Times article snippets.
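The abstract does not name the three extraction methods, so the following is only a minimal sketch of the general idea, under an assumed scoring rule: a word counts as relevant for a class if it appears in a larger share of that class's documents than of all other documents. The function name `relevant_words`, the whitespace tokenization, and the toy documents are all hypothetical and not taken from the paper.

```python
from collections import Counter


def relevant_words(docs, labels, target, top_k=3):
    """Rank words by (share of target-class docs containing the word)
    minus (share of all other docs containing the word).

    This scoring rule is an illustrative assumption, not the paper's method.
    """
    in_docs = [set(d.lower().split()) for d, l in zip(docs, labels) if l == target]
    out_docs = [set(d.lower().split()) for d, l in zip(docs, labels) if l != target]
    in_df = Counter(w for d in in_docs for w in d)    # document frequency inside the class
    out_df = Counter(w for d in out_docs for w in d)  # document frequency outside the class
    scores = {
        w: in_df[w] / len(in_docs) - out_df[w] / max(len(out_docs), 1)
        for w in in_df
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


# Toy corpus (made up for illustration).
docs = [
    "neural network training loss",
    "neural network layers",
    "protein folding structure",
    "protein sequence alignment",
]
labels = ["ml", "ml", "bio", "bio"]
top_ml = relevant_words(docs, labels, "ml", top_k=2)
```

The returned words could then be passed, with their scores as weights, to any word cloud renderer.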


Restricted Boltzmann Machine for Classification with Hierarchical Correlated Prior

Apr 20, 2015
Gang Chen, Sargur H. Srihari

Restricted Boltzmann machines (RBMs) and their variants have recently become hot research topics and have been widely applied to many classification problems, such as character recognition and document categorization. Often, the classification RBM ignores interclass relationships or prior knowledge about information shared among classes. In this paper, we are interested in RBMs with a hierarchical prior over classes. We assume that the parameters of nearby nodes in the hierarchical tree are correlated, and further that the parameters at each node of the tree are orthogonal to those at its ancestors. We propose a hierarchical correlated RBM for classification problems, which generalizes the classification RBM by sharing information among different classes. In order to reduce the redundancy between node parameters in the hierarchy, we also introduce orthogonal restrictions to our objective function. We test our method on challenging datasets, and show promising results compared to competitive baselines.
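The abstract does not give the exact form of the orthogonal restriction, so the sketch below assumes one plausible form: a penalty equal to the sum of squared dot products between each node's parameter vector and those of its ancestors, which is zero exactly when every node is orthogonal to all of its ancestors. The tree, the node names, and the function names are hypothetical.

```python
def ancestors(node, parent):
    """Return the ancestors of `node`, nearest first, given a child -> parent map."""
    chain = []
    while parent.get(node) is not None:
        node = parent[node]
        chain.append(node)
    return chain


def orthogonality_penalty(weights, parent):
    """Sum of squared dot products between each node's parameters and its ancestors'.

    Assumed penalty form for illustration; the paper's objective may differ.
    """
    total = 0.0
    for node, w in weights.items():
        for anc in ancestors(node, parent):
            dot = sum(a * b for a, b in zip(w, weights[anc]))
            total += dot * dot
    return total


# Toy class hierarchy: root -> animal -> cat (hypothetical).
parent = {"root": None, "animal": "root", "cat": "animal"}
weights = {
    "root": [1.0, 0.0, 0.0],
    "animal": [0.0, 1.0, 0.0],
    "cat": [0.0, 1.0, 1.0],  # overlaps with "animal", so it is penalized
}
penalty = orthogonality_penalty(weights, parent)
```

In a training loop, such a term would be added to the classification objective with a weighting coefficient, so that redundant directions between a node and its ancestors are discouraged.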

* 13 pages, 5 figures 

Real-time Dynamic MRI Reconstruction using Stacked Denoising Autoencoder

Mar 22, 2015
Angshul Majumdar

In this work we address the problem of real-time dynamic MRI reconstruction. There are a handful of studies on this topic; these techniques are either based on compressed sensing or employ Kalman Filtering. These techniques cannot achieve the reconstruction speed necessary for real-time reconstruction. In this work, we propose a new approach to MRI reconstruction. We learn a non-linear mapping from the unstructured aliased images to the corresponding clean images using a stacked denoising autoencoder (SDAE). The training for SDAE is slow, but the reconstruction is very fast - only requiring a few matrix vector multiplications. In this work, we have shown that using SDAE one can reconstruct the MRI frame faster than the data acquisition rate, thereby achieving real-time reconstruction. The quality of reconstruction is of the same order as a previous compressed sensing based online reconstruction technique.
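To illustrate why reconstruction with a trained stacked autoencoder is cheap, here is a minimal sketch of the forward pass: one matrix-vector multiplication, a bias add, and an elementwise nonlinearity per layer. The sigmoid activation, the layer sizes, and the weight values are assumptions chosen for illustration; they are untrained placeholders and not the network from the paper.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def sigmoid(v):
    """Elementwise logistic function."""
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


def reconstruct(aliased, layers):
    """Map an aliased input vector to an output vector.

    `layers` is a list of (W, b) pairs; the cost is one matvec per layer.
    """
    h = aliased
    for W, b in layers:
        h = sigmoid([z + b_i for z, b_i in zip(matvec(W, h), b)])
    return h


# Hypothetical 3 -> 2 -> 3 network with made-up (untrained) weights.
toy_layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.2, 0.4], [-0.3, 0.9]], [0.0, 0.0, 0.1]),
]
clean = reconstruct([0.2, 0.9, 0.4], toy_layers)
```

Training, which the abstract describes as slow, happens offline; only this forward pass has to run at acquisition time.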


Bayesian Optimization with Unknown Constraints

Mar 22, 2014
Michael A. Gelbart, Jasper Snoek, Ryan P. Adams

Recent work on Bayesian optimization has shown its effectiveness in global optimization of difficult black-box objective functions. Many real-world optimization problems of interest also have constraints which are unknown a priori. In this paper, we study Bayesian optimization for constrained problems in the general case that noise may be present in the constraint functions, and the objective and constraints may be evaluated independently. We provide motivating practical examples, and present a general framework to solve such problems. We demonstrate the effectiveness of our approach on optimizing the performance of online latent Dirichlet allocation subject to topic sparsity constraints, tuning a neural network given test-time memory constraints, and optimizing Hamiltonian Monte Carlo to achieve maximal effectiveness in a fixed time, subject to passing standard convergence diagnostics.
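The abstract does not spell out the acquisition function, so the following is a sketch of one common choice in constrained Bayesian optimization: expected improvement multiplied by the probability that the constraint is satisfied. It assumes independent Gaussian posteriors for the objective (to be minimized) and for a single constraint of the form c(x) >= 0; the function names and the numbers are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` when minimizing, for a Gaussian posterior."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    return (best - mu) * _STD_NORMAL.cdf(z) + sigma * _STD_NORMAL.pdf(z)


def prob_feasible(mu_c, sigma_c):
    """P(c(x) >= 0) under a Gaussian posterior on the constraint value."""
    if sigma_c <= 0.0:
        return 1.0 if mu_c >= 0.0 else 0.0
    return _STD_NORMAL.cdf(mu_c / sigma_c)


def constrained_ei(mu, sigma, best, mu_c, sigma_c):
    """Expected improvement weighted by the probability of feasibility."""
    return expected_improvement(mu, sigma, best) * prob_feasible(mu_c, sigma_c)


# Two hypothetical candidates with the same objective posterior:
# one is probably feasible, the other probably violates the constraint.
likely_ok = constrained_ei(mu=0.0, sigma=1.0, best=0.0, mu_c=2.0, sigma_c=1.0)
likely_bad = constrained_ei(mu=0.0, sigma=1.0, best=0.0, mu_c=-2.0, sigma_c=1.0)
```

Because the two factors come from separate posteriors, the objective and the constraint can be modeled, and in principle evaluated, independently, which matches the setting the abstract describes.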

* 14 pages, 3 figures 

Is getting the right answer just about choosing the right words? The role of syntactically-informed features in short answer scoring

Mar 05, 2014
Derrick Higgins, Chris Brew, Michael Heilman, Ramon Ziai, Lei Chen, Aoife Cahill, Michael Flor, Nitin Madnani, Joel Tetreault, Daniel Blanchard, Diane Napolitano, Chong Min Lee, John Blackmore

Developments in the educational landscape have spurred greater interest in the problem of automatically scoring short answer questions. A recent shared task on this topic revealed a fundamental divide in the modeling approaches that have been applied to this problem, with the best-performing systems split between those that employ a knowledge engineering approach and those that almost solely leverage lexical information (as opposed to higher-level syntactic information) in assigning a score to a given response. This paper aims to introduce the NLP community to the largest corpus currently available for short-answer scoring, provide an overview of methods used in the shared task using this data, and explore the extent to which more syntactically-informed features can contribute to the short answer scoring task in a way that avoids the question-specific manual effort of the knowledge engineering approach.


Categorizing ancient documents

Aug 28, 2013
Nizar Zaghden, Remy Mullot, Mohamed Adel Alimi

The analysis of historical documents is still a topical issue, given the importance of the information that can be extracted and the importance institutions attach to preserving their heritage. To characterize the content of images of ancient documents, the main idea is, after attempting to clean the image, to segment blocks of text from the image and then search for similar blocks in either the same image or the entire image database. Most approaches to offline handwriting recognition proceed by segmenting words into smaller pieces (usually characters), which are recognized separately. Recognition of a word then requires the recognition of all the characters (OCR) that compose it. Our work focuses mainly on the characterization of classes in images of old documents. We use the SOM Toolbox to find classes in documents. We also applied fractal dimensions and points of interest to categorize and match ancient documents.

* IJCSI International Journal of Computer Science Issues, Vol. 10, Issue 2, No 2, March 2013 ISSN (Print): 1694-0814 | ISSN (Online): 1694-0784 www.IJCSI.org 
* 10 pages 

Turning Speech Into Scripts

Jun 09, 2000
Manny Rayner, Beth Ann Hockey, Frankie James

We describe an architecture for implementing spoken natural language dialogue interfaces to semi-autonomous systems, in which the central idea is to transform the input speech signal through successive levels of representation corresponding roughly to linguistic knowledge, dialogue knowledge, and domain knowledge. The final representation is an executable program in a simple scripting language equivalent to a subset of Cshell. At each stage of the translation process, an input is transformed into an output, producing as a byproduct a "meta-output" which describes the nature of the transformation performed. We show how consistent use of the output/meta-output distinction permits a simple and perspicuous treatment of apparently diverse topics including resolution of pronouns, correction of user misconceptions, and optimization of scripts. The methods described have been concretely realized in a prototype speech interface to a simulation of the Personal Satellite Assistant.

* AAAI Spring Symposium on Natural Dialogues with Practical Robotic Devices, March 20-22, 2000. Stanford, CA 
* Working notes from AAAI Spring Symposium 

Design and Implementation of a Tactical Generator for Turkish, a Free Constituent Order Language

Jul 30, 1996
Dilek Zeynep Hakkani

This thesis describes a tactical generator for Turkish, a free constituent order language, in which the order of the constituents may change according to the information structure of the sentences to be generated. In the absence of any information regarding the information structure of a sentence (i.e., topic, focus, background, etc.), the constituents of the sentence obey a default order, but the order is almost freely changeable, depending on the constraints of the text flow or discourse. We have used a recursively structured finite state machine for handling the changes in constituent order, implemented as a right-linear grammar backbone. Our implementation environment is the GenKit system, developed at Carnegie Mellon University--Center for Machine Translation. Morphological realization has been implemented using an external morphological analysis/generation component which performs concrete morpheme selection and handles morphographemic processes.

* M.Sc. Thesis submitted to the Department of Computer Engineering and Information Science, Bilkent University, Ankara, Turkey. 146 pages (including title pages). Also available as: ftp://ftp.cs.bilkent.edu.tr/pub/tech-reports/1996/BU-CEIS-9614.ps.z 

Tactical Generation in a Free Constituent Order Language

May 05, 1996
Dilek Zeynep Hakkani, Kemal Oflazer, Ilyas Cicekli

This paper describes tactical generation in Turkish, a free constituent order language, in which the order of the constituents may change according to the information structure of the sentences to be generated. In the absence of any information regarding the information structure of a sentence (i.e., topic, focus, background, etc.), the constituents of the sentence obey a default order, but the order is almost freely changeable, depending on the constraints of the text flow or discourse. We have used a recursively structured finite state machine for handling the changes in constituent order, implemented as a right-linear grammar backbone. Our implementation environment is the GenKit system, developed at Carnegie Mellon University--Center for Machine Translation. Morphological realization has been implemented using an external morphological analysis/generation component which performs concrete morpheme selection and handles morphographemic processes.

* Proceedings of 1996 International Workshop on Natural Language Generation 
* gzipped, uuencoded postscript file 

Practical Methods for Proving Termination of General Logic Programs

Apr 01, 1996
E. Marchiori

Termination of logic programs with negated body atoms (here called general logic programs) is an important topic. One reason is that many computational mechanisms used to process negated atoms, like Clark's negation as failure and Chan's constructive negation, are based on termination conditions. This paper introduces a methodology for proving termination of general logic programs w.r.t. the Prolog selection rule. The idea is to distinguish parts of the program depending on whether or not their termination depends on the selection rule. To this end, the notions of low-, weakly up-, and up-acceptable program are introduced. We use these notions to develop a methodology for proving termination of general logic programs, and show how interesting problems in non-monotonic reasoning can be formalized and implemented by means of terminating general logic programs.

* Journal of Artificial Intelligence Research, Vol 4, (1996), 179-208 
* See http://www.jair.org/ for any accompanying files 
