"Topic": models, code, and papers

Confounds and Consequences in Geotagged Twitter Data

Aug 22, 2015
Umashanthi Pavalanathan, Jacob Eisenstein

Twitter is often used in quantitative studies that identify geographically-preferred topics, writing styles, and entities. These studies rely on either GPS coordinates attached to individual messages, or on the user-supplied location field in each profile. In this paper, we compare these data acquisition techniques and quantify the biases that they introduce; we also measure their effects on linguistic analysis and text-based geolocation. GPS-tagging and self-reported locations yield measurably different corpora, and these linguistic differences are partially attributable to differences in dataset composition by age and gender. Using a latent variable model to induce age and gender, we show how these demographic variables interact with geography to affect language use. We also show that the accuracy of text-based geolocation varies with population demographics, giving the best results for men above the age of 40.

* final version for EMNLP 2015 


Is Stack Overflow Overflowing With Questions and Tags

Aug 14, 2015
Ranjitha R. K., Sanjay Singh

Programming question and answer (Q&A) websites, such as Quora, Stack Overflow, and Yahoo! Answers, help us understand programming concepts easily and quickly in a way that has been tested and applied by many software developers. Stack Overflow is one of the most frequently used programming Q&A websites, where the questions and answers posted are presently analyzed manually, which requires a huge amount of time and resources. To save this effort, we present a topic-modeling-based technique that analyzes the words of the original texts to discover the themes that run through them. We also propose a method to automate the process of reviewing the quality of questions in a Stack Overflow dataset, in order to avoid ballooning Stack Overflow with insignificant questions. The proposed method also recommends appropriate tags for a new post, which averts the creation of unnecessary tags on Stack Overflow.

* 11 pages, 7 figures, 3 tables Presented at Third International Symposium on Women in Computing and Informatics (WCI-2015) 


Parallel Magnetic Resonance Imaging

Jul 17, 2015
Martin Uecker

The main disadvantages of Magnetic Resonance Imaging (MRI) are its long scan times and, in consequence, its sensitivity to motion. Exploiting the complementary information from multiple receive coils, parallel imaging is able to recover images from under-sampled k-space data and to accelerate the measurement. Because parallel magnetic resonance imaging can be used to accelerate basically any imaging sequence, it has many important applications. Parallel imaging brought a fundamental shift in image reconstruction: image reconstruction changed from a simple direct Fourier transform to the solution of an ill-conditioned inverse problem. This work gives an overview of image reconstruction from the perspective of inverse problems. After introducing basic concepts such as regularization, discretization, and iterative reconstruction, advanced topics are discussed, including algorithms for auto-calibration, the connection to approximation theory, and the combination with compressed sensing.

* In: MRI: Physics, Image Reconstruction, and Analysis, CRC Press 2015, pp. 73-92, ISBN 9781482298871 
* 22 pages, 9 Figures, 76 References. Copyright: Martin Uecker. Draft for a book chapter. To appear in: A Majumdar and RK Ward (eds.), MRI: Physics, Image Reconstruction, and Analysis, CRC Press 2015 


Rediscovering the Alphabet - On the Innate Universal Grammar

Dec 08, 2014
M. Yahia Kaadan, Asaad Kaadan

Universal Grammar (UG) theory has been one of the most important research topics in linguistics since it was introduced five decades ago. UG specifies the restricted set of languages learnable by the human brain, and thus many researchers believe in its biological roots. Numerous empirical studies of neurobiological and cognitive functions of the human brain, and of many natural languages, have been conducted to unveil some aspects of UG. This, however, has resulted in different and sometimes contradictory theories that do not indicate a universally unique grammar. In this research, we tackle the UG problem from an entirely different perspective. We search for the Unique Universal Grammar (UUG) that facilitates communication and knowledge transfer, the sole purpose of a language. We formulate this UG and show that it is unique, intrinsic, and cosmic, rather than humanistic. Initial analysis of a widespread natural language has already shown some positive results.


Reading Stockholm Riots 2013 in social media by text-mining

Oct 04, 2013
Andrzej Jarynowski, Amir Rostami

The riots in Stockholm in May 2013 reverberated in the world media because of the scale of the violence that spread through the Swedish capital. In this study, we investigate the role of social media in creating media phenomena via text mining and natural language processing. We focus on two channels of communication for our analysis: Twitter and (Forum of Polish community in Sweden). Our preliminary results, based on counting word usage, show some hot topics driving the discussion, related mostly to the Swedish police and Swedish politics. Typical features of media intervention are presented. We have built networks of the most popular phrases, clustered by category (geography, media institution, etc.). Sentiment analysis shows negative connotations associated with the police. The aim of this preliminary exploratory quantitative study was to generate questions and hypotheses that we could then carefully follow up with deeper, more qualitative methods.

* 5 pages 


Towards a Formal Distributional Semantics: Simulating Logical Calculi with Tensors

Apr 28, 2013
Edward Grefenstette

The development of compositional distributional models of semantics reconciling the empirical aspects of distributional semantics with the compositional aspects of formal semantics is a popular topic in the contemporary literature. This paper seeks to bring this reconciliation one step further by showing how the mathematical constructs commonly used in compositional distributional models, such as tensors and matrices, can be used to simulate different aspects of predicate logic. This paper discusses how the canonical isomorphism between tensors and multilinear maps can be exploited to simulate a full-blown quantifier-free predicate calculus using tensors. It provides tensor interpretations of the set of logical connectives required to model propositional calculi. It suggests a variant of these tensor calculi capable of modelling quantifiers, using few non-linear operations. It finally discusses the relation between these variants, and how this relation should constitute the subject of future work.

* 10 pages, to appear in Proceedings of the Second Joint Conference on Lexical and Computational Semantics. June 2013 
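
The idea of simulating logical connectives with tensors can be illustrated with a minimal sketch. It assumes one particular encoding (truth values as the two basis vectors of a 2-dimensional space, negation as a swap matrix, binary connectives as order-3 tensors contracted with their arguments); the names `TRUE`, `FALSE`, `connective_tensor`, and `apply_binary` are illustrative choices made here, not taken from the paper.

```python
import numpy as np

# Assumed encoding: truth values are the two basis vectors of a 2-d space.
TRUE = np.array([1, 0])
FALSE = np.array([0, 1])

# Negation as a linear map: the matrix that swaps the two basis vectors.
NOT = np.array([[0, 1],
                [1, 0]])


def connective_tensor(truth_fn):
    """Build an order-3 tensor T[out, a, b] from a Boolean truth function."""
    tensor = np.zeros((2, 2, 2), dtype=int)
    for i, a in enumerate((True, False)):        # index 0 = true, 1 = false
        for j, b in enumerate((True, False)):
            out = 0 if truth_fn(a, b) else 1
            tensor[out, i, j] = 1
    return tensor


AND = connective_tensor(lambda a, b: a and b)
OR = connective_tensor(lambda a, b: a or b)


def negate(x):
    """Apply the negation matrix to a truth-value vector."""
    return NOT @ x


def apply_binary(tensor, x, y):
    """Contract an order-3 connective tensor with two truth-value vectors."""
    return np.einsum("kij,i,j->k", tensor, x, y)


# Usage: evaluate "not (true and false)" purely with tensor contractions.
result = negate(apply_binary(AND, TRUE, FALSE))
```

Because each connective is a multilinear map, a compound formula is evaluated by chaining contractions. Under this encoding, quantifiers are not covered; the abstract notes that modelling them requires some non-linear operations.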


The Study of the Application of a Keywords-based Chatbot System on the Teaching of Foreign Languages

Oct 10, 2003
Jiyou Jia

This paper reports the findings of a study on the application of an online human-computer dialog system with natural language (a chatbot) to the teaching of foreign languages. A keywords-based human-computer dialog system allows the user to chat with the computer in a natural language, i.e. in English or, to some extent, in German. An experiment was conducted using this system online as a chat partner for users learning foreign languages. Dialogs between the users and the chatbot were collected. The findings indicate that the dialogs between human and computer are mostly very short, because users find the computer's responses mostly repetitive and irrelevant to the topic and context, and because the program does not understand the language at all. An analysis of the keyword or pattern-matching mechanism used in this chatbot leads to the conclusion that this kind of system cannot work as a teaching assistant program in foreign language learning.

* 11 pages, 2 figures, 10 tables 
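
To make the keyword-matching mechanism concrete, here is a minimal sketch of such a responder. It is not the system studied in the paper: the rules in `RULES`, the `FALLBACK` sentence, and the `reply` function are hypothetical and exist only to show how replies are chosen without any understanding of the input.

```python
import re

# Hypothetical rules: a set of trigger keywords and one canned response each.
RULES = [
    ({"hello", "hi"}, "Hello! What would you like to talk about?"),
    ({"weather", "rain", "sunny"}, "Do you like this kind of weather?"),
    ({"school", "study", "learn"}, "What are you studying at the moment?"),
]

# Returned whenever no keyword matches (hypothetical wording).
FALLBACK = "Please tell me more."


def reply(utterance):
    """Return the response of the first rule whose keywords appear in the input."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    for keywords, response in RULES:
        if words & keywords:
            return response
    return FALLBACK


# Usage: the second input mentions both rain and learning, but only the
# first matching rule fires, and the third input falls through to FALLBACK.
answers = [
    reply("Hi there"),
    reply("It will rain, so I cannot learn outside"),
    reply("I already told you about my exam"),
]
```

A responder of this kind ignores word order, context, and dialog history, so the same canned sentence recurs whenever a keyword reappears. That is consistent with the repetitive and irrelevant replies the study reports.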


Text Segmentation Using Exponential Models

Jun 13, 1997
Doug Beeferman, Adam Berger, John Lafferty

This paper introduces a new statistical approach to partitioning text automatically into coherent segments. Our approach enlists both short-range and long-range language models to help it sniff out likely sites of topic changes in text. To aid its search, the system consults a set of simple lexical hints it has learned to associate with the presence of boundaries through inspection of a large corpus of annotated data. We also propose a new probabilistically motivated error metric for use by the natural language processing and information retrieval communities, intended to supersede precision and recall for appraising segmentation algorithms. Qualitative assessment of our algorithm as well as evaluation using this new metric demonstrate the effectiveness of our approach in two very different domains, Wall Street Journal articles and the TDT Corpus, a collection of newswire articles and broadcast news transcripts.

* 12 pages, LaTeX source and postscript figures for EMNLP-2 paper 
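
As an illustration of how a probabilistic segmentation error metric can work, the sketch below implements a window-based measure of the kind commonly known as P_k. This is an assumption about the general form only: the exact formulation in the paper may differ, and the function names and the default choice of `k` are illustrative.

```python
def lengths_to_labels(segment_lengths):
    """Turn segment lengths, e.g. [5, 5], into one segment id per position."""
    labels = []
    for seg_id, length in enumerate(segment_lengths):
        labels.extend([seg_id] * length)
    return labels


def window_error(reference, hypothesis, k=None):
    """Fraction of position pairs (i, i + k) on which the two segmentations
    disagree about whether both positions lie in the same segment."""
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same positions")
    n = len(reference)
    if k is None:
        # Assumed convention: half the mean reference segment length.
        k = max(1, round(n / len(set(reference)) / 2))
    if n <= k:
        raise ValueError("text is too short for the chosen window size")
    disagreements = 0
    for i in range(n - k):
        same_in_ref = reference[i] == reference[i + k]
        same_in_hyp = hypothesis[i] == hypothesis[i + k]
        if same_in_ref != same_in_hyp:
            disagreements += 1
    return disagreements / (n - k)


# Usage: a 10-sentence text with a true boundary after sentence 5,
# compared with a hypothesis that finds no boundary at all.
reference = lengths_to_labels([5, 5])
no_boundary = lengths_to_labels([10])
error = window_error(reference, no_boundary, k=2)
```

Unlike precision and recall, a measure of this form gives partial credit to a hypothesized boundary that is close to a true one, since only windows straddling the gap between them count as errors.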


A Corpus-Based Approach for Building Semantic Lexicons

Jun 10, 1997
Ellen Riloff, Jessica Shepherd

Semantic knowledge can be a great asset to natural language processing systems, but it is usually hand-coded for each application. Although some semantic information is available in general-purpose knowledge bases such as WordNet and Cyc, many applications require domain-specific lexicons that represent words and categories for a particular topic. In this paper, we present a corpus-based method that can be used to build semantic lexicons for specific categories. The input to the system is a small set of seed words for a category and a representative text corpus. The output is a ranked list of words that are associated with the category. A user then reviews the top-ranked words and decides which ones should be entered in the semantic lexicon. In experiments with five categories, users typically found about 60 words per category in 10-15 minutes to build a core semantic lexicon.

* 8 pages - to appear in Proceedings of EMNLP-2 
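
The input/output contract described in the abstract (seed words and a corpus in, a ranked word list out) can be sketched as follows. This is a simplified stand-in rather than the authors' algorithm: the scoring rule (the share of a word's occurrences that fall within a small window of a seed word) and the names `rank_candidates`, `window`, and `min_count` are assumptions made for illustration.

```python
from collections import Counter


def rank_candidates(sentences, seeds, window=1, min_count=1):
    """Rank non-seed words by how often they occur next to a seed word,
    relative to how often they occur in the corpus overall."""
    seeds = set(seeds)
    near_seed = Counter()   # occurrences within `window` tokens of a seed
    total = Counter()       # all occurrences in the corpus
    for tokens in sentences:
        total.update(tokens)
        positions = set()
        for i, token in enumerate(tokens):
            if token in seeds:
                start = max(0, i - window)
                stop = min(len(tokens), i + window + 1)
                for j in range(start, stop):
                    if tokens[j] not in seeds:
                        positions.add(j)   # count each position only once
        for j in positions:
            near_seed[tokens[j]] += 1
    scores = {
        word: near_seed[word] / total[word]
        for word in near_seed
        if total[word] >= min_count
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Usage with a toy corpus and a single seed word for a "weapon" category.
corpus = [
    ["gun", "rifle", "fired"],
    ["rifle", "gun"],
    ["gun", "fired"],
    ["the", "car", "fired"],
    ["engine", "fired"],
]
ranked = rank_candidates(corpus, seeds={"gun"})
```

In the workflow the abstract describes, a person would then review the top of `ranked` and decide which words belong in the lexicon. The ranking only proposes candidates.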


Multi-DNN Accelerators for Next-Generation AI Systems

May 19, 2022
Stylianos I. Venieris, Christos-Savvas Bouganis, Nicholas D. Lane

As the use of AI-powered applications widens across multiple domains, so do the computational demands. The primary drivers of AI technology are deep neural networks (DNNs). Whether the focus is on cloud-based systems that serve multiple AI queries from different users, each with their own DNN model, or on mobile robots and smartphones employing pipelines of various models or parallel DNNs for the concurrent processing of multi-modal data, the next generation of AI systems will have multi-DNN workloads at their core. Large-scale deployment of AI services and integration across mobile and embedded systems require additional breakthroughs on the computer architecture front, with processors that can maintain high performance as the number of DNNs increases while meeting the quality-of-service requirements, giving rise to the topic of multi-DNN accelerator design.

* Accepted for publication at the IEEE Computer journal, 2022 
