"Topic": models, code, and papers

"President Vows to Cut <Taxes> Hair": Dataset and Analysis of Creative Text Editing for Humorous Headlines

Jun 01, 2019
Nabil Hossain, John Krumm, Michael Gamon

We introduce, release, and analyze a new dataset, called Humicroedit, for research in computational humor. Our publicly available data consists of regular English news headlines paired with versions of the same headlines that contain simple replacement edits designed to make them funny. We carefully curated crowdsourced editors to create funny headlines and judges to score a total of 15,095 edited headlines, with five judges per headline. The simple edits, usually just a single word replacement, mean we can apply straightforward analysis techniques to determine what makes our edited headlines humorous. We show how the data support classic theories of humor, such as incongruity, superiority, and setup/punchline. Finally, we develop baseline classifiers that can predict whether or not an edited headline is funny, which is a first step toward automatically generating humorous headlines as an approach to creating topical humor.

* Accepted in NAACL 2019 


A Comparative Analysis of Distributional Term Representations for Author Profiling in Social Media

May 21, 2019
Miguel Á. Álvarez-Carmona, Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Luis Villaseñor-Pienda

Author Profiling (AP) aims at predicting specific characteristics of a group of authors by analyzing their written documents. Much research has focused on determining suitable features for modeling authors' writing patterns. Reported results indicate that content-based features continue to be the most relevant and discriminant features for solving this task. Thus, in this paper, we present a thorough analysis of the appropriateness of different distributional term representations (DTRs) for the AP task. In this regard, we introduce a novel framework for supervised AP using these representations and, building on it, carry out a comparative analysis of representations such as DOR, TCOR, SSR, and word2vec on the AP problem. We also compare the performance of the DTRs against classic approaches, including popular topic-based methods. The obtained results indicate that DTRs are suitable for solving the AP task in social media domains, as they achieve competitive results while providing meaningful interpretability.

* Journal of Intelligent & Fuzzy Systems, vol. 36, no. 5, pp. 4857-4868, 2019 


PL-NMF: Parallel Locality-Optimized Non-negative Matrix Factorization

Apr 16, 2019
Gordon E. Moon, Aravind Sukumaran-Rajam, Srinivasan Parthasarathy, P. Sadayappan

Non-negative Matrix Factorization (NMF) is a key kernel for unsupervised dimension reduction used in a wide range of applications, including topic modeling, recommender systems and bioinformatics. Due to the compute-intensive nature of applications that must perform repeated NMF, several parallel implementations have been developed in the past. However, existing parallel NMF algorithms have not addressed data locality optimizations, which are critical for high performance since data movement costs greatly exceed the cost of arithmetic/logic operations on current computer systems. In this paper, we devise a parallel NMF algorithm based on the HALS (Hierarchical Alternating Least Squares) scheme that incorporates algorithmic transformations to enhance data locality. Efficient realizations of the algorithm on multi-core CPUs and GPUs are developed, demonstrating significant performance improvement over existing state-of-the-art parallel NMF algorithms.

* 11 pages, 5 tables, 9 figures 
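
The HALS scheme named in the abstract can be illustrated with a minimal, unoptimized sketch. This is plain textbook HALS in pure Python, not the locality-optimized parallel algorithm from the paper; the function names (`hals_nmf`, `matmul`, `frobenius_error`) and the small `eps` floor are assumptions made for illustration.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def frobenius_error(A, W, H):
    """Frobenius norm of A - W H."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5

def hals_nmf(A, rank, iters=200, eps=1e-12, seed=0):
    """Factor a non-negative m x n matrix A into W (m x rank) and H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # Update W one column at a time, with H held fixed.
        AHt = matmul(A, transpose(H))  # m x rank
        HHt = matmul(H, transpose(H))  # rank x rank
        for k in range(rank):
            for i in range(m):
                grad = AHt[i][k] - sum(W[i][j] * HHt[j][k] for j in range(rank))
                W[i][k] = max(eps, W[i][k] + grad / max(HHt[k][k], eps))
        # Update H one row at a time, with W held fixed.
        WtA = matmul(transpose(W), A)  # rank x n
        WtW = matmul(transpose(W), W)  # rank x rank
        for k in range(rank):
            for j in range(n):
                grad = WtA[k][j] - sum(WtW[k][l] * H[l][j] for l in range(rank))
                H[k][j] = max(eps, H[k][j] + grad / max(WtW[k][k], eps))
    return W, H

if __name__ == "__main__":
    # Hypothetical 3 x 4 non-negative input of rank 2.
    A = matmul([[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]],
               [[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 3.0, 1.0]])
    W, H = hals_nmf(A, rank=2)
    print("reconstruction error:", frobenius_error(A, W, H))
```

Each inner update touches one column of W or one row of H at a time, which is the access pattern whose data movement the paper sets out to improve.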


A Two-Step Pursuit-Evasion Algorithm for Autonomous Underwater Vehicles

Feb 22, 2019
Özer Özkahraman, Petter Ögren

In this paper, we consider the problem of pursuit-evasion using multiple Autonomous Underwater Vehicles (AUVs) in a 3D water volume, with and without obstacles such as islands and seabed topography. Pursuit-evasion is a well-studied topic in robotics, but the results are mostly set in 2D environments, using unlimited line-of-sight sensing. We propose an algorithm for range-limited sensing in 3D environments that captures a finite-speed evader based on a single previous observation of its location. The pursuers are first moved to form a cage formation that contains the evader while minimizing the number of pursuers required. Once the initial cage is complete, it is changed to a smaller spherical cage that is shrunk until every part of the volume containing the evader is sensed, capturing the evader. The pursuers need only minimal communication and computation while the mission is carried out, and most of the computation is done beforehand, allowing for easy implementation.

* 8 pages, submitted to RA-L 2019 


Measuring Issue Ownership using Word Embeddings

Oct 31, 2018
Amaru Cuba Gyllensten, Magnus Sahlgren

Sentiment and topic analysis are common methods used for social media monitoring. Essentially, these methods answer questions such as "what is being talked about, regarding X" and "what do people feel, regarding X". In this paper, we investigate another avenue for social media monitoring, namely issue ownership and agenda setting, which are concepts from political science that have been used to explain voter choice and electoral outcomes. We argue that issue alignment and agenda setting can be seen as a kind of semantic source similarity of the kind "how similar is source A to issue owner P, when talking about issue X", and as such can be measured using word/document embedding techniques. We present work in progress towards measuring that kind of conditioned similarity, and introduce a new notion of similarity for predictive embeddings. We then test this method by measuring the similarity between politically aligned media and political parties, conditioned on bloc-specific issues.

* Accepted to the 9th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA), held in conjunction with the EMNLP 2018 conference 


WiSeBE: Window-based Sentence Boundary Evaluation

Aug 27, 2018
Carlos-Emiliano González-Gallardo, Juan-Manuel Torres-Moreno

Sentence Boundary Detection (SBD) has been a major research topic since Automatic Speech Recognition transcripts have been used for further Natural Language Processing tasks like Part of Speech Tagging, Question Answering or Automatic Summarization. But what about evaluation? Are standard evaluation metrics like precision, recall, F-score or classification error and, more importantly, the evaluation of an automatic system against a single reference enough to conclude how well an SBD system is performing given the final application of the transcript? In this paper we propose Window-based Sentence Boundary Evaluation (WiSeBE), a semi-supervised metric for evaluating Sentence Boundary Detection systems based on multi-reference (dis)agreement. We evaluate and compare the performance of different SBD systems over a set of YouTube transcripts using WiSeBE and standard metrics. This double evaluation gives an understanding of how WiSeBE is a more reliable metric for the SBD task.

* In proceedings of the 17th Mexican International Conference on Artificial Intelligence (MICAI), 2018 
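
The standard metrics the abstract questions can be sketched as follows. This is not the WiSeBE metric itself, whose definition is not given in the abstract; it only shows precision, recall, and F-score for boundary positions, and how the scores shift depending on which reference is chosen. Representing boundaries as token indices, and the function names, are assumptions made for illustration.

```python
def boundary_scores(predicted, reference):
    """Precision, recall, and F1 for predicted sentence-boundary positions.

    Both arguments are iterables of token indices after which a sentence ends.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def scores_per_reference(predicted, references):
    """Score one system output against each human reference separately."""
    return [boundary_scores(predicted, ref) for ref in references]

if __name__ == "__main__":
    system = [4, 9, 15]                    # hypothetical system output
    annotators = [[4, 10, 15], [4, 9, 15]]  # two hypothetical references
    for scores in scores_per_reference(system, annotators):
        print(scores)
```

The same system output scores differently against each annotator, which is the sensitivity to a single reference that motivates a multi-reference metric.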


Accurate 3D Localization for MAV Swarms by UWB and IMU Fusion

Jul 28, 2018
Jiaxin Li, Yingcai Bi, Kun Li, Kangli Wang, Feng Lin, Ben M. Chen

Driven by applications like Micro Aerial Vehicles (MAVs), driver-less cars, etc., localization has become an active research topic in the past decade. In recent years, Ultra Wideband (UWB) has emerged as a promising technology because of its impressive performance in both indoor and outdoor positioning. But algorithms relying only on UWB sensors usually suffer from high latency and low bandwidth, which is undesirable in some situations such as controlling a MAV. To alleviate this problem, an Extended Kalman Filter (EKF) based algorithm is proposed to fuse the Inertial Measurement Unit (IMU) and UWB, which achieves 80Hz 3D localization with significantly improved accuracy and almost no delay. To verify the effectiveness and reliability of the proposed approach, a swarm of 6 MAVs is set up to perform a light show in an indoor exhibition hall. Video and source codes are available at

* ICCA 2018 (The 14th IEEE International Conference on Control and Automation) 
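
The fusion idea can be sketched with a much simpler filter than the one in the paper: a linear, one-dimensional Kalman filter that predicts with IMU acceleration at a high rate and corrects with UWB position fixes at a lower rate. The class name `Fusion1D`, the noise variances, and the update rates are all assumptions made for illustration; the paper's filter is an EKF over 3D state.

```python
class Fusion1D:
    """Linear Kalman filter over state [position, velocity] along one axis."""

    def __init__(self, accel_var=0.5, uwb_var=0.01):
        self.p, self.v = 0.0, 0.0            # position and velocity estimates
        self.P = [[1.0, 0.0], [0.0, 1.0]]    # state covariance
        self.q = accel_var                   # assumed IMU acceleration noise variance
        self.r = uwb_var                     # assumed UWB position noise variance

    def predict(self, accel, dt):
        """Propagate the state with one IMU acceleration sample."""
        self.p += self.v * dt + 0.5 * accel * dt * dt
        self.v += accel * dt
        # P = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * G G^T, G = [dt^2/2, dt].
        (a, b), (c, d) = self.P
        g0, g1 = 0.5 * dt * dt, dt
        self.P = [
            [a + dt * (b + c) + dt * dt * d + g0 * g0 * self.q, b + dt * d + g0 * g1 * self.q],
            [c + dt * d + g0 * g1 * self.q, d + g1 * g1 * self.q],
        ]

    def update(self, uwb_position):
        """Correct the state with one UWB position fix (measurement matrix [1, 0])."""
        (a, b), (c, d) = self.P
        s = a + self.r                       # innovation variance
        k0, k1 = a / s, c / s                # Kalman gain
        innovation = uwb_position - self.p
        self.p += k0 * innovation
        self.v += k1 * innovation
        # P = (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]

if __name__ == "__main__":
    # Hypothetical rates: IMU prediction at 80 Hz, UWB correction every 10th step.
    f = Fusion1D()
    for step in range(800):
        f.predict(accel=0.0, dt=1.0 / 80.0)
        if step % 10 == 9:
            f.update(uwb_position=2.0)
    print(f.p, f.v)
```

Because the state is propagated on every IMU sample, an estimate is available at the IMU rate even though position fixes arrive less often, which is the general reason fusion can reduce latency relative to UWB alone.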


On the computational analysis of the genetic algorithm for attitude control of a carrier system

Jun 27, 2018
Hadi Jahanshahi, Naeimeh Najafizadeh Sari

This paper covers three main topics. First, a fuzzy-PID controller is designed to control the thrust vector of a launch vehicle accommodating a CanSat. Then, the genetic algorithm (GA) is employed to optimize the controller performance. Finally, the algorithm parameters are adjusted and their impact on the optimization process is examined. In this regard, the motion vector control is programmed based on the governing dynamic equations of motion for payload delivery at the desired altitude and flight-path angle. This utilizes one single input and one preferential fuzzy inference engine, where the latter acts to avoid system instability at large thrust-vector angles. The optimization objective functions include the deviations of the thrust vector and the system from the equilibrium state, which must be met simultaneously. Sensitivity analysis of the parameters of the genetic algorithm involves examining nine different cases and discussing their impact on the optimization results.

* 14 pages, 12 figures 


LivDet 2017 Fingerprint Liveness Detection Competition 2017

Mar 14, 2018
Valerio Mura, Giulia Orrù, Roberto Casula, Alessandra Sibiriu, Giulia Loi, Pierluigi Tuveri, Luca Ghiani, Gian Luca Marcialis

Fingerprint Presentation Attack Detection (FPAD) deals with distinguishing images coming from artificial replicas of the fingerprint characteristic, made of materials like silicone, gelatine or latex, from images coming from live fingerprints. Images are captured by modern scanners, typically relying on solid-state or optical technologies. Since 2009, the Fingerprint Liveness Detection Competition (LivDet) has aimed to assess the performance of state-of-the-art algorithms according to a rigorous experimental protocol and, at the same time, to provide a simple overview of the basic achievements. The competition is open to all academic research centers and all companies that work in this field. The positive, increasing trend in the number of participants, which supports the success of this initiative, is confirmed again this year: 17 algorithms were submitted to the competition, with a larger involvement of companies and academia. This means that the topic is relevant for both sides, and points out that a lot of work must be done in terms of fundamental and applied research.

* presented at ICB 2018 


Persistent homology machine learning for fingerprint classification

Nov 24, 2017
Noah Giansiracusa, Robert Giansiracusa, Chul Moon

The fingerprint classification problem is to sort fingerprints into pre-determined groups, such as arch, loop, and whorl. It was asserted in the literature that minutiae points, which are commonly used for fingerprint matching, are not useful for classification. We show that, to the contrary, near state-of-the-art classification accuracy rates can be achieved when applying topological data analysis (TDA) to 3-dimensional point clouds of oriented minutiae points. We also apply TDA to fingerprint ink-roll images, which yields a lower accuracy rate but still shows promise, particularly since the only preprocessing is cropping; moreover, combining the two approaches outperforms each one individually. These methods use supervised learning applied to persistent homology and allow us to explore feature selection on barcodes, an important topic at the interface between TDA and machine learning. We test our classification algorithms on the NIST fingerprint database SD-27.

* 15 pages 
