
"speech": models, code, and papers

E-PUR: An Energy-Efficient Processing Unit for Recurrent Neural Networks

Nov 20, 2017
Franyell Silfa, Gem Dot, Jose-Maria Arnau, Antonio Gonzalez

Recurrent Neural Networks (RNNs) are a key technology for emerging applications such as automatic speech recognition, machine translation or image description. Long Short Term Memory (LSTM) networks are the most successful RNN implementation, as they can learn long term dependencies to achieve high accuracy. Unfortunately, the recurrent nature of LSTM networks significantly constrains the amount of parallelism and, hence, multicore CPUs and many-core GPUs exhibit poor efficiency for RNN inference. In this paper, we present E-PUR, an energy-efficient processing unit tailored to the requirements of LSTM computation. The main goal of E-PUR is to support large recurrent neural networks for low-power mobile devices. E-PUR provides an efficient hardware implementation of LSTM networks that is flexible to support diverse applications. One of its main novelties is a technique that we call Maximizing Weight Locality (MWL), which improves the temporal locality of the memory accesses for fetching the synaptic weights, reducing the memory requirements to a large extent. Our experimental results show that E-PUR achieves real-time performance for different LSTM networks, while reducing energy consumption by orders of magnitude with respect to general-purpose processors and GPUs, and it requires a very small chip area. Compared to a modern mobile SoC, an NVIDIA Tegra X1, E-PUR provides an average energy reduction of 92x.
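The recurrence that limits parallelism can be seen in a toy, scalar sketch of one LSTM step (this is the standard LSTM cell equations in miniature, not E-PUR's hardware; the weights here are arbitrary placeholders):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM time step with scalar inputs for clarity.

    w maps gate name -> (input weight, recurrent weight, bias).
    The dependence on h_prev and c_prev is the recurrence that
    serializes computation across time steps."""
    gates = {}
    for g in ("i", "f", "o", "g"):
        wx, wh, b = w[g]
        pre = wx * x + wh * h_prev + b
        gates[g] = math.tanh(pre) if g == "g" else sigmoid(pre)
    c = gates["f"] * c_prev + gates["i"] * gates["g"]  # cell state update
    h = gates["o"] * math.tanh(c)                      # hidden state output
    return h, c

# Each step must wait for the previous one's h and c before it can run.
w = {g: (0.5, 0.5, 0.0) for g in ("i", "f", "o", "g")}
h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, w)
```

Because `h` and `c` flow from step to step, the time loop cannot be parallelized the way a feed-forward layer can, which is why the abstract notes poor CPU/GPU efficiency.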


KeyXtract Twitter Model - An Essential Keywords Extraction Model for Twitter Designed using NLP Tools

Aug 09, 2017
Tharindu Weerasooriya, Nandula Perera, S. R. Liyanage

Since a tweet is limited to 140 characters, it is ambiguous and difficult for traditional Natural Language Processing (NLP) tools to analyse. This research presents KeyXtract, which enhances the machine-learning-based Stanford CoreNLP Part-of-Speech (POS) tagger with the Twitter model to extract essential keywords from a tweet. The system was developed using rule-based parsers and two corpora. The data for the research was obtained from a Twitter profile of a telecommunication company. The system development consisted of two stages. At the initial stage, a domain-specific corpus was compiled after analysing the tweets. The POS tagger extracted the Noun Phrases and Verb Phrases while the parsers removed noise and extracted any other keywords missed by the POS tagger. The system was evaluated using the Turing test. After it was tested and compared against Stanford CoreNLP, the second stage of the system was developed addressing the shortcomings of the first stage. It was enhanced using Named Entity Recognition and Lemmatization. The second stage was also tested using the Turing test and its pass rate increased from 50.00% to 83.33%. The performance of the final system output was measured using the F1 score. Stanford CoreNLP with the Twitter model had an average F1 of 0.69 while the improved system had an F1 of 0.77. The accuracy of the system could be improved by using a complete domain-specific corpus. Since the system used linguistic features of a sentence, it could be applied to other NLP tools.
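The core idea of extracting Noun Phrases from POS-tagged tokens can be sketched with a simple rule: collect runs of adjectives followed by nouns. This is a hand-rolled illustration (not KeyXtract's actual parsers); the tags follow the Penn Treebank convention that Stanford CoreNLP emits, and the tagged input is made up:

```python
def extract_noun_phrases(tagged_tokens):
    """Collect runs of adjectives/nouns (JJ*/NN* tags) as candidate
    keyword phrases, keeping a run only if it contains a noun."""
    phrases, run = [], []

    def flush():
        if any(tag.startswith("NN") for _, tag in run):
            phrases.append(" ".join(word for word, _ in run))
        run.clear()

    for word, tag in tagged_tokens:
        if tag.startswith("JJ") or tag.startswith("NN"):
            run.append((word, tag))
        else:
            flush()
    flush()
    return phrases

# Hypothetical POS-tagged tweet fragment.
tagged = [("new", "JJ"), ("data", "NN"), ("plans", "NNS"),
          ("are", "VBP"), ("available", "JJ"), ("today", "NN")]
print(extract_noun_phrases(tagged))  # → ['new data plans', 'available today']
```

A real system would layer Named Entity Recognition and lemmatization on top, as the paper's second stage does.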

* 7 Pages, 5 Figures, Proceedings of the 10th KDU International Research Conference 

Determining sentiment in citation text and analyzing its impact on the proposed ranking index

Jul 05, 2017
Souvick Ghosh, Dipankar Das, Tanmoy Chakraborty

Whenever human beings interact with each other, they exchange or express opinions, emotions, and sentiments. These opinions can be expressed in text, speech or images. Analysis of these sentiments is one of the popular research areas of present-day researchers. Sentiment analysis, also known as opinion mining, tries to identify or classify these sentiments or opinions into two broad categories - positive and negative. In recent years, the scientific community has taken a lot of interest in analyzing sentiment in textual data available in various social media platforms. Much work has been done on social media conversations, blog posts, newspaper articles and various narrative texts. However, when it comes to identifying emotions from scientific papers, researchers have faced some difficulties due to the implicit and hidden nature of opinion. By default, citation instances are considered inherently positive in emotion. Popular ranking and indexing paradigms often neglect the opinion present while citing. In this paper, we have tried to achieve three objectives. First, we try to identify the major sentiment in the citation text and assign a score to the instance. We have used a statistical classifier for this purpose. Secondly, we have proposed a new index (we shall refer to it hereafter as M-index) which takes into account both the quantitative and qualitative factors while scoring a paper. Thirdly, we developed a ranking of research papers based on the M-index. We also try to explain how the M-index impacts the ranking of scientific papers.
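The abstract does not give the M-index formula, but the idea of blending a quantitative factor (citation count) with a qualitative one (citation sentiment) can be illustrated with a hypothetical weighted score. This is NOT the paper's actual M-index; the `alpha` weighting is an assumption for illustration only:

```python
def m_index_sketch(sentiments, alpha=0.5):
    """Hypothetical combined score: each entry in `sentiments` is the
    polarity (-1, 0, or +1) of one citing instance. The count of
    citations is the quantitative factor; their mean polarity is the
    qualitative one. `alpha` trades off the two."""
    n = len(sentiments)
    if n == 0:
        return 0.0
    mean_sentiment = sum(sentiments) / n
    return alpha * n + (1 - alpha) * n * mean_sentiment

paper_a = [1, 1, 0, 1, -1]    # five citations, mostly positive
paper_b = [-1, -1, -1, 0, 1]  # five citations, mostly negative
```

With equal citation counts, the paper cited positively scores higher, which is the qualitative adjustment that pure citation counting misses.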

* Sentiment Analysis, Citation, Citation Sentiment Analysis, Citation Polarity, Ranking, Bibliometrics 

QUOTUS: The Structure of Political Media Coverage as Revealed by Quoting Patterns

Apr 06, 2015
Vlad Niculae, Caroline Suen, Justine Zhang, Cristian Danescu-Niculescu-Mizil, Jure Leskovec

Given the extremely large pool of events and stories available, media outlets need to focus on a subset of issues and aspects to convey to their audience. Outlets are often accused of exhibiting a systematic bias in this selection process, with different outlets portraying different versions of reality. However, in the absence of objective measures and empirical evidence, the direction and extent of systematicity remain widely disputed. In this paper we propose a framework based on quoting patterns for quantifying and characterizing the degree to which media outlets exhibit systematic bias. We apply this framework to a massive dataset of news articles spanning the six years of Obama's presidency and all of his speeches, and reveal that a systematic pattern does indeed emerge from the outlets' quoting behavior. Moreover, we show that this pattern can be successfully exploited in an unsupervised prediction setting, to determine which new quotes an outlet will select to broadcast. By encoding bias patterns in a low-rank space we provide an analysis of the structure of political media coverage. This reveals a latent media bias space that aligns surprisingly well with political ideology and outlet type. A linguistic analysis exposes striking differences across these latent dimensions, showing how the different types of media outlets portray different realities even when reporting on the same events. For example, outlets mapped to the mainstream conservative side of the latent space focus on quotes that portray a presidential persona disproportionately characterized by negativity.
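A much-simplified proxy for the quoting-pattern idea: represent each outlet as a binary vector over quotes (1 if the outlet broadcast that quote) and compare rows. The paper factorizes this matrix into a low-rank bias space; here we only show that row similarity already exposes structure. The data is invented:

```python
import math

def cosine(u, v):
    """Cosine similarity between two quoting vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical outlet-by-quote matrix: which quotes each outlet chose.
quoting = {
    "outlet_a": [1, 1, 0, 0, 1],
    "outlet_b": [1, 1, 0, 0, 0],  # selects like outlet_a
    "outlet_c": [0, 0, 1, 1, 0],  # selects a disjoint set
}
sim_ab = cosine(quoting["outlet_a"], quoting["outlet_b"])
sim_ac = cosine(quoting["outlet_a"], quoting["outlet_c"])
```

Outlets with overlapping selections score high, disjoint ones score zero; a low-rank factorization generalizes this pairwise comparison into the latent bias dimensions the paper analyzes.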

* To appear in the Proceedings of WWW 2015. 11pp, 10 fig. Interactive visualization, data, and other info available at http://snap.stanford.edu/quotus/ 

Highly comparative time-series analysis: The empirical structure of time series and their methods

Apr 03, 2013
Ben D. Fulcher, Max A. Little, Nick S. Jones

The process of collecting and organizing sets of observations represents a common theme throughout the history of science. However, despite the ubiquity of scientists measuring, recording, and analyzing the dynamics of different processes, an extensive organization of scientific time-series data and analysis methods has never been performed. Addressing this, annotated collections of over 35 000 real-world and model-generated time series and over 9000 time-series analysis algorithms are analyzed in this work. We introduce reduced representations of both time series, in terms of their properties measured by diverse scientific methods, and of time-series analysis methods, in terms of their behaviour on empirical time series, and use them to organize these interdisciplinary resources. This new approach to comparing across diverse scientific data and methods allows us to organize time-series datasets automatically according to their properties, retrieve alternatives to particular analysis methods developed in other scientific disciplines, and automate the selection of useful methods for time-series classification and regression tasks. The broad scientific utility of these tools is demonstrated on datasets of electroencephalograms, self-affine time series, heart beat intervals, speech signals, and others, in each case contributing novel analysis techniques to the existing literature. Highly comparative techniques that compare across an interdisciplinary literature can thus be used to guide more focused research in time-series analysis for applications across the scientific disciplines.
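The "reduced representation" of a time series is a vector of summary features. A miniature version with three features (the real library computes thousands, drawn from many disciplines) can be sketched as:

```python
import math

def features(ts):
    """Map a time series to a small feature vector:
    [mean, standard deviation, lag-1 autocorrelation]."""
    n = len(ts)
    mean = sum(ts) / n
    var = sum((x - mean) ** 2 for x in ts) / n
    std = math.sqrt(var)
    if var > 0:
        ac1 = sum((ts[i] - mean) * (ts[i + 1] - mean)
                  for i in range(n - 1)) / (n * var)
    else:
        ac1 = 0.0
    return [mean, std, ac1]

alternating = [1, -1, 1, -1, 1, -1, 1, -1]  # anti-correlated at lag 1
trend = [1, 2, 3, 4, 5, 6, 7, 8]            # strongly correlated at lag 1
```

Once every series is a fixed-length feature vector, series from different disciplines live in one space, which is what lets the paper organize datasets automatically and retrieve analogous analysis methods.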

* J. R. Soc. Interface vol. 10 no. 83 20130048 (2013) 

Using NLP to analyze whether customer statements comply with their inner belief

Aug 11, 2021
Fabian Thaler, Stefan Faußer, Heiko Gewald

Customers' emotions play a vital role in the service industry. The better frontline personnel understand the customer, the better the service they can provide. As human emotions generate certain (unintentional) bodily reactions, such as increase in heart rate, sweating, dilation, blushing and paling, which are measurable, artificial intelligence (AI) technologies can interpret these signals. Great progress has been made in recent years to automatically detect basic emotions like joy, anger, etc. Complex emotions, consisting of multiple interdependent basic emotions, are more difficult to identify. One complex emotion which is of great interest to the service industry is difficult to detect: whether a customer is telling the truth or just a story. This research presents an AI method for capturing and sensing emotional data. With an accuracy of around 98%, the best trained model was able to detect whether a participant of a debating challenge was arguing for or against her/his conviction, using speech analysis. The data set was collected in an experimental setting with 40 participants. The findings are applicable to a wide range of service processes and specifically useful for all customer interactions that take place via telephone. The algorithm presented can be applied in any situation where it is helpful for the agent to know whether a customer is speaking to her/his conviction. This could, for example, lead to a reduction in doubtful insurance claims, or untruthful statements in job interviews. This would not only reduce operational losses for service companies, but also encourage customers to be more truthful.
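Classifying a speech sample from acoustic features can be sketched with a nearest-centroid classifier. This is a deliberately minimal stand-in, not the paper's 98%-accuracy model, and the two-dimensional "acoustic features" (imagine pitch mean and energy) are invented:

```python
def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def dist2(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest_centroid_predict(train, sample):
    """Predict the label whose class centroid is closest to the sample."""
    cents = {label: centroid(rows) for label, rows in train.items()}
    return min(cents, key=lambda lab: dist2(cents[lab], sample))

# Hypothetical training data: feature vectors per class.
train = {
    "for_conviction":     [[0.8, 0.9], [0.7, 0.8]],
    "against_conviction": [[0.2, 0.1], [0.3, 0.2]],
}
label = nearest_centroid_predict(train, [0.75, 0.85])
print(label)  # → for_conviction
```

Any real pipeline would first extract far richer acoustic features from the audio and use a stronger classifier; the sketch only shows the shape of the decision step.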

* 26 pages, 2 figures, 4 tables 

Put your money where your mouth is: Using AI voice analysis to detect whether spoken arguments reflect the speaker's true convictions

Jul 22, 2021
Fabian Thaler, Stefan Faußer, Heiko Gewald

Customers' emotions play a vital role in the service industry. The better frontline personnel understand the customer, the better the service they can provide. As human emotions generate certain (unintentional) bodily reactions, such as increase in heart rate, sweating, dilation, blushing and paling, which are measurable, artificial intelligence (AI) technologies can interpret these signals. Great progress has been made in recent years to automatically detect basic emotions like joy, anger, etc. Complex emotions, consisting of multiple interdependent basic emotions, are more difficult to identify. One complex emotion which is of great interest to the service industry is difficult to detect: whether a customer is telling the truth or just a story. This research presents an AI method for capturing and sensing emotional data. With an accuracy of around 98%, the best trained model was able to detect whether a participant of a debating challenge was arguing for or against her/his conviction, using speech analysis. The data set was collected in an experimental setting with 40 participants. The findings are applicable to a wide range of service processes and specifically useful for all customer interactions that take place via telephone. The algorithm presented can be applied in any situation where it is helpful for the agent to know whether a customer is speaking to her/his conviction. This could, for example, lead to a reduction in doubtful insurance claims, or untruthful statements in job interviews. This would not only reduce operational losses for service companies, but also encourage customers to be more truthful.

* 26 pages, 2 figures, 4 tables 

LookOut! Interactive Camera Gimbal Controller for Filming Long Takes

Dec 30, 2020
Mohamed Sayed, Robert Cinca, Enrico Costanza, Gabriel Brostow

The job of a camera operator is more challenging, and potentially dangerous, when filming long moving camera shots. Broadly, the operator must keep the actors in-frame while safely navigating around obstacles, and while fulfilling an artistic vision. We propose a unified hardware and software system that distributes some of the camera operator's burden, freeing them up to focus on safety and aesthetics during a take. Our real-time system provides a solo operator with end-to-end control, so they can balance on-set responsiveness to action vs planned storyboards and framing, while looking where they're going. By default, we film without a field monitor. Our LookOut system is built around a lightweight commodity camera gimbal mechanism, with heavy modifications to the controller, which would normally just provide active stabilization. Our control algorithm reacts to speech commands, video, and a pre-made script. Specifically, our automatic monitoring of the live video feed saves the operator from distractions. In pre-production, an artist uses our GUI to design a sequence of high-level camera "behaviors." Those can be specific, based on a storyboard, or looser objectives, such as "frame both actors." Then during filming, a machine-readable script, exported from the GUI, ties together with the sensor readings to drive the gimbal. To validate our algorithm, we compared tracking strategies, interfaces, and hardware protocols, and collected impressions from a) film-makers who used all aspects of our system, and b) film-makers who watched footage filmed using LookOut.
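The control loop that keeps an actor framed can be sketched as a proportional controller with a deadband. This is a toy stand-in for LookOut's controller (the real system fuses speech commands, video tracking, and a machine-readable script); the gains and thresholds are arbitrary assumptions:

```python
def gimbal_update(actor_x, frame_center=0.5, gain=0.6, deadband=0.02):
    """Return a pan-velocity command that nudges the actor's horizontal
    position (actor_x, in normalized frame coordinates 0..1) toward the
    desired frame center. The deadband suppresses jitter when the actor
    is already close enough."""
    error = actor_x - frame_center
    if abs(error) < deadband:
        return 0.0
    return gain * error

# Actor drifting right of center -> positive (rightward) pan command.
cmd = gimbal_update(0.9)
```

A "frame both actors" behavior from the GUI would simply change what `frame_center` targets (e.g. the midpoint of two tracked positions) while the same loop runs underneath.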

* V2: - Fixed typos. - Cleaner supplemental. - New plot in control section with same data from a supplemental video 

Deep multi-metric learning for text-independent speaker verification

Jul 17, 2020
Jiwei Xu, Xinggang Wang, Bin Feng, Wenyu Liu

Text-independent speaker verification is an important artificial intelligence problem that has a wide spectrum of applications, such as criminal investigation, payment certification, and interest-based customer services. The purpose of text-independent speaker verification is to determine whether two given uncontrolled utterances originate from the same speaker or not. Extracting speech features for each speaker using deep neural networks is a promising direction to explore and a straightforward solution is to train the discriminative feature extraction network by using a metric learning loss function. However, a single loss function often has certain limitations. Thus, we use deep multi-metric learning to address the problem and introduce three different losses for this problem, i.e., triplet loss, n-pair loss and angular loss. The three loss functions work in a cooperative way to train a feature extraction network equipped with Residual connections and squeeze-and-excitation attention. We conduct experiments on the large-scale VoxCeleb2 dataset, which contains over a million utterances from over 6,000 speakers, and the proposed deep neural network obtains an equal error rate of 3.48%, which is a very competitive result. Codes for both training and testing and pretrained models are available at https://github.com/GreatJiweix/DmmlTiSV, which is the first publicly available code repository for large-scale text-independent speaker verification with performance on par with the state-of-the-art systems.
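One of the three losses, the triplet loss, has a compact standard definition: pull same-speaker embeddings together and push different-speaker embeddings apart by at least a margin. The sketch below uses toy 2-D embeddings; real systems compute it over batches of learned high-dimensional embeddings:

```python
import math

def euclid(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: zero when the positive (same speaker) is
    closer to the anchor than the negative (different speaker) by at
    least `margin`, positive otherwise."""
    return max(0.0, euclid(anchor, positive) - euclid(anchor, negative) + margin)

anchor   = [1.0, 0.0]
positive = [0.9, 0.1]   # same speaker: nearby embedding
negative = [0.0, 1.0]   # different speaker: distant embedding
loss = triplet_loss(anchor, positive, negative)
```

The n-pair and angular losses the paper combines follow the same contrastive principle but compare against multiple negatives and constrain angles rather than distances.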

* Neurocomputing, Volume 410, 14 October 2020, Pages 394-400 
