
"Topic": models, code, and papers

Fast and Robust Dynamic Hand Gesture Recognition via Key Frames Extraction and Feature Fusion

Jan 15, 2019
Hao Tang, Hong Liu, Wei Xiao, Nicu Sebe

Gesture recognition is a hot topic in computer vision and pattern recognition and plays a vital role in natural human-computer interfaces. Although great progress has been made recently, fast and robust hand gesture recognition remains an open problem, since existing methods have not balanced performance and efficiency simultaneously. To bridge this gap, this work combines image entropy and density clustering to extract key frames from hand gesture videos for further feature extraction, which improves the efficiency of recognition. Moreover, a feature fusion strategy is proposed to further improve the feature representation, which elevates recognition performance. To validate our approach in a "wild" environment, we also introduce two new datasets, HandGesture and Action3D. Experiments consistently demonstrate that our strategy achieves competitive results on the Northwestern University, Cambridge, HandGesture, and Action3D hand gesture datasets. Our code and datasets will be publicly released.
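As an illustration of the key-frame idea, image entropy can be computed from a frame's intensity histogram and used to rank frames. The sketch below simply keeps the top-scoring frames in temporal order, a simplification that omits the paper's density-clustering step; the function names and the top-k selection are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def frame_entropy(gray_frame):
    """Shannon entropy of an 8-bit grayscale frame's intensity histogram."""
    hist, _ = np.histogram(gray_frame, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                       # drop empty bins (0 * log 0 := 0)
    return float(-(p * np.log2(p)).sum())

def select_key_frames(frames, num_keys=8):
    """Rank frames by entropy and keep the top scorers, in temporal order."""
    scores = [frame_entropy(f) for f in frames]
    top = sorted(np.argsort(scores)[-num_keys:])
    return [frames[i] for i in top]
```

A constant frame scores zero entropy, while texture-rich frames (typically those where the hand is clearly articulated) score higher, which is the intuition behind entropy-based key-frame selection.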

* 11 pages, 3 figures, accepted to Neurocomputing 


The Logoscope: a Semi-Automatic Tool for Detecting and Documenting French New Words

Oct 25, 2018
Ingrid Falk, Delphine Bernhard, Christophe Gérard

In this article we present the design and implementation of the Logoscope, the first tool developed specifically to detect new words of the French language, to document them, and to allow public access through a web interface. This semi-automatic tool collects new words daily by browsing the online versions of well-known French newspapers such as Le Monde, Le Figaro, L'Equipe, Libération, La Croix, and Les Échos. In contrast to other existing tools, which are essentially dedicated to dictionary development, the Logoscope attempts to give a more complete account of the context in which new words occur. In addition to the commonly given morpho-syntactic information, it also provides information about the textual and discursive contexts of the word creation; in particular, it automatically determines the (journalistic) topics of the text containing the new word. In this article we first give a general overview of the developed tool. We then describe the approach taken, discuss the linguistic background which guided our design decisions, and present the computational methods we used to implement it.
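A minimal sketch of the detection step: a token that appears in none of the reference resources is a new-word candidate. The lexicon and proper-noun sets here are hypothetical placeholders for whatever resources the Logoscope actually consults:

```python
def candidate_new_words(article_tokens, lexicon, known_names):
    """Flag tokens found in neither the reference lexicon nor a proper-noun list."""
    seen = {t.lower() for t in article_tokens}
    known = {w.lower() for w in lexicon} | {n.lower() for n in known_names}
    return sorted(seen - known)
```

The "semi-automatic" part then follows: candidates surfaced this way still require human validation before being documented.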

* Project report, 28 pages, 10 figures 


Explanation in Artificial Intelligence: Insights from the Social Sciences

Aug 15, 2018
Tim Miller

There has been a recent resurgence in the area of explainable artificial intelligence as researchers and practitioners seek to make their algorithms more understandable. Much of this research is focused on explicitly explaining decisions or actions to a human observer, and it should not be controversial to say that looking at how humans explain to each other can serve as a useful starting point for explanation in artificial intelligence. However, it is fair to say that most work in explainable artificial intelligence uses only the researchers' intuition of what constitutes a 'good' explanation. There exist vast and valuable bodies of research in philosophy, psychology, and cognitive science on how people define, generate, select, evaluate, and present explanations, which argue that people employ certain cognitive biases and social expectations in the explanation process. This paper argues that the field of explainable artificial intelligence should build on this existing research, and it reviews relevant papers from philosophy, cognitive psychology/science, and social psychology that study these topics. It draws out some important findings and discusses ways that these can be infused into work on explainable artificial intelligence.


Adaptive System Identification Using LMS Algorithm Integrated with Evolutionary Computation

Jul 17, 2018
Ibraheem Kasim Ibraheem

System identification is an exceptionally broad topic of remarkable significance in signal processing and communication. Our goal in this paper is to show how simple adaptive FIR and IIR filters can be used in system modeling, demonstrating the application of adaptive system identification. The main objective of our research is to study the LMS algorithm and its improvement by a genetic search approach, namely LMS-GA, which searches the multi-modal error surface of the IIR filter to avoid local minima and find the optimal weight vector when only measured or estimated data are available. Convergence analysis of the LMS algorithm in the case of a coloured (i.e., correlated) input signal is demonstrated on an adaptive FIR filter via the power spectral density of the input signals and the Fourier transform of the input signal's autocorrelation matrix. Simulations of adaptive FIR and IIR filtering have been carried out on white and coloured input signals to validate the power of the genetic-based LMS algorithm.
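The core LMS recursion the paper builds on can be sketched in a few lines, here for FIR system identification with a white-noise input. The step size mu and tap count are illustrative choices; the genetic LMS-GA extension is not shown:

```python
import numpy as np

def lms_identify(x, d, num_taps=4, mu=0.05):
    """Adapt FIR weights w so that w @ [x[n], x[n-1], ...] tracks d[n] (LMS)."""
    w = np.zeros(num_taps)
    for n in range(num_taps - 1, len(x)):
        x_n = x[n - num_taps + 1:n + 1][::-1]  # tap-delay line, newest first
        e = d[n] - w @ x_n                     # instantaneous error
        w += mu * e * x_n                      # stochastic-gradient update
    return w
```

With a white input and a noiseless desired signal, the weights converge to the unknown system's impulse response; the coloured-input case discussed in the abstract slows convergence because the input autocorrelation matrix has a large eigenvalue spread.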


Action Sets: Weakly Supervised Action Segmentation without Ordering Constraints

May 17, 2018
Alexander Richard, Hilde Kuehne, Juergen Gall

Action detection and temporal segmentation of actions in videos are topics of increasing interest. While fully supervised systems have gained much attention lately, full annotation of each action within the video is costly and impractical for large amounts of video data. Thus, weakly supervised action detection and temporal segmentation methods are of great importance. While most works in this area assume that an ordered sequence of occurring actions is given, our approach uses only a set of actions. Such action sets provide much less supervision, since neither the action ordering nor the number of action occurrences is known. In exchange, they can be easily obtained, for instance from meta-tags, while ordered sequences still require human annotation. We introduce a system that automatically learns to temporally segment and label actions in a video, where the only supervision used is action sets. An evaluation on three datasets shows that our method still achieves good results, although the amount of supervision is significantly smaller than for other related methods.

* CVPR 2018 


Detecting Syntactic Features of Translated Chinese

Apr 23, 2018
Hai Hu, Wen Li, Sandra Kübler

We present a machine learning approach to distinguish texts translated into Chinese (by humans) from texts originally written in Chinese, with a focus on a wide range of syntactic features. Using Support Vector Machines (SVMs) as the classifier on a genre-balanced corpus in translation studies of Chinese, we find that constituent parse trees and dependency triples as features, without lexical information, perform very well on the task, with an F-measure above 90%, close to the results of lexical n-gram features, and without the risk of learning topic information rather than translation features. Thus, we claim that syntactic features alone can accurately distinguish translated from original Chinese. Translated Chinese exhibits an increased use of determiners, subject-position pronouns, NP + 'de' as NP modifiers, and multiple NPs or VPs conjoined by a Chinese-specific punctuation mark, among other structures. We also interpret the syntactic features with reference to previous translation studies in Chinese, particularly the usage of pronouns.
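A hedged sketch of this classification setup: treat each delexicalized dependency triple (lexical items replaced by POS tags) as a token and feed the counts to a linear SVM. The toy triples and labels below are invented for illustration and are not from the paper's corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy "documents": space-separated dependency triples with words replaced
# by POS tags, so the classifier sees only syntactic structure.
docs = [
    "nsubj(VV,PN) dobj(VV,NN) det(NN,DT)",   # translated-style (hypothetical)
    "nsubj(VV,PN) det(NN,DT) dobj(VV,NN)",
    "nsubj(VV,NN) advmod(VV,AD)",            # original-style (hypothetical)
    "advmod(VV,AD) nsubj(VV,NN)",
]
labels = ["translated", "translated", "original", "original"]

# Each distinct triple becomes one count feature for the linear SVM.
clf = make_pipeline(CountVectorizer(token_pattern=r"\S+"), LinearSVC())
clf.fit(docs, labels)
```

Because no lexical items survive delexicalization, a classifier like this cannot shortcut through topic words, which is exactly the property the abstract highlights.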

* Accepted to 2nd Workshop on Stylistic Variation, NAACL 2018 


An Event Detection Approach Based On Twitter Hashtags

Apr 02, 2018
Shih-Feng Yang, Julia Taylor Rayz

Twitter is one of the most popular microblogging services in the world. The vast amount of information on Twitter makes it an important channel for people to learn and share news. The Twitter hashtag is a popular feature that can be viewed as human-labeled information which people use to identify the topic of a tweet. Many researchers have proposed event-detection approaches that monitor Twitter data and determine whether special events, such as accidents, extreme weather, earthquakes, or crimes, take place. Although many approaches use hashtags as one of their features, few of them explicitly focus on the effectiveness of hashtags for event detection. In this study, we propose an event detection approach that utilizes hashtags in tweets. We adopted the feature extraction used in STREAMCUBE and applied K-means clustering to the resulting features. The experiments demonstrate that the K-means approach produced better clustering results than STREAMCUBE. A discussion of optimal K values for the K-means approach is also provided.
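The K-means step itself is standard; a plain NumPy version of Lloyd's algorithm, as might be applied to hashtag-derived feature vectors (the STREAMCUBE feature extraction is not reproduced here):

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Distance from every point to every centroid, then assign the nearest.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids
```

Choosing K is the open question the abstract's discussion addresses; heuristics such as the elbow method or silhouette scores are typical ways to pick it.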

* The 18th International Conference on Computational Linguistics and Intelligent Text Processing, 2017 


Critical Percolation as a Framework to Analyze the Training of Deep Networks

Feb 06, 2018
Zohar Ringel, Rodrigo de Bem

In this paper we approach two relevant deep learning topics: i) tackling of graph structured input data and ii) a better understanding and analysis of deep networks and related learning algorithms. With this in mind we focus on the topological classification of reachability in a particular subset of planar graphs (Mazes). Doing so, we are able to model the topology of data while staying in Euclidean space, thus allowing its processing with standard CNN architectures. We suggest a suitable architecture for this problem and show that it can express a perfect solution to the classification task. The shape of the cost function around this solution is also derived and, remarkably, does not depend on the size of the maze in the large maze limit. Responsible for this behavior are rare events in the dataset which strongly regulate the shape of the cost function near this global minimum. We further identify an obstacle to learning in the form of poorly performing local minima in which the network chooses to ignore some of the inputs. We further support our claims with training experiments and numerical analysis of the cost function on networks with up to $128$ layers.
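The classification target, reachability in a maze, is well-defined independently of the network: a breadth-first search over the binary grid yields the ground-truth label a classifier must predict. This is a generic sketch; the paper's maze generation and CNN architecture are not shown:

```python
from collections import deque

def reachable(maze, start, goal):
    """BFS over a binary grid (1 = open, 0 = wall): the ground-truth
    reachability label for a maze presented to the network as an image."""
    rows, cols = len(maze), len(maze[0])
    seen, queue = {start}, deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and maze[nr][nc] and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False
```

Because the maze lives on a regular grid in Euclidean space, the same binary array doubles as the input image for a standard CNN, which is the representational point the abstract makes.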

* Accepted to ICLR 2018 as a conference paper 


Spatiotemporal Recurrent Convolutional Networks for Traffic Prediction in Transportation Networks

May 07, 2017
Haiyang Yu, Zhihai Wu, Shuqin Wang, Yunpeng Wang, Xiaolei Ma

Predicting large-scale transportation network traffic has become an important and challenging topic in recent decades. Inspired by the domain knowledge of motion prediction, in which the future motion of an object can be predicted based on previous scenes, we propose a network grid representation method that can retain the fine-scale structure of a transportation network. Network-wide traffic speeds are converted into a series of static images and input into a novel deep architecture, namely, spatiotemporal recurrent convolutional networks (SRCNs), for traffic forecasting. The proposed SRCNs inherit the advantages of deep convolutional neural networks (DCNNs) and long short-term memory (LSTM) neural networks. The spatial dependencies of network-wide traffic can be captured by DCNNs, and the temporal dynamics can be learned by LSTMs. An experiment on a Beijing transportation network with 278 links demonstrates that SRCNs outperform other deep learning-based algorithms in both short-term and long-term traffic prediction.
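The grid-representation step can be sketched as rasterizing link speeds onto a fixed 2-D grid, one frame per time step; the mapping from links to grid cells (`link_cells`) is an assumed input here, as the paper's exact rasterization is not reproduced:

```python
import numpy as np

def speeds_to_image(link_speeds, link_cells, grid_shape):
    """Paint each link's speed onto the grid cells it occupies (one time step)."""
    img = np.zeros(grid_shape)
    for speed, cells in zip(link_speeds, link_cells):
        for r, c in cells:
            img[r, c] = speed
    return img
```

Stacking such frames over consecutive time steps produces the image sequence that a DCNN (spatial) plus LSTM (temporal) architecture like SRCN would consume.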


End-to-end optimization of goal-driven and visually grounded dialogue systems

Mar 15, 2017
Florian Strub, Harm de Vries, Jeremie Mary, Bilal Piot, Aaron Courville, Olivier Pietquin

End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet most current approaches cast human-machine dialogue management as a supervised learning problem, aiming at predicting the next utterance of a participant given the full history of the dialogue. This vision is too simplistic to capture the planning problem intrinsic to dialogue, as well as its grounded nature, which makes the context of a dialogue larger than the history alone. This is why only chit-chat and question-answering tasks have been addressed so far using end-to-end architectures. In this paper, we introduce a Deep Reinforcement Learning method based on the policy gradient algorithm to optimize visually grounded, task-oriented dialogues. The approach is tested on a dataset of 120k dialogues collected through Mechanical Turk and provides encouraging results at solving both the problem of generating natural dialogues and the task of discovering a specific object in a complex picture.
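The policy gradient idea underlying such methods can be shown on a toy bandit: sample an action from a softmax policy and move the logits along the reward-weighted score function. This is a minimal REINFORCE sketch, far simpler than the paper's dialogue setting, and all names are illustrative:

```python
import numpy as np

def reinforce_bandit(arm_rewards, episodes=2000, lr=0.1, seed=0):
    """REINFORCE on a toy bandit: nudge logits along r * grad(log pi(a))."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(len(arm_rewards))
    for _ in range(episodes):
        p = np.exp(logits - logits.max())
        p /= p.sum()                      # softmax policy over arms
        a = rng.choice(len(p), p=p)       # sample an action from the policy
        r = arm_rewards[a]                # deterministic reward, for simplicity
        grad = -p.copy()
        grad[a] += 1.0                    # d log pi(a) / d logits for softmax
        logits += lr * r * grad
    p = np.exp(logits - logits.max())
    return p / p.sum()
```

In the paper's setting the "arms" become utterance-generation decisions and the reward arrives only at the end of the dialogue (task success), but the update rule is the same reward-weighted log-probability gradient.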
