"speech": models, code, and papers

Analyzing Verbal and Nonverbal Features for Predicting Group Performance

Jul 03, 2019
Uliyana Kubasova, Gabriel Murray, McKenzie Braley

This work analyzes the efficacy of verbal and nonverbal features of group conversation for the task of automatic prediction of group task performance. We describe a new publicly available survival task dataset that was collected and annotated to facilitate this prediction task. In these experiments, the new dataset is merged with an existing survival task dataset, allowing us to compare feature sets on a much larger amount of data than has been used in recent related work. This work is also distinct from related research on social signal processing (SSP) in that we compare verbal and nonverbal features, whereas SSP is almost exclusively concerned with nonverbal aspects of social interaction. A key finding is that nonverbal features from the speech signal are extremely effective for this task, even on their own. However, the most effective individual features are verbal features, and we highlight the most important ones.

* Accepted to INTERSPEECH 2019 (Graz, Austria) 

Gated recurrent units viewed through the lens of continuous time dynamical systems

Jun 03, 2019
Ian D. Jordan, Piotr Aleksander Sokol, Il Memming Park

Gated recurrent units (GRUs) are specialized memory elements for building recurrent neural networks. Despite their incredible success in natural language, speech, and video processing, little is understood about the specific dynamics representable in a GRU network, along with the constraints these dynamics impose when generalizing a specific task. As a result, it is difficult to know a priori how well a GRU network will perform on a given task. Using a continuous time analysis, we gain intuition on the inner workings of GRU networks. We restrict our presentation to low dimensions to allow for a comprehensive visualization. We found a surprisingly rich repertoire of dynamical features that includes stable limit cycles (nonlinear oscillations), multi-stable dynamics with various topologies, and homoclinic orbits. We contextualize the usefulness of the different kinds of dynamics and experimentally test their existence.
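
The continuous-time view mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common GRU convention h_t = z * h_{t-1} + (1 - z) * c, with candidate c = tanh(U_h (r * h) + b_h), and drops the external input; subtracting h_{t-1} from both sides gives the flow dh/dt = (1 - z) * (c - h). The function names (`gru_flow`, `integrate`), the parameter layout, and the forward-Euler integrator are illustrative assumptions, not the authors' code.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(M, v):
    # Plain-Python matrix-vector product.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def gru_flow(h, U_z, b_z, U_r, b_r, U_h, b_h):
    """Right-hand side dh/dt of an input-free, continuous-time GRU (assumed form)."""
    z = [sigmoid(a + b) for a, b in zip(matvec(U_z, h), b_z)]   # update gate
    r = [sigmoid(a + b) for a, b in zip(matvec(U_r, h), b_r)]   # reset gate
    gated = [r_i * h_i for r_i, h_i in zip(r, h)]
    c = [math.tanh(a + b) for a, b in zip(matvec(U_h, gated), b_h)]  # candidate state
    return [(1.0 - z_i) * (c_i - h_i) for z_i, c_i, h_i in zip(z, c, h)]


def integrate(h0, params, dt=0.01, steps=2000):
    """Forward-Euler integration of the flow, returning the final state."""
    h = list(h0)
    for _ in range(steps):
        dh = gru_flow(h, *params)
        h = [h_i + dt * dh_i for h_i, dh_i in zip(h, dh)]
    return h
```

With all weights and biases set to zero, the flow reduces to dh/dt = -0.5 * h, so every trajectory decays to the single fixed point at the origin; the richer behaviors listed in the abstract (limit cycles, multi-stability) require nonzero parameters.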

Accelerating Minibatch Stochastic Gradient Descent using Typicality Sampling

Mar 11, 2019
Xinyu Peng, Li Li, Fei-Yue Wang

Machine learning, especially deep neural networks, has developed rapidly in fields including computer vision, speech recognition and reinforcement learning. Although Minibatch SGD is one of the most popular stochastic optimization methods for training deep networks, it converges slowly because of the large noise in its gradient approximation. In this paper, we attempt to remedy this problem by building a more efficient batch selection method based on typicality sampling, which reduces the error of gradient estimation in conventional Minibatch SGD. We analyze the convergence rate of the resulting typical batch SGD algorithm and compare convergence properties between Minibatch SGD and the algorithm. Experimental results demonstrate that our batch selection scheme works well and that more complex Minibatch SGD variants can benefit from the proposed batch selection strategy.

* 10 pages, 4 figures, for journal 

Predicting tongue motion in unlabeled ultrasound videos using convolutional LSTM neural network

Feb 19, 2019
Chaojie Zhao, Peng Zhang, Jian Zhu, Chengrui Wu, Huaimin Wang, Kele Xu

A challenge in speech production research is to predict future tongue movements based on a short period of past tongue movements. This study tackles the speaker-dependent tongue motion prediction problem in unlabeled ultrasound videos with convolutional long short-term memory (ConvLSTM) networks. The model has been tested on two different ultrasound corpora. ConvLSTM outperforms a 3-dimensional convolutional neural network (3DCNN) in predicting the 9th frame based on the 8 preceding frames, and also demonstrates good capacity to predict only the tongue contours in future frames. Further tests reveal that ConvLSTM can also learn to predict tongue movements in more distant frames beyond the immediately following frames. Our codes are available at:

* Accepted by ICASSP 2019 

Progressive Label Distillation: Learning Input-Efficient Deep Neural Networks

Jan 26, 2019
Zhong Qiu Lin, Alexander Wong

Much of the focus in the area of knowledge distillation has been on distilling knowledge from a larger teacher network to a smaller student network. However, there has been little research on how the concept of distillation can be leveraged to distill the knowledge encapsulated in the training data itself into a reduced form. In this study, we explore the concept of progressive label distillation, where we leverage a series of teacher-student network pairs to progressively generate distilled training data for learning deep neural networks with greatly reduced input dimensions. To investigate the efficacy of the proposed progressive label distillation approach, we experimented with learning a deep limited vocabulary speech recognition network based on generated 500ms input utterances distilled progressively from 1000ms source training data, and demonstrated a significant increase in test accuracy of almost 78% compared to direct learning.

* 9 pages 

Towards Dialogue-based Navigation with Multivariate Adaptation driven by Intention and Politeness for Social Robots

Sep 19, 2018
Chandrakant Bothe, Fernando Garcia, Arturo Cruz Maya, Amit Kumar Pandey, Stefan Wermter

Service robots need to show appropriate social behavior in order to be deployed in social environments such as healthcare, education, and retail. Two of the main capabilities that robots should have are navigation and conversational skill. If a person is impatient, they might want a robot to navigate faster, and vice versa. Linguistic features that convey politeness can provide social cues about a person's patient or impatient behavior. The novelty presented in this paper is to dynamically incorporate politeness in robotic dialogue systems for navigation. Understanding the politeness in users' speech can be used to modulate the robot's behavior and responses. Therefore, we developed a dialogue system for navigating an indoor environment, which produces different robot behaviors and responses based on users' intention and degree of politeness. We deploy and test our system with the Pepper robot, which adapts to changes in the user's politeness.

* Submitted to ICSR 2018 

Review of Deep Learning

Aug 28, 2018
Rong Zhang, Weiping Li, Tong Mo

In recent years, countries such as China and the United States, and high-tech companies such as Google, have increased investment in artificial intelligence. Deep learning is one of the key areas of current artificial intelligence research. This paper analyzes and summarizes the latest progress and future research directions of deep learning. First, three basic models of deep learning are outlined: multilayer perceptrons, convolutional neural networks, and recurrent neural networks. On this basis, we further analyze emerging models of convolutional neural networks and recurrent neural networks. This paper then summarizes deep learning's applications in many areas of artificial intelligence, including speech processing, computer vision, and natural language processing. Finally, this paper discusses the existing problems of deep learning and gives corresponding possible solutions.

* In Chinese. Has been published in the journal "Information and Control"

Improving DNN Robustness to Adversarial Attacks using Jacobian Regularization

Aug 26, 2018
Daniel Jakubovitz, Raja Giryes

Deep neural networks have lately shown tremendous performance in various applications including vision and speech processing tasks. However, alongside their ability to perform these tasks with such high accuracy, it has been shown that they are highly susceptible to adversarial attacks: a small change in the input can cause the network to err with high confidence. This phenomenon exposes an inherent fault in these networks and their ability to generalize well. For this reason, providing robustness to adversarial attacks is an important challenge in network training, which has led to extensive research. In this work, we suggest a theoretically inspired novel approach to improve the networks' robustness. Our method applies regularization using the Frobenius norm of the Jacobian of the network, which is applied as post-processing, after regular training has finished. We demonstrate empirically that it leads to enhanced robustness results with a minimal change in the original network's accuracy.

* ECCV 2018 Conference Paper 
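
The penalty term named in the abstract above, the Frobenius norm of the network's input-output Jacobian, can be sketched on a toy model. The one-layer tanh network, the squared-error data term, the weight `lam`, and all function names below are hypothetical stand-ins chosen so the Jacobian has a closed form; the paper's actual architecture, loss, and post-training schedule are not reproduced here.

```python
import math


def forward(W, x):
    # Toy one-layer network y = tanh(W x); a stand-in, not the paper's model.
    return [math.tanh(sum(w_ij * x_j for w_ij, x_j in zip(row, x))) for row in W]


def jacobian(W, x):
    # dy_i/dx_j = (1 - y_i^2) * W_ij for the tanh layer above.
    y = forward(W, x)
    return [[(1.0 - y_i * y_i) * w_ij for w_ij in row] for y_i, row in zip(y, W)]


def frobenius_sq(J):
    # Squared Frobenius norm: the sum of all squared entries.
    return sum(v * v for row in J for v in row)


def regularized_loss(W, x, target, lam=0.1):
    # Squared-error data term plus the Jacobian penalty weighted by lam.
    y = forward(W, x)
    data_term = sum((y_i - t_i) ** 2 for y_i, t_i in zip(y, target))
    return data_term + lam * frobenius_sq(jacobian(W, x))
```

A small Jacobian norm means that a small input perturbation moves the output only slightly, which is the intuition behind using it as a robustness penalty. Setting `lam=0` recovers the unregularized loss.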

Automatic Normalization of Word Variations in Code-Mixed Social Media Text

Apr 03, 2018
Rajat Singh, Nurendra Choudhary, Manish Shrivastava

Social media platforms such as Twitter and Facebook are becoming popular in multilingual societies. This trend leads to the blending of South Asian languages with English. This blend of multiple languages, known as code-mixed data, has recently become popular in research communities for various NLP tasks. Code-mixed data contain anomalies such as grammatical errors and spelling variations. In this paper, we leverage the contextual property of words, whereby different spelling variations of a word share similar contexts in large, noisy social media text. We capture different variations of words belonging to the same context in an unsupervised manner using distributed representations of words. Our experiments reveal that preprocessing the code-mixed dataset with our approach improves the performance of state-of-the-art part-of-speech tagging (POS-tagging) and sentiment analysis systems.

* Accepted Long Paper at 19th International Conference on Computational Linguistics and Intelligent Text Processing, March 2018, Hanoi, Vietnam 
