Juan Jose Alvarado Leanos

Context, Attention and Audio Feature Explorations for Audio Visual Scene-Aware Dialog

Dec 20, 2018

Shachi H Kumar, Eda Okur, Saurav Sahay, Juan Jose Alvarado Leanos, Jonathan Huang, Lama Nachman

Abstract: With the recent advancements in AI, Intelligent Virtual Assistants (IVA) have become a ubiquitous part of every home. Going forward, we are witnessing a confluence of vision, speech and dialog system technologies that are enabling IVAs to learn audio-visual groundings of utterances and have conversations with users about the objects, activities and events surrounding them. As part of the 7th Dialog System Technology Challenges (DSTC7), for the Audio Visual Scene-Aware Dialog (AVSD) track, we explore "topics" of the dialog as an important contextual feature incorporated into the architecture, along with explorations around multimodal attention. We also incorporate an end-to-end audio classification ConvNet, AclNet, into our models. We present a detailed analysis of the experiments and show that some of our model variations outperform the baseline system presented for this task.

* 7 pages, 2 figures, DSTC7 workshop at AAAI 2019

AclNet: efficient end-to-end audio classification CNN

Nov 16, 2018

Jonathan J Huang, Juan Jose Alvarado Leanos

Abstract: We propose an efficient end-to-end convolutional neural network architecture, AclNet, for audio classification. When trained with our data augmentation and regularization, we achieved state-of-the-art performance on the ESC-50 corpus with 85.65% accuracy. Our network allows configurations such that memory and compute requirements are drastically reduced, and a tradeoff analysis of accuracy and complexity is presented. The analysis shows high accuracy at significantly reduced computational complexity compared to existing solutions. For example, a configuration with only 155k parameters and 49.3 million multiply-adds per second achieves 81.75% accuracy, exceeding human accuracy of 81.3%. This improved efficiency can enable always-on inference in energy-efficient platforms.
