Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jeonghee Kim

Hadamard Product for Low-rank Bilinear Pooling

Mar 26, 2017

Jin-Hwa Kim, Kyoung-Woon On, Woosang Lim, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang

Figure 1 for Hadamard Product for Low-rank Bilinear Pooling

Figure 2 for Hadamard Product for Low-rank Bilinear Pooling

Figure 3 for Hadamard Product for Low-rank Bilinear Pooling

Figure 4 for Hadamard Product for Low-rank Bilinear Pooling

Abstract:Bilinear models provide rich representations compared with linear models. They have been applied in various visual tasks, such as object recognition, segmentation, and visual question-answering, to get state-of-the-art performances taking advantage of the expanded representations. However, bilinear representations tend to be high-dimensional, limiting the applicability to computationally complex tasks. We propose low-rank bilinear pooling using Hadamard product for an efficient attention mechanism of multimodal learning. We show that our model outperforms compact bilinear pooling in visual question-answering tasks with the state-of-the-art results on the VQA dataset, having a better parsimonious property.

* 13 pages, 1 figure, & appendix. ICLR 2017 accepted

Via

Access Paper or Ask Questions

Dual Attention Networks for Multimodal Reasoning and Matching

Mar 21, 2017

Hyeonseob Nam, Jung-Woo Ha, Jeonghee Kim

Figure 1 for Dual Attention Networks for Multimodal Reasoning and Matching

Figure 2 for Dual Attention Networks for Multimodal Reasoning and Matching

Figure 3 for Dual Attention Networks for Multimodal Reasoning and Matching

Figure 4 for Dual Attention Networks for Multimodal Reasoning and Matching

Abstract:We propose Dual Attention Networks (DANs) which jointly leverage visual and textual attention mechanisms to capture fine-grained interplay between vision and language. DANs attend to specific regions in images and words in text through multiple steps and gather essential information from both modalities. Based on this framework, we introduce two types of DANs for multimodal reasoning and matching, respectively. The reasoning model allows visual and textual attentions to steer each other during collaborative inference, which is useful for tasks such as Visual Question Answering (VQA). In addition, the matching model exploits the two attention mechanisms to estimate the similarity between images and sentences by focusing on their shared semantics. Our extensive experiments validate the effectiveness of DANs in combining vision and language, achieving the state-of-the-art performance on public benchmarks for VQA and image-text matching.

Via

Access Paper or Ask Questions

Multimodal Residual Learning for Visual QA

Aug 31, 2016

Jin-Hwa Kim, Sang-Woo Lee, Dong-Hyun Kwak, Min-Oh Heo, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang

Figure 1 for Multimodal Residual Learning for Visual QA

Figure 2 for Multimodal Residual Learning for Visual QA

Figure 3 for Multimodal Residual Learning for Visual QA

Figure 4 for Multimodal Residual Learning for Visual QA

Abstract:Deep neural networks continue to advance the state-of-the-art of image recognition tasks with various methods. However, applications of these methods to multimodality remain limited. We present Multimodal Residual Networks (MRN) for the multimodal residual learning of visual question-answering, which extends the idea of the deep residual learning. Unlike the deep residual learning, MRN effectively learns the joint representation from vision and language information. The main idea is to use element-wise multiplication for the joint residual mappings exploiting the residual learning of the attentional models in recent studies. Various alternative models introduced by multimodality are explored based on our study. We achieve the state-of-the-art results on the Visual QA dataset for both Open-Ended and Multiple-Choice tasks. Moreover, we introduce a novel method to visualize the attention effect of the joint representations for each learning block using back-propagation algorithm, even though the visual features are collapsed without spatial information.

* 13 pages, 7 figures, accepted for NIPS 2016

Via

Access Paper or Ask Questions

Dual Memory Architectures for Fast Deep Learning of Stream Data via an Online-Incremental-Transfer Strategy

Jun 15, 2015

Sang-Woo Lee, Min-Oh Heo, Jiwon Kim, Jeonghee Kim, Byoung-Tak Zhang

Figure 1 for Dual Memory Architectures for Fast Deep Learning of Stream Data via an Online-Incremental-Transfer Strategy

Figure 2 for Dual Memory Architectures for Fast Deep Learning of Stream Data via an Online-Incremental-Transfer Strategy

Figure 3 for Dual Memory Architectures for Fast Deep Learning of Stream Data via an Online-Incremental-Transfer Strategy

Figure 4 for Dual Memory Architectures for Fast Deep Learning of Stream Data via an Online-Incremental-Transfer Strategy

Abstract:The online learning of deep neural networks is an interesting problem of machine learning because, for example, major IT companies want to manage the information of the massive data uploaded on the web daily, and this technology can contribute to the next generation of lifelong learning. We aim to train deep models from new data that consists of new classes, distributions, and tasks at minimal computational cost, which we call online deep learning. Unfortunately, deep neural network learning through classical online and incremental methods does not work well in both theory and practice. In this paper, we introduce dual memory architectures for online incremental deep learning. The proposed architecture consists of deep representation learners and fast learnable shallow kernel networks, both of which synergize to track the information of new data. During the training phase, we use various online, incremental ensemble, and transfer learning techniques in order to achieve lower error of the architecture. On the MNIST, CIFAR-10, and ImageNet image recognition tasks, the proposed dual memory architectures performs much better than the classical online and incremental ensemble algorithm, and their accuracies are similar to that of the batch learner.

Via

Access Paper or Ask Questions