Changhu Wang

Towards Good Practices for Multi-modal Fusion in Large-scale Video Classification

Sep 28, 2018
Jinlai Liu, Zehuan Yuan, Changhu Wang

Leveraging both visual frames and audio has been shown experimentally to improve large-scale video classification. Previous research on video classification mainly focuses on the analysis of visual content among extracted video frames and their temporal feature aggregation. In contrast, multimodal data fusion is achieved by simple operators such as averaging and concatenation. Inspired by the success of bilinear pooling in vision-language fusion, we introduce multi-modal factorized bilinear pooling (MFB) to fuse visual and audio representations. We combine MFB with different video-level features and explore its effectiveness in video classification. Experimental results on the challenging YouTube-8M v2 dataset demonstrate that MFB significantly outperforms simple fusion methods in large-scale video classification.

* ECCV YouTube-8M workshop general paper
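
As a rough illustration of the fusion step described in this abstract, the sketch below implements factorized bilinear pooling in its commonly used form: each modality is projected, the projections are multiplied element-wise, and groups of k factors are sum-pooled. The function names, the output size, the factor count k, the random weights, and the power and L2 normalization steps are assumptions made for illustration and are not taken from the paper. The 1024-d visual and 128-d audio sizes match the publicly released YouTube-8M features.

```python
import numpy as np

def mfb_pool(x, y, U, V, k):
    """Factorized bilinear pooling of two feature vectors.

    x: (dx,), y: (dy,), U: (dx, k*o), V: (dy, k*o). Returns a vector of size o.
    """
    joint = (x @ U) * (y @ V)              # element-wise product of the projections
    return joint.reshape(-1, k).sum(axis=1)  # sum-pool each group of k factors

def normalize(z):
    """Signed square-root (power) normalization followed by L2 normalization."""
    z = np.sign(z) * np.sqrt(np.abs(z))
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
k, o = 5, 64
visual = rng.normal(size=1024)
audio = rng.normal(size=128)
U = rng.normal(size=(1024, k * o)) * 0.01
V = rng.normal(size=(128, k * o)) * 0.01

raw = mfb_pool(visual, audio, U, V, k)
fused = normalize(raw)
```

Each output element equals a full bilinear form whose weight matrix is constrained to rank k, which is why the factorized version needs far fewer parameters than an unconstrained bilinear layer.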

An Introduction to Image Synthesis with Generative Adversarial Nets

Mar 12, 2018
He Huang, Phillip S. Yu, Changhu Wang

There has been a drastic growth of research in Generative Adversarial Nets (GANs) in the past few years. Proposed in 2014, GAN has been applied to various applications such as computer vision and natural language processing, and achieves impressive performance. Among the many applications of GAN, image synthesis is the most well-studied one, and research in this area has already demonstrated the great potential of using GAN in image synthesis. In this paper, we provide a taxonomy of methods used in image synthesis, review different models for text-to-image synthesis and image-to-image translation, and discuss some evaluation metrics as well as possible future research directions in image synthesis with GAN.

MAT: A Multimodal Attentive Translator for Image Captioning

Aug 10, 2017
Chang Liu, Fuchun Sun, Changhu Wang, Feng Wang, Alan Yuille

In this work we formulate the problem of image captioning as a multimodal translation task. Analogous to machine translation, we present a sequence-to-sequence recurrent neural network (RNN) model for image caption generation. Different from most existing work, where the whole image is represented by a convolutional neural network (CNN) feature, we propose to represent the input image as a sequence of detected objects, which serves as the source sequence of the RNN model. In this way, the sequential representation of an image can be naturally translated to a sequence of words, the target sequence of the RNN model. To represent the image in a sequential way, we extract the features of the objects in the image and arrange them in an order using convolutional neural networks. To further leverage the visual information from the encoded objects, a sequential attention layer is introduced to selectively attend to the objects that are relevant to generating the corresponding words in the sentences. Extensive experiments are conducted to validate the proposed approach on a popular benchmark dataset, MS COCO, and the proposed model surpasses the state-of-the-art methods in all metrics following the dataset splits of previous work. The proposed approach is also evaluated by the evaluation server of the MS COCO captioning challenge, and achieves very competitive results, e.g., a CIDEr of 1.029 (c5) and 1.064 (c40).
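
To make the idea of attending over a sequence of object features concrete, here is a minimal sketch of generic soft attention: a decoder state scores each object, the scores are turned into weights with a softmax, and the weighted sum of object features becomes the context for the next word. The bilinear scoring function, the names, and the dimensions are assumptions for illustration; the abstract does not specify the form of the sequential attention layer.

```python
import numpy as np

def attend(hidden, objects, W):
    """Soft attention over object features.

    hidden: (m,) decoder state, objects: (n, d) object features, W: (d, m).
    Returns attention weights of shape (n,) and a context vector of shape (d,).
    """
    scores = objects @ (W @ hidden)      # one relevance score per object
    scores = scores - scores.max()       # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ objects          # weighted average of object features
    return weights, context

# Hypothetical sizes: 6 detected objects, 32-d features, 16-d decoder state.
rng = np.random.default_rng(1)
objects = rng.normal(size=(6, 32))
hidden = rng.normal(size=16)
W = rng.normal(size=(32, 16)) * 0.1

weights, context = attend(hidden, objects, W)
```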

Modularized Morphing of Neural Networks

Jan 12, 2017
Tao Wei, Changhu Wang, Chang Wen Chen

In this work we study the problem of network morphism, an effective learning scheme to morph a well-trained neural network into a new one with the network function completely preserved. Different from existing work, where basic morphing types at the layer level were addressed, we target the central problem of network morphism at a higher level, i.e., how a convolutional layer can be morphed into an arbitrary module of a neural network. To simplify the representation of a network, we abstract a module as a graph with blobs as vertices and convolutional layers as edges, based on which the morphing process can be formulated as a graph transformation problem. Two atomic morphing operations are introduced to compose the graphs, based on which modules are classified into two families, i.e., simple morphable modules and complex modules. We present practical morphing solutions for both families, and prove that any reasonable module can be morphed from a single convolutional layer. Extensive experiments have been conducted based on the state-of-the-art ResNet on benchmark datasets, and the effectiveness of the proposed solution has been verified.

* 12 pages, 6 figures, Under review as a conference paper at ICLR 2017

Surveillance Video Parsing with Single Frame Supervision

Nov 29, 2016
Si Liu, Changhu Wang, Ruihe Qian, Han Yu, Renda Bao

Surveillance video parsing, which segments the video frames into several labels, e.g., face, pants, left-leg, has wide applications. However, annotating all frames pixel by pixel is tedious and inefficient. In this paper, we develop a Single frame Video Parsing (SVP) method which requires only one labeled frame per video in the training stage. To parse one particular frame, the video segment preceding the frame is jointly considered. SVP (1) roughly parses the frames within the video segment, (2) estimates the optical flow between frames and (3) fuses the rough parsing results warped by optical flow to produce the refined parsing result. The three components of SVP, namely frame parsing, optical flow estimation and temporal fusion, are integrated in an end-to-end manner. Experimental results on two surveillance video datasets show the superiority of SVP over state-of-the-art methods.
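
The warp-then-fuse step (3) can be sketched on a toy example. The code below warps a previous frame's per-pixel class probabilities into the target frame using a given flow field and averages them with the target frame's own rough parse. Nearest-neighbor warping, a fixed average in place of the learned fusion, the backward-flow convention, and all names and sizes are simplifying assumptions; the paper trains parsing, flow, and fusion jointly.

```python
import numpy as np

def warp(prob, flow):
    """Warp a (H, W, C) probability map into the target frame.

    flow[y, x] = (dy, dx) points from a target pixel to its source location
    in the earlier frame (backward flow). Nearest-neighbor sampling.
    """
    h, w, _ = prob.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, w - 1)
    return prob[src_y, src_x]

def fuse(target_prob, warped_probs):
    """Average the target's rough parse with the warped parses, then label."""
    stacked = np.stack([target_prob] + list(warped_probs))
    return stacked.mean(axis=0).argmax(axis=-1)

# Toy scene: a two-class region boundary moves one pixel to the right.
h, w = 4, 5
prev_labels = np.zeros((h, w), dtype=int)
prev_labels[:, :2] = 1
true_labels = np.zeros((h, w), dtype=int)
true_labels[:, :3] = 1

prev_prob = np.eye(2)[prev_labels]                    # confident previous parse
target_prob = 0.6 * np.eye(2)[true_labels] + 0.4 * np.eye(2)[1 - true_labels]
target_prob[0, 0] = [0.55, 0.45]                      # one mistaken pixel in the rough parse

flow = np.zeros((h, w, 2))
flow[..., 1] = -1.0                                   # each target pixel came from one column to the left

fused = fuse(target_prob, [warp(prev_prob, flow)])
```

In this toy case the rough parse mislabels pixel (0, 0), and the warped evidence from the previous frame corrects it after fusion.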

Network Morphism

Mar 08, 2016
Tao Wei, Changhu Wang, Yong Rui, Chang Wen Chen

We present in this paper a systematic study on how to morph a well-trained neural network into a new one so that its network function is completely preserved. We define this as network morphism in this research. After morphing a parent network, the child network is expected to inherit the knowledge from its parent network and also to have the potential to continue growing into a more powerful one with much shortened training time. The first requirement for this network morphism is its ability to handle diverse morphing types of networks, including changes of depth, width, kernel size, and even subnet. To meet this requirement, we first introduce the network morphism equations, and then develop novel morphing algorithms for all these morphing types for both classic and convolutional neural networks. The second requirement for this network morphism is its ability to deal with non-linearity in a network. We propose a family of parametric-activation functions to facilitate the morphing of any continuous non-linear activation neurons. Experimental results on benchmark datasets and typical neural networks demonstrate the effectiveness of the proposed network morphism scheme.

* Under review for ICML 2016
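
The function-preservation property can be checked numerically on a tiny fully connected network. The sketch below widens a hidden layer by adding units whose outgoing weights are zero, and deepens the network by inserting an identity layer after a ReLU. These are the simplest special cases that preserve the function; they are not the morphing algorithms or parametric activations proposed in the paper, and all names and sizes are hypothetical.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def parent(x, W1, W2):
    """Two-layer network: input -> ReLU hidden layer -> output."""
    return relu(x @ W1) @ W2

def widen(W1, W2, extra, rng):
    """Add `extra` hidden units with random incoming and zero outgoing weights."""
    new_in = rng.normal(size=(W1.shape[0], extra))
    new_out = np.zeros((extra, W2.shape[1]))
    return np.hstack([W1, new_in]), np.vstack([W2, new_out])

def child(x, W1, W_mid, W2):
    """Morphed network with one additional hidden layer."""
    return relu(relu(x @ W1) @ W_mid) @ W2

rng = np.random.default_rng(2)
x = rng.normal(size=(8, 10))        # batch of 8 inputs
W1 = rng.normal(size=(10, 16))
W2 = rng.normal(size=(16, 3))

W1_wide, W2_wide = widen(W1, W2, extra=4, rng=rng)
W_mid = np.eye(W1_wide.shape[1])    # identity layer: ReLU output is already non-negative

same = np.allclose(parent(x, W1, W2), child(x, W1_wide, W_mid, W2_wide))
```

The child has more width and depth than the parent but produces the same outputs, so training can continue from the parent's accuracy instead of from scratch.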
