Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bowen Cheng

Tony

TS2C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection

Jul 13, 2018

Yunchao Wei, Zhiqiang Shen, Bowen Cheng, Honghui Shi, Jinjun Xiong, Jiashi Feng, Thomas Huang

Figure 1 for TS2C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection

Figure 2 for TS2C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection

Figure 3 for TS2C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection

Figure 4 for TS2C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection

Abstract:This work provides a simple approach to discover tight object bounding boxes with only image-level supervision, called Tight box mining with Surrounding Segmentation Context (TS2C). We observe that object candidates mined through current multiple instance learning methods are usually trapped to discriminative object parts, rather than the entire object. TS2C leverages surrounding segmentation context derived from weakly-supervised segmentation to suppress such low-quality distracting candidates and boost the high-quality ones. Specifically, TS2C is developed based on two key properties of desirable bounding boxes: 1) high purity, meaning most pixels in the box are with high object response, and 2) high completeness, meaning the box covers high object response pixels comprehensively. With such novel and computable criteria, more tight candidates can be discovered for learning a better object detector. With TS2C, we obtain 48.0% and 44.4% mAP scores on VOC 2007 and 2012 benchmarks, which are the new state-of-the-arts.

* ECCV2018

Via

Access Paper or Ask Questions

Enhance Visual Recognition under Adverse Conditions via Deep Networks

Dec 20, 2017

Ding Liu, Bowen Cheng, Zhangyang Wang, Haichao Zhang, Thomas S. Huang

Figure 1 for Enhance Visual Recognition under Adverse Conditions via Deep Networks

Figure 2 for Enhance Visual Recognition under Adverse Conditions via Deep Networks

Figure 3 for Enhance Visual Recognition under Adverse Conditions via Deep Networks

Figure 4 for Enhance Visual Recognition under Adverse Conditions via Deep Networks

Abstract:Visual recognition under adverse conditions is a very important and challenging problem of high practical value, due to the ubiquitous existence of quality distortions during image acquisition, transmission, or storage. While deep neural networks have been extensively exploited in the techniques of low-quality image restoration and high-quality image recognition tasks respectively, few studies have been done on the important problem of recognition from very low-quality images. This paper proposes a deep learning based framework for improving the performance of image and video recognition models under adverse conditions, using robust adverse pre-training or its aggressive variant. The robust adverse pre-training algorithms leverage the power of pre-training and generalizes conventional unsupervised pre-training and data augmentation methods. We further develop a transfer learning approach to cope with real-world datasets of unknown adverse conditions. The proposed framework is comprehensively evaluated on a number of image and video recognition benchmarks, and obtains significant performance improvements under various single or mixed adverse conditions. Our visualization and analysis further add to the explainability of results.

Via

Access Paper or Ask Questions

Robust Emotion Recognition from Low Quality and Low Bit Rate Video: A Deep Learning Approach

Sep 10, 2017

Bowen Cheng, Zhangyang Wang, Zhaobin Zhang, Zhu Li, Ding Liu, Jianchao Yang, Shuai Huang, Thomas S. Huang

Figure 1 for Robust Emotion Recognition from Low Quality and Low Bit Rate Video: A Deep Learning Approach

Figure 2 for Robust Emotion Recognition from Low Quality and Low Bit Rate Video: A Deep Learning Approach

Figure 3 for Robust Emotion Recognition from Low Quality and Low Bit Rate Video: A Deep Learning Approach

Figure 4 for Robust Emotion Recognition from Low Quality and Low Bit Rate Video: A Deep Learning Approach

Abstract:Emotion recognition from facial expressions is tremendously useful, especially when coupled with smart devices and wireless multimedia applications. However, the inadequate network bandwidth often limits the spatial resolution of the transmitted video, which will heavily degrade the recognition reliability. We develop a novel framework to achieve robust emotion recognition from low bit rate video. While video frames are downsampled at the encoder side, the decoder is embedded with a deep network model for joint super-resolution (SR) and recognition. Notably, we propose a novel max-mix training strategy, leading to a single "One-for-All" model that is remarkably robust to a vast range of downsampling factors. That makes our framework well adapted for the varied bandwidths in real transmission scenarios, without hampering scalability or efficiency. The proposed framework is evaluated on the AVEC 2016 benchmark, and demonstrates significantly improved stand-alone recognition performance, as well as rate-distortion (R-D) performance, than either directly recognizing from LR frames, or separating SR and recognition.

* Accepted by the Seventh International Conference on Affective Computing and Intelligent Interaction (ACII2017)

Via

Access Paper or Ask Questions