Chunhua Shen

The University of Adelaide

Efficient Semidefinite Branch-and-Cut for MAP-MRF Inference

Sep 09, 2015

Peng Wang, Chunhua Shen, Anton van den Hengel, Philip Torr

Abstract:We propose a Branch-and-Cut (B&C) method for solving general MAP-MRF inference problems. The core of our method is a very efficient bounding procedure, which combines scalable semidefinite programming (SDP) and a cutting-plane method for seeking violated constraints. In order to further speed up the computation, several strategies have been exploited, including model reduction, warm start and removal of inactive constraints. We analyze the performance of the proposed method under different settings, and demonstrate that our method either outperforms or performs on par with state-of-the-art approaches. Especially when the connectivities are dense or when the relative magnitudes of the unary costs are low, we achieve the best reported results. Experiments show that the proposed algorithm achieves better approximation than the state-of-the-art methods within a variety of time budgets on challenging non-submodular MAP-MRF inference problems.

* 21 pages

Deeply Learning the Messages in Message Passing Inference

Sep 08, 2015

Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel

Abstract:Deep structured output learning shows great promise in tasks like semantic image segmentation. We proffer a new, efficient deep structured model learning scheme, in which we show how deep Convolutional Neural Networks (CNNs) can be used to estimate the messages in message passing inference for structured prediction with Conditional Random Fields (CRFs). With such CNN message estimators, we obviate the need to learn or evaluate potential functions for message calculation. This confers significant efficiency for learning, since otherwise when performing structured learning for a CRF with CNN potentials it is necessary to undertake expensive inference for every stochastic gradient iteration. The network output dimension for message estimation is the same as the number of classes, in contrast to the network output for general CNN potential functions in CRFs, which is exponential in the order of the potentials. Hence CNN message learning has fewer network parameters and is more scalable for cases in which a large number of classes are involved. We apply our method to semantic image segmentation on the PASCAL VOC 2012 dataset. We achieve an intersection-over-union score of 73.4 on its test set, which is the best reported result for methods using the VOC training images alone. This impressive performance demonstrates the effectiveness and usefulness of our CNN message learning method.

* 11 pages. Appearing in Proc. The Twenty-ninth Annual Conference on Neural Information Processing Systems (NIPS), 2015, Montreal, Canada
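
The scaling argument in the abstract above can be made concrete with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical and do not come from the paper. It compares the output size of a network that scores every joint labelling of a potential with the output size of a message estimator, using 21 classes (the 20 PASCAL VOC object classes plus background).

```python
def potential_output_dim(num_classes: int, order: int) -> int:
    """Outputs needed to score every joint labelling of a potential
    that couples `order` variables (hypothetical helper)."""
    return num_classes ** order


def message_output_dim(num_classes: int) -> int:
    """A message is a vector over the labels of one variable, so a
    message estimator needs one output per class (hypothetical helper)."""
    return num_classes


if __name__ == "__main__":
    k = 21  # PASCAL VOC: 20 object classes + background
    for order in (2, 3):
        print(order, potential_output_dim(k, order), message_output_dim(k))
```

For pairwise potentials this is 441 outputs against 21, and the gap widens by a factor of the class count with each increase in order.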

Online Metric-Weighted Linear Representations for Robust Visual Tracking

Jul 21, 2015

Xi Li, Chunhua Shen, Anthony Dick, Zhongfei Zhang, Yueting Zhuang

Abstract:In this paper, we propose a visual tracker based on a metric-weighted linear representation of appearance. In order to capture the interdependence of different feature dimensions, we develop two online distance metric learning methods using proximity comparison information and structured output learning. The learned metric is then incorporated into a linear representation of appearance. We show that online distance metric learning significantly improves the robustness of the tracker, especially on those sequences exhibiting drastic appearance changes. In order to bound growth in the number of training samples, we design a time-weighted reservoir sampling method. Moreover, we enable our tracker to automatically perform object identification during the process of object tracking, by introducing a collection of static template samples belonging to several object classes of interest. Object identification results for an entire video sequence are achieved by systematically combining the tracking information and visual recognition at each frame. Experimental results on challenging video sequences demonstrate the effectiveness of the method for both inter-frame tracking and object identification.

* 51 pages. Appearing in IEEE Transactions on Pattern Analysis and Machine Intelligence
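
The abstract above mentions a time-weighted reservoir sampling method for bounding the number of training samples but does not say how the weighting is defined. As a rough illustration only, the sketch below swaps in a standard technique, Efraimidis-Spirakis weighted reservoir sampling, with weights that grow exponentially with the time index so that recent samples are more likely to be kept. The function name, the `growth` parameter and the exponential weighting are all assumptions, not the paper's method.

```python
import heapq
import random


def time_weighted_reservoir(stream, capacity, growth=1.05, seed=0):
    """Keep at most `capacity` samples, favouring recent ones.

    Assumption: the sample at time t gets weight growth**t. Each sample
    draws a key u**(1/weight) with u uniform in [0, 1), and the samples
    with the largest keys are retained (Efraimidis-Spirakis scheme).
    """
    rng = random.Random(seed)
    heap = []  # min-heap of (key, time_index, sample)
    for t, sample in enumerate(stream):
        weight = growth ** t
        key = rng.random() ** (1.0 / weight)
        entry = (key, t, sample)
        if len(heap) < capacity:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            # New key beats the smallest retained key: swap it in.
            heapq.heapreplace(heap, entry)
    # Return the retained samples in temporal order.
    return [sample for _, _, sample in sorted(heap, key=lambda e: e[1])]


if __name__ == "__main__":
    kept = time_weighted_reservoir(range(200), capacity=10)
    print(kept)
```

The memory footprint stays fixed at `capacity` entries no matter how long the stream runs, which is the property the abstract asks of the sampler.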

Pedestrian Detection with Spatially Pooled Features and Structured Ensemble Learning

Jun 28, 2015

Sakrapee Paisitkriangkrai, Chunhua Shen, Anton van den Hengel

Abstract:Many typical applications of object detection operate within a prescribed false-positive range. In this situation the performance of a detector should be assessed on the basis of the area under the ROC curve over that range, rather than over the full curve, as the performance outside the range is irrelevant. This measure is labelled as the partial area under the ROC curve (pAUC). We propose a novel ensemble learning method which achieves a maximal detection rate at a user-defined range of false positive rates by directly optimizing the partial AUC using structured learning. In order to achieve a high object detection performance, we propose a new approach to extract low-level visual features based on spatial pooling. Incorporating spatial pooling improves the translational invariance and thus the robustness of the detection process. Experimental results on both synthetic and real-world data sets demonstrate the effectiveness of our approach, and we show that it is possible to train state-of-the-art pedestrian detectors using the proposed structured ensemble learning method with spatially pooled features. The result is the current best reported performance on the Caltech-USA pedestrian detection dataset.

* 19 pages
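
To make the partial AUC measure described above concrete, here is a minimal sketch of how it could be computed from detector scores. It is an evaluation helper only, not the structured learning method of the paper. The function name is hypothetical, and the sketch assumes that the false-positive bounds fall on whole numbers of negatives and that there are no tied scores. It uses the pairwise view of the AUC: restrict attention to the negatives whose rank (by descending score) falls inside the false-positive range, then count the positive-negative pairs that are ordered correctly.

```python
def partial_auc(pos_scores, neg_scores, fpr_low, fpr_high):
    """Normalised partial AUC over the false-positive range
    [fpr_low, fpr_high]; 1.0 means every positive outranks every
    negative in that range (hypothetical helper, ties ignored)."""
    n_neg = len(neg_scores)
    j_low = round(fpr_low * n_neg)
    j_high = round(fpr_high * n_neg)
    if not 0 <= j_low < j_high <= n_neg:
        raise ValueError("false-positive range selects no negatives")
    # Negatives sorted from highest to lowest score; the slice holds
    # exactly those that become false positives inside the range.
    selected = sorted(neg_scores, reverse=True)[j_low:j_high]
    correct = sum(1 for p in pos_scores for n in selected if p > n)
    return correct / (len(pos_scores) * len(selected))


if __name__ == "__main__":
    pos = [0.9, 0.8, 0.4]
    neg = [0.85, 0.5, 0.3, 0.1]
    print(partial_auc(pos, neg, 0.0, 0.5))  # low false-positive range only
    print(partial_auc(pos, neg, 0.0, 1.0))  # full curve, i.e. the usual AUC
```

On this toy data the score over the range [0, 0.5] is lower than the full AUC, because the highest-scoring negatives are the ones that matter at low false-positive rates.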

Extrinsic Methods for Coding and Dictionary Learning on Grassmann Manifolds

May 20, 2015

Mehrtash Harandi, Richard Hartley, Chunhua Shen, Brian Lovell, Conrad Sanderson

Abstract:Sparsity-based representations have recently led to notable results in various visual recognition tasks. In a separate line of research, Riemannian manifolds have been shown useful for dealing with features and models that do not lie in Euclidean spaces. With the aim of building a bridge between the two realms, we address the problem of sparse coding and dictionary learning over the space of linear subspaces, which form Riemannian structures known as Grassmann manifolds. To this end, we propose to embed Grassmann manifolds into the space of symmetric matrices by an isometric mapping. This in turn enables us to extend two sparse coding schemes to Grassmann manifolds. Furthermore, we propose closed-form solutions for learning a Grassmann dictionary, atom by atom. Lastly, to handle non-linearity in data, we extend the proposed Grassmann sparse coding and dictionary learning algorithms through embedding into Hilbert spaces. Experiments on several classification tasks (gender recognition, gesture classification, scene analysis, face recognition, action recognition and dynamic texture classification) show that the proposed approaches achieve considerable improvements in discrimination accuracy, in comparison to state-of-the-art methods such as kernelized Affine Hull Method and graph-embedding Grassmann discriminant analysis.

* Appearing in International Journal of Computer Vision
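
The abstract above does not spell out the embedding it uses. One commonly used way to map a subspace to a symmetric matrix is the projection mapping, which sends an orthonormal basis X to X X^T; the sketch below assumes that mapping purely for illustration, and the function names are hypothetical. It shows the property that makes such an embedding useful: the result depends only on the subspace, not on the basis chosen to represent it.

```python
import math


def projection_embedding(basis):
    """Map an n x p orthonormal basis (given as a list of n rows)
    to the symmetric n x n matrix X X^T."""
    n = len(basis)
    return [
        [sum(a * b for a, b in zip(basis[i], basis[j])) for j in range(n)]
        for i in range(n)
    ]


def projection_distance(basis_a, basis_b):
    """Frobenius distance between the two embedded subspaces."""
    pa = projection_embedding(basis_a)
    pb = projection_embedding(basis_b)
    return math.sqrt(
        sum((x - y) ** 2 for ra, rb in zip(pa, pb) for x, y in zip(ra, rb))
    )


if __name__ == "__main__":
    s = math.sqrt(0.5)
    x_axis = [[1.0], [0.0]]    # line spanned by (1, 0)
    flipped = [[-1.0], [0.0]]  # same line, opposite basis vector
    diagonal = [[s], [s]]      # line spanned by (1, 1)
    print(projection_distance(x_axis, flipped))
    print(projection_distance(x_axis, diagonal))
```

Once subspaces are represented as symmetric matrices, ordinary Euclidean tools such as linear combinations and least-squares fits can be applied to them.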

Unsupervised Feature Learning for Dense Correspondences across Scenes

Apr 23, 2015

Chao Zhang, Chunhua Shen, Tingzhi Shen

Abstract:We propose a fast, accurate matching method for estimating dense pixel correspondences across scenes. It is a challenging problem to estimate dense pixel correspondences between images depicting different scenes or instances of the same object category. While most such matching methods rely on hand-crafted features such as SIFT, we learn features from a large amount of unlabeled image patches using unsupervised learning. Pixel-layer features are obtained by encoding over the dictionary, followed by spatial pooling to obtain patch-layer features. The learned features are then seamlessly embedded into a multi-layer matching framework. We experimentally demonstrate that the learned features, together with our matching model, outperform state-of-the-art methods such as the SIFT flow, coherency sensitive hashing and the recent deformable spatial pyramid matching methods both in terms of accuracy and computational efficiency. Furthermore, we evaluate the performance of a few different dictionary learning and feature encoding methods in the proposed pixel correspondences estimation framework, and analyse the impact of dictionary learning and feature encoding with respect to the final matching performance.

* 17 pages

Learning discriminative trajectorylet detector sets for accurate skeleton-based action recognition

Apr 20, 2015

Ruizhi Qiao, Lingqiao Liu, Chunhua Shen, Anton van den Hengel

Abstract:The introduction of low-cost RGB-D sensors has promoted research in skeleton-based human action recognition. Devising a representation suitable for characterising actions on the basis of noisy skeleton sequences remains a challenge, however. We here provide two insights into this challenge. First, we show that the discriminative information of a skeleton sequence usually resides in a short temporal interval and we propose a simple-but-effective local descriptor called trajectorylet to capture the static and kinematic information within this interval. Second, we further propose to encode each trajectorylet with a discriminative trajectorylet detector set which is selected from a large number of candidate detectors trained through exemplar-SVMs. The action-level representation is obtained by pooling trajectorylet encodings. Evaluated on standard datasets acquired with the Kinect sensor, our method obtains superior results over existing approaches under various experimental setups.

* 10 pages

Supervised Discrete Hashing

Apr 19, 2015

Fumin Shen, Chunhua Shen, Wei Liu, Heng Tao Shen

Abstract:This paper has been withdrawn by the author.

* This paper has been withdrawn by the author since the algorithm is being used for a patent application

Temporal Pyramid Pooling Based Convolutional Neural Networks for Action Recognition

Apr 16, 2015

Peng Wang, Yuanzhouhan Cao, Chunhua Shen, Lingqiao Liu, Heng Tao Shen

Abstract:Encouraged by the success of Convolutional Neural Networks (CNNs) in image classification, much effort has recently been spent on applying CNNs to video-based action recognition problems. One challenge is that a video contains a varying number of frames, which is incompatible with the standard input format of CNNs. Existing methods handle this issue either by directly sampling a fixed number of frames or by bypassing it with a 3D convolutional layer which conducts convolution in the spatial-temporal domain. To solve this issue, here we propose a novel network structure which allows an arbitrary number of frames as the network input. The key to our solution is to introduce a module consisting of an encoding layer and a temporal pyramid pooling layer. The encoding layer maps the activation from previous layers to a feature vector suitable for pooling while the temporal pyramid pooling layer converts multiple frame-level activations into a fixed-length video-level representation. In addition, we adopt a feature concatenation layer which combines appearance information and motion information. Compared with the frame sampling strategy, our method avoids the risk of missing any important frames. Compared with the 3D convolutional method which requires a huge video dataset for network training, our model can be learned on a small target dataset because we can leverage the off-the-shelf image-level CNN for model parameter initialization. Experiments on two challenging datasets, Hollywood2 and HMDB51, demonstrate that our method achieves superior performance over state-of-the-art methods while requiring far less training data.
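
The idea of turning a variable number of frame-level activations into a fixed-length representation can be sketched in a few lines. The code below is a simplified illustration, not the layer from the paper: the function name is hypothetical, and the choice of max pooling and of the pyramid levels (1, 2, 4) are assumptions. Each level splits the frame sequence into that many temporal segments, pools each segment per feature dimension, and concatenates the results.

```python
def temporal_pyramid_pool(frames, levels=(1, 2, 4)):
    """Pool a list of per-frame feature vectors into one vector of
    length feature_dim * sum(levels), whatever the frame count.

    Assumptions: max pooling, and segment boundaries chosen so that
    every segment contains at least one frame.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    pooled = []
    for bins in levels:
        for k in range(bins):
            start = (k * num_frames) // bins
            end = -(-((k + 1) * num_frames) // bins)  # ceiling division
            segment = frames[start:end]
            # Max over time, separately for each feature dimension.
            pooled.extend(max(column) for column in zip(*segment))
    return pooled


if __name__ == "__main__":
    short_clip = [[1, 5], [3, 2]]
    long_clip = [[1, 5], [3, 2], [0, 7], [4, 4], [2, 9]]
    print(len(temporal_pyramid_pool(short_clip)))
    print(len(temporal_pyramid_pool(long_clip)))
```

Both clips map to vectors of the same length, which is what lets a fixed-size classifier sit on top of videos with different numbers of frames.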

Mid-level Deep Pattern Mining

Apr 09, 2015

Yao Li, Lingqiao Liu, Chunhua Shen, Anton van den Hengel

Abstract:Mid-level visual element discovery aims to find clusters of image patches that are both representative and discriminative. In this work, we study this problem from the perspective of pattern mining while relying on the recently popularized Convolutional Neural Networks (CNNs). Specifically, we find that for an image patch, activations extracted from the first fully-connected layer of CNNs have two appealing properties which enable their seamless integration with pattern mining. Patterns are then discovered from a large number of CNN activations of image patches through the well-known association rule mining. When we retrieve and visualize image patches with the same pattern, surprisingly, they are not only visually similar but also semantically consistent. We apply our approach to scene and object classification tasks, and demonstrate that our approach outperforms all previous works on mid-level visual element discovery by a sizeable margin with far fewer elements being used. Our approach also outperforms or matches recent works using CNN for these tasks. Source code of the complete system is available online.

* Published in Proc. IEEE Conf. Computer Vision and Pattern Recognition 2015
