Hao Shen

CenterMask: single shot instance segmentation with point representation

Apr 11, 2020
Yuqing Wang, Zhaoliang Xu, Hao Shen, Baoshan Cheng, Lirong Yang

In this paper, we propose a single-shot instance segmentation method that is simple, fast and accurate. There are two main challenges for one-stage instance segmentation: differentiating object instances and aligning features pixel-wise. Accordingly, we decompose instance segmentation into two parallel subtasks: Local Shape prediction, which separates instances even in overlapping conditions, and Global Saliency generation, which segments the whole image in a pixel-to-pixel manner. The outputs of the two branches are assembled to form the final instance masks. To realize this, the local shape information is taken from the representation of object center points. Trained entirely from scratch and without any bells and whistles, the proposed CenterMask achieves 34.5 mask AP at a speed of 12.3 fps, using a single model with single-scale training/testing on the challenging COCO dataset. Its accuracy is higher than that of all other one-stage instance segmentation methods except TensorMask, which is 5 times slower, showing the effectiveness of CenterMask. In addition, our method can be easily embedded into other one-stage object detectors such as FCOS and performs well, showing that CenterMask generalizes.

* To appear at CVPR 2020
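
The CenterMask abstract above says that the Local Shape and Global Saliency outputs are assembled into the final instance masks, but does not say how. The following is a minimal sketch of one plausible assembly step, assuming an elementwise product of sigmoid-activated maps inside the instance box; the function names, the nearest-neighbour resize, and the 0.5 threshold are all hypothetical choices made for illustration, not details taken from the abstract.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def resize_nearest(patch, out_h, out_w):
    """Nearest-neighbour resize of a 2-D array to (out_h, out_w)."""
    in_h, in_w = patch.shape
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return patch[rows[:, None], cols[None, :]]

def assemble_mask(local_shape_logits, saliency_logits, box, threshold=0.5):
    """Combine one instance's coarse shape with a crop of the global saliency map.

    box = (x0, y0, x1, y1) in pixels, with exclusive upper bounds.
    The product rule is an assumption made for this sketch.
    """
    x0, y0, x1, y1 = box
    crop = sigmoid(saliency_logits[y0:y1, x0:x1])
    shape = sigmoid(resize_nearest(local_shape_logits, y1 - y0, x1 - x0))
    mask = np.zeros(saliency_logits.shape, dtype=bool)
    # A pixel is kept only if both branches consider it foreground.
    mask[y0:y1, x0:x1] = (crop * shape) > threshold
    return mask
```

In this sketch the coarse shape decides which instance a pixel belongs to, while the saliency crop supplies the pixel-aligned foreground decision, which mirrors the division of labour described in the abstract.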

A High-Efficiency Framework for Constructing Large-Scale Face Parsing Benchmark

May 13, 2019
Yinglu Liu, Hailin Shi, Yue Si, Hao Shen, Xiaobo Wang, Tao Mei

Face parsing, which assigns a semantic label to each pixel in face images, has recently attracted increasing interest due to its large application potential. Although many face-related fields (e.g., face recognition and face detection) have been well studied for many years, the existing datasets for face parsing are still severely limited in scale and quality; for example, the widely used Helen dataset contains only 2,330 images. This is mainly because pixel-level annotation is costly and time-consuming, especially for facial parts without clear boundaries. The lack of accurately annotated datasets has become a major obstacle to progress in face parsing. Using dense facial landmarks to guide the parsing annotation is a feasible approach. However, annotating dense landmarks on human faces encounters the same issues as the parsing annotation. To overcome these problems, in this paper we develop a high-efficiency framework for face parsing annotation, which considerably simplifies and speeds up the parsing annotation through two consecutive modules. Benefiting from the proposed framework, we construct a new Dense Landmark Guided Face Parsing (LaPa) benchmark. It consists of 22,000 face images with large variations in expression, pose, occlusion, etc. Each image is provided with an accurate annotation of an 11-category pixel-level label map along with the coordinates of 106-point landmarks. To the best of our knowledge, it is currently the largest public dataset for face parsing. To make full use of our LaPa dataset with its abundant face shape and boundary priors, we propose a simple yet effective Boundary-Sensitive Parsing Network (BSPNet). Our network serves as a baseline model on the proposed LaPa dataset, and it also achieves state-of-the-art performance on the Helen dataset without resorting to extra face alignment.

Grand Challenge of 106-Point Facial Landmark Localization

May 09, 2019
Yinglu Liu, Hao Shen, Yue Si, Xiaobo Wang, Xiangyu Zhu, Hailin Shi, Zhibin Hong, Hanqi Guo, Ziyuan Guo, Yanqin Chen, Bi Li, Teng Xi, Jun Yu, Haonian Xie, Guochen Xie, Mengyan Li, Qing Lu, Zengfu Wang, Shenqi Lai, Zhenhua Chai, Xiaoming Wei

Facial landmark localization is a crucial step in numerous face-related applications, such as face recognition, facial pose estimation, and face image synthesis. However, previous competitions on facial landmark localization (i.e., the 300-W, 300-VW and Menpo challenges) aim to predict 68-point landmarks, which are insufficient to depict the structure of facial components. To overcome this problem, we construct a challenging dataset named JD-landmark. Each image is manually annotated with 106-point landmarks. The dataset covers large variations in pose and expression, which make accurate landmark prediction difficult. We hold a 106-point facial landmark localization competition on this dataset in conjunction with the IEEE International Conference on Multimedia and Expo (ICME) 2019. The purpose of this competition is to discover effective and robust facial landmark localization approaches.

* Accepted at ICME2019 Grand Challenge

A Generative Map for Image-based Camera Localization

Apr 16, 2019
Mingpan Guo, Stefan Matthes, Jiaojiao Ye, Hao Shen

In image-based camera localization systems, information about the environment is usually stored in some representation, which can be referred to as a map. Conventionally, most maps are built upon hand-crafted features. Recently, neural networks have attracted attention as a data-driven map representation and have shown promising results in visual localization. However, these neural network maps are generally hard for humans to interpret. A readable map is not only accessible to humans, but can also be verified when the ground truth pose is unavailable. To tackle this problem, we propose Generative Map, a new framework for learning human-readable neural network maps by combining a generative model with the Kalman filter, which also allows it to incorporate additional sensor information such as stereo visual odometry. For evaluation, we use real-world images from the 7-Scenes and Oxford RobotCar datasets. We demonstrate that our Generative Map can be queried with a pose of interest from the test sequence to predict an image that closely resembles the true scene. For localization, we show that Generative Map achieves performance comparable to that of current regression models. Moreover, our framework is trained completely from scratch, unlike regression models, which rely on large ImageNet-pretrained networks.

* typo fixes

Joint Learning of Discriminative Low-dimensional Image Representations Based on Dictionary Learning and Two-layer Orthogonal Projections

Mar 27, 2019
Xian Wei, Hao Shen, Yuanxiang Li, Xuan Tang, Bo Jin, Lijun Zhao, Yi Lu Murphey

There are some inadequacies in the language description of this paper that require further improvement. This paper is based on a revision of a conference paper. It is now necessary to further explain the difference between the contributions of the two papers.

* Some inappropriate descriptions have been found in this paper

A Differential Topological View of Challenges in Learning with Feedforward Neural Networks

Nov 26, 2018
Hao Shen

Among the many unsolved puzzles in theories of Deep Neural Networks (DNNs), three fundamental challenges stand out as most in need of solutions, namely expressibility, optimisability, and generalisability. Although there has been significant progress in seeking answers using various theories, e.g. information bottleneck theory, sparse representation, statistical inference, and Riemannian geometry, so far no single theory is able to provide solutions to all of these challenges. In this work, we propose to engage the theory of differential topology to address the three problems. By modelling the dataset of interest as a smooth manifold, DNNs can be considered as compositions of smooth maps between smooth manifolds. Specifically, our work offers a differential topological view of the loss landscape of DNNs, the interplay between width and depth in expressibility, and regularisations for generalisability. Finally, in the setting of deep representation learning, we further apply the quotient topology to investigate the architecture of DNNs, which makes it possible to capture nuisance factors in data with respect to a specific learning task.

* 17 pages, 3 figures

Trace Quotient with Sparsity Priors for Learning Low Dimensional Image Representations

Oct 08, 2018
Xian Wei, Hao Shen, Martin Kleinsteuber

This work studies the problem of learning appropriate low dimensional image representations. We propose a generic algorithmic framework that leverages two classic representation learning paradigms, i.e., sparse representation and the trace quotient criterion. The former is a well-known, powerful tool for identifying underlying self-explanatory factors of data, while the latter is known for disentangling underlying low dimensional discriminative factors in data. Our solutions disentangle sparse representations of images by employing the trace quotient criterion. We construct a unified cost function, coined the SPARse LOW dimensional representation (SparLow) function, for jointly learning both a sparsifying dictionary and a dimensionality reduction transformation. The SparLow function is widely applicable for developing various algorithms in three classic machine learning scenarios, namely unsupervised, supervised, and semi-supervised learning. In order to develop efficient joint learning algorithms for maximizing the SparLow function, we deploy a framework of sparse coding with appropriate convex priors to ensure that the sparse representations are locally differentiable. Moreover, we develop an efficient geometric conjugate gradient algorithm to maximize the SparLow function on its underlying Riemannian manifold. The performance of the proposed SparLow algorithmic framework is investigated on several image processing tasks, such as 3D data visualization, face/digit recognition, and object/scene categorization.

* 17 pages
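
The trace quotient criterion named in the abstract above can be illustrated with a short sketch. It evaluates tr(U^T A U) / tr(U^T B U) for a projection U, assuming an LDA-style choice of A and B as between-class and within-class scatter matrices computed on raw feature vectors. That choice, and the function names, are assumptions made for illustration; the abstract applies the criterion to sparse representations and does not specify these matrices.

```python
import numpy as np

def trace_quotient(U, A, B):
    """Return tr(U^T A U) / tr(U^T B U) for a projection matrix U (d x k)."""
    return np.trace(U.T @ A @ U) / np.trace(U.T @ B @ U)

def scatter_matrices(X, y):
    """Between-class and within-class scatter for samples stored in the rows of X."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for label in np.unique(y):
        Xc = X[y == label]
        mc = Xc.mean(axis=0)
        diff = (mc - mean)[:, None]
        S_b += len(Xc) * (diff @ diff.T)   # spread of class means
        S_w += (Xc - mc).T @ (Xc - mc)     # spread inside each class
    return S_b, S_w
```

A projection with a larger quotient separates the class means more strongly relative to the spread within each class, which is the sense in which the criterion is discriminative.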

Linearizing Visual Processes with Convolutional Variational Autoencoders

Mar 20, 2018
Alexander Sagel, Hao Shen

This work studies the problem of modeling non-linear visual processes by learning linear generative models from observed sequences. We propose a joint learning framework, combining a Linear Dynamic System and a Variational Autoencoder with convolutional layers. After discussing several conditions for linearizing neural networks, we propose an architecture that allows Variational Autoencoders to simultaneously learn the non-linear observation as well as the linear state-transition from a sequence of observed frames. The proposed framework is demonstrated experimentally in three series of synthesis experiments.

Towards a Mathematical Understanding of the Difficulty in Learning with Feedforward Neural Networks

Nov 21, 2017
Hao Shen

Training deep neural networks to solve machine learning problems is a great challenge in the field, mainly because the associated optimisation problem is highly non-convex. Recent developments have suggested that many training algorithms do not suffer from undesired local minima in certain scenarios, and have consequently led to great efforts in pursuing mathematical explanations for such observations. This work provides an alternative mathematical understanding of the challenge from a smooth optimisation perspective. Assuming exact learning of finite samples, a critical point analysis identifies sufficient conditions that ensure any local minimum is also a global minimum. Furthermore, a state-of-the-art algorithm, known as the Generalised Gauss-Newton (GGN) algorithm, is rigorously revisited as an approximate Newton's algorithm, which shares the property of being locally quadratically convergent to a global minimum under the condition of exact learning.

* 22 pages, 1 figure, submitted for publication
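
As a rough illustration of the Gauss-Newton idea mentioned in the abstract above, the sketch below runs classical Gauss-Newton on a toy two-parameter curve. For a squared-error loss the GGN matrix reduces to the classical Gauss-Newton matrix J^T J, so this is a special case and not the paper's algorithm for feedforward networks. The model a * exp(b * x), the data, and the function names are hypothetical, and "exact learning" is imitated by generating the targets from the model itself so that a zero-residual solution exists.

```python
import numpy as np

def model(w, x):
    a, b = w
    return a * np.exp(b * x)

def jacobian(w, x):
    """Jacobian of the model outputs with respect to (a, b), one row per sample."""
    a, b = w
    e = np.exp(b * x)
    return np.stack([e, a * x * e], axis=1)

def gauss_newton(w0, x, y, steps=10):
    """Plain Gauss-Newton iterations for the loss 0.5 * ||y - model(w, x)||^2."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        r = y - model(w, x)                          # residuals
        J = jacobian(w, x)
        w = w + np.linalg.solve(J.T @ J, J.T @ r)    # J^T J approximates the Hessian
    return w
```

Because the residual vanishes at the solution, the neglected second-order term of the Hessian vanishes there as well, which is the usual explanation for fast local convergence in the zero-residual case. No damping or line search is included, so the sketch should only be started near a solution.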

Reinforcement Learning in Conflicting Environments for Autonomous Vehicles

Oct 22, 2016
Dominik Meyer, Johannes Feldmaier, Hao Shen

In this work, we investigate the application of Reinforcement Learning to two well-known decision dilemmas, namely Newcomb's Problem and the Prisoner's Dilemma. These problems exemplify the dilemmas that autonomous agents face when interacting with humans. Furthermore, we argue that a Newcomb-like formulation is more adequate in the human-machine interaction case and demonstrate empirically that unmodified Reinforcement Learning algorithms end up with the well-known maximum expected utility solution.
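
To make the maximum expected utility solution mentioned above concrete, here is a small sketch that computes the expected payoff of each action in Newcomb's Problem. The payoffs (1,000,000 in the opaque box, 1,000 in the transparent box) are the conventional values from the thought experiment, and the predictor accuracy parameter is an assumption; neither is given in the abstract, and the function names are hypothetical.

```python
def newcomb_expected_utilities(accuracy, big=1_000_000, small=1_000):
    """Expected payoff of one-boxing and two-boxing for a given predictor accuracy.

    The opaque box holds `big` only if the predictor foresaw one-boxing;
    the transparent box always holds `small`.
    """
    one_box = accuracy * big                   # paid only when predicted correctly
    two_box = (1 - accuracy) * big + small     # big only when the predictor erred
    return one_box, two_box

def meu_action(accuracy, big=1_000_000, small=1_000):
    """Action with the larger expected utility under the assumptions above."""
    one_box, two_box = newcomb_expected_utilities(accuracy, big, small)
    return "one-box" if one_box > two_box else "two-box"
```

Under these assumed payoffs, one-boxing has the larger expected utility once the accuracy exceeds 0.5 + small / (2 * big), which is 0.5005 for the default values.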
