Yao Wang

Masked-RPCA: Sparse and Low-rank Decomposition Under Overlaying Model and Application to Moving Object Detection

Sep 17, 2019
Amirhossein Khalilian-Gourtani, Shervin Minaee, Yao Wang

Foreground detection in a video sequence is a pivotal step in many computer vision applications, such as video surveillance systems. Robust Principal Component Analysis (RPCA) performs low-rank and sparse decomposition and accomplishes this task when the background is stationary and the foreground is dynamic and relatively small. A fundamental issue with RPCA is its assumption that the low-rank and sparse components are added at each element, whereas in reality the moving foreground is overlaid on the background. We propose a representation via masked decomposition (i.e., an overlaying model) in which each element belongs either to the low-rank or to the sparse component, as decided by a mask. We propose the Masked-RPCA algorithm to recover the mask and the low-rank component simultaneously, utilizing linearizing and alternating direction techniques. We further extend our formulation to be robust to dynamic changes in the background and to enforce spatial connectivity in the foreground component. Our study shows significant improvement of the detected mask compared to post-processing on the sparse component obtained by other frameworks.
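
The difference between the additive model and the overlaying model described above can be illustrated with a small NumPy sketch. This is only a toy illustration of the two data models, not the Masked-RPCA solver; the matrix sizes, pixel values, and variable names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy rank-1 "background": 6 pixels x 4 frames, every frame is the same image.
background = np.outer(rng.uniform(0.2, 0.8, size=6), np.ones(4))

# Hypothetical foreground intensity and a binary mask marking where it sits.
foreground = np.full_like(background, 0.95)
mask = np.zeros(background.shape, dtype=bool)
mask[1:3, 2] = True  # a small object that appears only in frame 2

# Additive model assumed by classical RPCA: D = L + S.
additive = background + np.where(mask, foreground, 0.0)

# Overlaying model: each entry comes from exactly one of the two components.
overlaid = np.where(mask, foreground, background)

# Under the overlay, masked pixels show the foreground value itself;
# under the additive model they show background plus foreground.
print(overlaid[mask])
print(additive[mask])
```

In the overlaid data the masked entries equal the foreground intensity exactly, which is the situation the mask-based formulation is meant to capture.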

PrTransH: Embedding Probabilistic Medical Knowledge from Real World EMR Data

Sep 02, 2019
Linfeng Li, Peng Wang, Yao Wang, Jinpeng Jiang, Buzhou Tang, Jun Yan, Shenghui Wang, Yuting Liu

This paper proposes an algorithm named PrTransH to learn embedding vectors from medical knowledge derived from real-world EMR data. The unique challenge in embedding a medical knowledge graph built from real-world EMR data is that the uncertainty of knowledge triplets blurs the border between a "correct triplet" and a "wrong triplet", changing the fundamental assumption of many existing algorithms. To address this challenge, several enhancements are made to the existing TransH algorithm: 1) incorporating the probability of each medical knowledge triplet into the training objective; 2) replacing the margin-based ranking loss with a unified loss calculation that considers both valid and corrupted triplets; 3) augmenting the training data set with medical background knowledge. Verification on a medical knowledge graph built from real-world EMR data shows that PrTransH outperforms TransH on the link prediction task. To the best of our knowledge, this paper is the first to learn and verify knowledge embeddings on probabilistic knowledge graphs.
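
For orientation, the baseline that PrTransH builds on can be sketched as follows. The code shows only the standard TransH score (projection of head and tail onto a relation-specific hyperplane) and the margin-based ranking loss that the abstract says is replaced; the probabilistic loss of PrTransH itself is not specified in the abstract and is not reproduced here. The vectors in the usage lines are made-up toy values.

```python
import numpy as np

def transh_score(h, t, w_r, d_r):
    """TransH dissimilarity of a triplet; lower means more plausible."""
    w = w_r / np.linalg.norm(w_r)      # unit normal of the relation hyperplane
    h_perp = h - np.dot(w, h) * w      # head projected onto the hyperplane
    t_perp = t - np.dot(w, t) * w      # tail projected onto the hyperplane
    return float(np.sum((h_perp + d_r - t_perp) ** 2))

def margin_ranking_loss(score_valid, score_corrupt, margin=1.0):
    """Standard margin-based ranking loss used by TransH (the part PrTransH replaces)."""
    return max(0.0, score_valid + margin - score_corrupt)

# Toy 3-D example with hypothetical embeddings.
w_r = np.array([0.0, 0.0, 1.0])        # hyperplane normal
d_r = np.array([0.0, 1.0, 0.0])        # translation within the hyperplane
head = np.array([1.0, 0.0, 5.0])
tail = np.array([1.0, 1.0, -3.0])
corrupt_tail = np.array([3.0, 1.0, 0.0])

valid = transh_score(head, tail, w_r, d_r)
corrupt = transh_score(head, corrupt_tail, w_r, d_r)
print(valid, corrupt, margin_ranking_loss(valid, corrupt))
```

The margin loss treats every observed triplet as fully correct, which is the assumption the abstract identifies as problematic when triplets carry probabilities.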

Identification of relevant diffusion MRI metrics impacting cognitive functions using a novel feature selection method

Aug 10, 2019
Tongda Xu, Xiyan Cai, Yao Wang, Xiuyuan Wang, Sohae Chung, Els Fieremans, Joseph Rath, Steven Flanagan, Yvonne W Lui

Mild Traumatic Brain Injury (mTBI) is a significant public health problem. The most troubling symptoms after mTBI are cognitive complaints. Studies using diffusion MRI show measurable differences in tissue microstructure between patients with mTBI and healthy controls. However, it remains unclear which diffusion measures are the most informative with regard to cognitive functions, both in the healthy state and after injury. In this study, we use diffusion MRI to formulate a predictive model for working memory performance based on the most relevant MRI features. The key challenge is to identify relevant features over a large feature space accurately and efficiently. To tackle this challenge, we propose a novel improvement of the best-first search approach with crossover operators inspired by genetic algorithms. Compared against other heuristic feature selection algorithms, the proposed method achieves significantly more accurate predictions and yields clinically interpretable selected features.

Deep Plug-and-play Prior for Low-rank Tensor Completion

May 11, 2019
Wen-Hao Xu, Xi-Le Zhao, Tai-Xiang Jiang, Yao Wang, Michael Ng

Tensor image data sets such as color images and multispectral images are highly correlated and contain a lot of image detail. The main aim of this paper is to propose and develop a regularized tensor completion model for tensor image data. In the objective function, we adopt the recently introduced tensor nuclear norm (TNN) to characterize the global structure of such tensor image data sets. We also formulate an implicit regularizer to plug in a convolutional neural network (CNN) denoiser, which is believed to express the image prior learned from a large number of natural images. The resulting model can be solved efficiently via an alternating direction method of multipliers (ADMM) algorithm. Experimental results on color images, videos, and multispectral images show that both the global image structure and the details can be recovered very well, and that the proposed method performs better than the compared methods in terms of PSNR and SSIM.
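
The low-rank part of such an ADMM scheme typically relies on a proximal step for the tensor nuclear norm. A minimal sketch of one common form of that step (singular value thresholding of each frontal slice in the Fourier domain along the third mode) is given below. The function name and the threshold scaling are assumptions, since normalization conventions for TNN differ between papers, and the plugged-in CNN denoiser step is not shown.

```python
import numpy as np

def tensor_svt(X, tau):
    """Threshold the singular values of each Fourier-domain frontal slice by tau."""
    Xf = np.fft.fft(X, axis=2)                 # transform along the third mode
    Yf = np.empty_like(Xf)
    for k in range(X.shape[2]):
        U, s, Vh = np.linalg.svd(Xf[:, :, k], full_matrices=False)
        s_shrunk = np.maximum(s - tau, 0.0)    # soft-threshold the singular values
        Yf[:, :, k] = (U * s_shrunk) @ Vh      # rebuild the slice
    # For real input the result is real up to rounding error.
    return np.real(np.fft.ifft(Yf, axis=2))

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 3))
unchanged = tensor_svt(X, 0.0)      # no shrinkage
vanished = tensor_svt(X, 1e6)       # every singular value is removed
print(np.abs(unchanged - X).max(), np.abs(vanished).max())
```

With a zero threshold the input is returned unchanged, and a very large threshold removes every singular value, which gives a quick sanity check of the implementation.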

Non-local Attention Optimized Deep Image Compression

Apr 22, 2019
Haojie Liu, Tong Chen, Peiyao Guo, Qiu Shen, Xun Cao, Yao Wang, Zhan Ma

This paper proposes a novel Non-Local Attention Optimized Deep Image Compression (NLAIC) framework, built on top of the popular variational auto-encoder (VAE) structure. Our NLAIC framework embeds non-local operations in the encoders and decoders for both the image and the latent feature probability information (known as the hyperprior) to capture both local and global correlations, and applies an attention mechanism to generate masks that weigh the features for the image and the hyperprior, implicitly adapting bit allocation across features based on their importance. Furthermore, both hyperpriors and spatial-channel neighbors of the latent features are used to improve entropy coding. The proposed model outperforms existing methods on the Kodak dataset, including learned (e.g., Balle2019, Balle2018) and conventional (e.g., BPG, JPEG2000, JPEG) image compression methods, for both PSNR and MS-SSIM distortion metrics.
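
As a rough illustration of what a non-local operation computes, the sketch below aggregates each position's feature with a softmax-weighted sum over all positions and adds a residual connection. It is a simplification under stated assumptions: the learned embedding layers of a real non-local block are replaced by the identity, the input is a flat list of positions, and the function name is hypothetical. It is not the NLAIC architecture.

```python
import numpy as np

def non_local(features):
    """Softmax-weighted aggregation over all positions, plus a residual."""
    x = np.asarray(features, dtype=np.float64)            # shape (N, C)
    scores = x @ x.T                                      # pairwise dot-product similarity
    scores = scores - scores.max(axis=1, keepdims=True)   # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)         # each row sums to 1
    return x + weights @ x, weights                       # residual connection

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4))   # 8 positions, 4 channels (toy sizes)
out, attn = non_local(feats)
print(out.shape, attn.sum(axis=1))
```

Because every output position depends on all input positions, the operation captures global correlations that a small convolution kernel cannot see.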

Deep Generative Learning via Variational Gradient Flow

Feb 07, 2019
Yuan Gao, Yuling Jiao, Yang Wang, Yao Wang, Can Yang, Shunkang Zhang

We propose a general framework to learn deep generative models via Variational Gradient Flow (VGrow) on probability spaces. The evolving distribution that asymptotically converges to the target distribution is governed by a vector field, which is the negative gradient of the first variation of the $f$-divergence between them. We prove that the evolving distribution coincides with the pushforward distribution through the infinitesimal time composition of residual maps that are perturbations of the identity map along the vector field. The vector field depends on the density ratio of the pushforward distribution and the target distribution, which can be consistently learned from a binary classification problem. Connections of our proposed VGrow method with other popular methods, such as VAE, GAN and flow-based methods, have been established in this framework, yielding new insights into deep generative learning. We also evaluated several commonly used divergences, including the Kullback-Leibler, Jensen-Shannon, and Jeffrey divergences, as well as our newly discovered 'logD' divergence, which serves as the objective function of the logD-trick GAN. Experimental results on benchmark datasets demonstrate that VGrow can generate high-fidelity images in a stable and efficient manner, achieving competitive performance with state-of-the-art GANs.
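
The link between binary classification and density ratios mentioned above rests on a standard identity: with balanced classes, the Bayes-optimal classifier D(x) = p(x) / (p(x) + q(x)) satisfies D(x) / (1 - D(x)) = p(x) / q(x). The sketch below checks this numerically for two one-dimensional Gaussians. The densities are toy assumptions and the classifier is the analytic optimum rather than a trained network.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def optimal_classifier(x, p, q):
    """Bayes-optimal probability that x came from p rather than q (balanced classes)."""
    return p(x) / (p(x) + q(x))

def ratio_from_classifier(d):
    """Recover the density ratio p/q from the classifier output."""
    return d / (1.0 - d)

def p(z):
    return gaussian_pdf(z, 0.0, 1.0)   # stands in for one distribution

def q(z):
    return gaussian_pdf(z, 0.5, 1.0)   # stands in for the other

x = np.linspace(-2.0, 2.0, 9)
estimated = ratio_from_classifier(optimal_classifier(x, p, q))
print(np.abs(estimated - p(x) / q(x)).max())
```

In practice the classifier is learned from samples, so the recovered ratio is only as good as the classifier's fit.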

TrackNet: Simultaneous Object Detection and Tracking and Its Application in Traffic Video Analysis

Feb 04, 2019
Chenge Li, Gregory Dobler, Xin Feng, Yao Wang

Object detection and object tracking are usually treated as two separate processes. Significant progress has been made in object detection in 2D images using deep learning networks. The usual tracking-by-detection pipeline requires that the object be successfully detected in the first frame and in all subsequent frames, and tracking is done by associating detection results. Performing object detection and object tracking through a single network remains a challenging open question. We propose a novel network structure named TrackNet that directly detects a 3D tube enclosing a moving object in a video segment by extending the Faster R-CNN framework. A Tube Proposal Network (TPN) inside TrackNet predicts the objectness of each candidate tube and the location parameters specifying the bounding tube. The proposed framework is applicable to detecting and tracking any object; in this paper, we focus on its application to traffic video analysis. The proposed model is trained and tested on UA-DETRAC, a large traffic video dataset for multi-vehicle detection and tracking, and obtains very promising results.

Very Long Term Field of View Prediction for 360-degree Video Streaming

Feb 04, 2019
Chenge Li, Weixi Zhang, Yong Liu, Yao Wang

360-degree videos have gained increasing popularity in recent years with advances in Virtual Reality (VR) and Augmented Reality (AR) technologies. In such applications, a user only watches a video scene within a field of view (FoV) centered in a certain direction. Predicting the future FoV over a long time horizon (more than seconds ahead) can help save bandwidth resources in on-demand video streaming while minimizing video freezing in networks with significant bandwidth variations. In this work, we treat FoV prediction as a sequence learning problem, and propose to predict the target user's future FoV based not only on the user's own past FoV center trajectory but also on other users' future FoV locations. We propose multiple prediction models based on two different FoV representations: one using FoV center trajectories and another using equirectangular heatmaps that represent the FoV center distributions. Extensive evaluations on two public datasets demonstrate that the proposed models significantly outperform benchmark models, and that other users' FoVs are very helpful for improving long-term predictions.

Reconstructing Speech Stimuli From Human Auditory Cortex Activity Using a WaveNet Approach

Nov 08, 2018
Ran Wang, Yao Wang, Adeen Flinker

The superior temporal gyrus (STG) region of the cortex contributes critically to speech recognition. In this work, we show that a proposed WaveNet, with limited available data, is able to reconstruct speech stimuli from STG intracranial recordings. We further investigate the impulse response of the fitted model for each recording electrode and observe phoneme-level temporospectral tuning properties for the recorded area of cortex. This finding is consistent with previous studies implicating the posterior STG (pSTG) in a phonetic representation of speech, and it provides detailed acoustic features that certain electrode sites possibly extract during speech recognition.

* 6 pages, 3 figures. Conference of 2018 IEEE Signal Processing in Medicine and Biology Symposium (SPMB 2018)

Deep BV: A Fully Automated System for Brain Ventricle Localization and Segmentation in 3D Ultrasound Images of Embryonic Mice

Nov 05, 2018
Ziming Qiu, Jack Langerman, Nitin Nair, Orlando Aristizabal, Jonathan Mamou, Daniel H. Turnbull, Jeffrey Ketterling, Yao Wang

Volumetric analysis of brain ventricle (BV) structure is a key tool in the study of central nervous system development in embryonic mice. High-frequency ultrasound (HFU) is the only non-invasive, real-time modality available for rapid volumetric imaging of embryos in utero. However, manual segmentation of the BV from HFU volumes is tedious, time-consuming, and requires specialized expertise. In this paper, we propose a novel deep-learning-based BV segmentation system for whole-body HFU images of mouse embryos. Our fully automated system consists of two modules: localization and segmentation. It first applies a volumetric convolutional neural network to a 3D sliding window over the entire volume to identify a 3D bounding box containing the entire BV. It then employs a fully convolutional network to segment the detected bounding box into BV and background. The system achieves a Dice Similarity Coefficient (DSC) of 0.8956 for BV segmentation on an unseen test set of 111 HFU volumes, surpassing the previous state-of-the-art method (DSC of 0.7119) by a margin of 25%.

* IEEE Signal Processing in Medicine and Biology Symposium - 2018, 6 pages, 5 figures
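
For reference, the Dice Similarity Coefficient used above is DSC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. The sketch below computes it for binary arrays and also compares the two reported scores; the toy masks and the function name are illustrative assumptions, and returning 1.0 for two empty masks is a convention chosen here.

```python
import numpy as np

def dice(pred, target):
    """Dice Similarity Coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0                      # both masks empty: treated as a perfect match
    return 2.0 * np.logical_and(pred, target).sum() / denom

# Toy 1-D masks: two predicted voxels, one of which is correct.
score = dice([1, 1, 0, 0], [1, 0, 0, 0])

# Relative gain of the reported DSC (0.8956) over the previous one (0.7119).
relative_gain = (0.8956 - 0.7119) / 0.7119
print(round(score, 4), round(relative_gain, 3))
```

The relative gain computed from the two reported scores is roughly 0.258, which matches the stated margin of 25% when read as a relative improvement.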
