Yichen Zhang

Central South University

Pathological Visual Question Answering

Oct 06, 2020

Xuehai He, Zhuo Cai, Wenlan Wei, Yichen Zhang, Luntian Mou, Eric Xing, Pengtao Xie

Abstract: Is it possible to develop an "AI Pathologist" that can pass the board-certified examination of the American Board of Pathology (ABP)? Building such a system requires addressing three challenges. First, we need to create a visual question answering (VQA) dataset in which the AI agent is presented with a pathology image together with a question and is asked to give the correct answer. Due to privacy concerns, pathology images are usually not publicly available. Moreover, only well-trained pathologists can understand pathology images, and they rarely have time to help create datasets for AI research. Second, since it is difficult to hire highly experienced pathologists to create pathology visual questions and answers, the resulting pathology VQA dataset may contain errors. Training pathology VQA models on such noisy or even erroneous data leads to problematic models that cannot generalize well to unseen images. Third, the medical concepts and knowledge covered in pathology question-answer (QA) pairs are very diverse, while the number of QA pairs available for model training is limited, so learning effective representations of diverse medical concepts from limited data is technically demanding. In this paper, we aim to address these three challenges. To the best of our knowledge, our work is the first to address the pathology VQA problem. To deal with the lack of a publicly available pathology VQA dataset, we create the PathVQA dataset. To address the second challenge, we propose a learning-by-ignoring approach. To address the third challenge, we propose cross-modal self-supervised learning. We perform experiments on our PathVQA dataset, and the results demonstrate the effectiveness of the proposed learning-by-ignoring method and cross-modal self-supervised learning methods.

* arXiv admin note: text overlap with arXiv:2003.10286

Deep Active Learning for Solvability Prediction in Power Systems

Jul 27, 2020

Yichen Zhang, Jianzhe Liu, Feng Qiu, Tianqi Hong, Rui Yao

Abstract: Traditional methods for solvability region analysis can only provide inner approximations, with inconclusive conservatism. Machine learning methods have been proposed to approximate the true region. In this letter, we propose a deep active learning framework for power system solvability prediction. Compared with passive learning methods, where training is performed after all instances are labeled, active learning selects the most informative instances to be labeled and therefore significantly reduces the size of the labeled dataset needed for training. In the active learning framework, the acquisition functions, which correspond to different sampling strategies, are defined in terms of the on-the-fly posterior probability from the classifier. The IEEE 39-bus system is employed to validate the proposed framework: a two-dimensional case is used to visualize the effectiveness of the sampling method, followed by full-dimensional numerical experiments.
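
The abstract does not name the acquisition functions, so the following is only a minimal sketch assuming two common uncertainty-based choices, predictive entropy and least confidence, computed from the classifier's posterior probability that an operating point is solvable. The function names (`entropy_acquisition`, `least_confidence_acquisition`, `select_batch`) are hypothetical, and the classifier and labeling step are omitted.

```python
import numpy as np

def entropy_acquisition(p):
    """Binary predictive entropy of the posterior P(solvable | x)."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def least_confidence_acquisition(p):
    """One minus the larger class probability; largest when p is near 0.5."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.maximum(p, 1.0 - p)

def select_batch(p_pool, k, acquisition=entropy_acquisition):
    """Indices of the k unlabeled instances with the highest acquisition score."""
    scores = acquisition(p_pool)
    return np.argsort(-scores, kind="stable")[:k]

# Hypothetical posteriors P(solvable | x) for five unlabeled operating points.
p_pool = [0.98, 0.51, 0.10, 0.45, 0.70]
print(select_batch(p_pool, 2))  # [1 3]
```

Both scores peak when the posterior is near 0.5, that is, close to the classifier's current estimate of the solvability boundary, so each round spends the labeling budget (here, running a solvability check) where the model is least certain.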

COVID-CT-Dataset: A CT Scan Dataset about COVID-19

Mar 30, 2020

Jinyu Zhao, Yichen Zhang, Xuehai He, Pengtao Xie

Abstract: CT scans are promising for providing accurate, fast, and cheap screening and testing of COVID-19. In this paper, we build a publicly available COVID-CT dataset, containing 275 CT scans that are positive for COVID-19, to foster the research and development of deep learning methods that predict whether a person is affected with COVID-19 by analyzing his/her CT scans. We train a deep convolutional neural network on this dataset and achieve an F1 of 0.85, which is promising but leaves room for further improvement. The data and code are available at https://github.com/UCSD-AI4H/COVID-CT

HRank: Filter Pruning using High-Rank Feature Map

Mar 16, 2020

Mingbao Lin, Rongrong Ji, Yan Wang, Yichen Zhang, Baochang Zhang, Yonghong Tian, Ling Shao

Abstract: Neural network pruning offers a promising way to deploy deep neural networks on resource-limited devices. However, existing methods are still challenged by training inefficiency and the labor cost of pruning designs, due to missing theoretical guidance on non-salient network components. In this paper, we propose a novel filter pruning method that explores the High Rank of feature maps (HRank). HRank is inspired by the discovery that the average rank of multiple feature maps generated by a single filter is always the same, regardless of the number of image batches the CNN receives. Based on HRank, we develop a method that is mathematically formulated to prune filters with low-rank feature maps. The principle behind our pruning is that low-rank feature maps contain less information, and thus pruned results can be easily reproduced. Besides, we experimentally show that weights with high-rank feature maps contain more important information, such that even when a portion of them is not updated, very little damage is done to the model performance. Without introducing any additional constraints, HRank leads to significant improvements over the state of the art in terms of FLOPs and parameter reduction, with similar accuracies. For example, with ResNet-110, we achieve a 58.2% FLOPs reduction by removing 59.2% of the parameters, with only a small loss of 0.14% in top-1 accuracy on CIFAR-10. With ResNet-50, we achieve a 43.8% FLOPs reduction by removing 36.7% of the parameters, with only a 1.17% loss in top-1 accuracy on ImageNet. The code is available at https://github.com/lmbxmu/HRank.
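
A minimal NumPy sketch of the rank criterion described above, assuming the feature maps of one convolutional layer have been collected as an array of shape `(batch, channels, H, W)`. It only scores filters by the average matrix rank of their feature maps and returns the lowest-ranked ones; how feature maps are gathered, the per-layer pruning ratio, and the fine-tuning after pruning are not shown, and the helper names are hypothetical.

```python
import numpy as np

def average_feature_map_rank(feature_maps):
    """Average rank of each channel's H x W feature map over a batch.

    feature_maps: array of shape (batch, channels, H, W) from one conv layer.
    np.linalg.matrix_rank works on the last two axes, so the per-image ranks
    have shape (batch, channels) before averaging over the batch.
    """
    ranks = np.linalg.matrix_rank(np.asarray(feature_maps, dtype=float))
    return ranks.mean(axis=0)

def filters_to_prune(feature_maps, prune_ratio):
    """Sorted indices of the filters whose feature maps have the lowest average rank."""
    avg_rank = average_feature_map_rank(feature_maps)
    n_prune = int(round(prune_ratio * avg_rank.size))
    lowest = np.argsort(avg_rank, kind="stable")[:n_prune]
    return np.sort(lowest)

# Toy layer with 3 filters: channel 0 is rank 1, channel 1 full rank, channel 2 all zeros.
rng = np.random.default_rng(0)
fmaps = rng.standard_normal((4, 3, 8, 8))
fmaps[:, 0] = rng.standard_normal((4, 8, 1)) * rng.standard_normal((4, 1, 8))
fmaps[:, 2] = 0.0
print(average_feature_map_rank(fmaps))  # [1. 8. 0.]
print(filters_to_prune(fmaps, 2 / 3))   # [0 2]
```

Removing the lowest-rank filters follows the stated principle that low-rank feature maps carry less information; the rank is averaged over a batch because the abstract reports that this average is stable across batches.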

Decentralized Visual-Inertial-UWB Fusion for Relative State Estimation of Aerial Swarm

Mar 11, 2020

Hao Xu, Luqi Wang, Yichen Zhang, Kejie Qiu, Shaojie Shen

Abstract: The collaboration of unmanned aerial vehicles (UAVs) has become a popular research topic because of its practicability in multiple scenarios. A group of collaborating UAVs, also known as an aerial swarm, is a highly complex system that still lacks a state-of-the-art decentralized relative state estimation method. In this paper, we present a novel, fully decentralized visual-inertial-UWB fusion framework for relative state estimation and demonstrate its practicability through extensive aerial swarm flight experiments. Comparison with ground truth data from a motion capture system shows centimeter-level precision, which outperforms Ultra-WideBand (UWB)-based and even vision-based methods. The system is not limited by the camera's field of view (FoV) or by the Global Positioning System (GPS). Moreover, owing to its estimation consistency, we believe the proposed relative state estimation framework has the potential to be widely adopted in aerial swarm applications across different scenarios and scales.

* Accepted at ICRA 2020

PathVQA: 30000+ Questions for Medical Visual Question Answering

Mar 07, 2020

Xuehai He, Yichen Zhang, Luntian Mou, Eric Xing, Pengtao Xie

Abstract: Is it possible to develop an "AI Pathologist" that can pass the board-certified examination of the American Board of Pathology? To achieve this goal, the first step is to create a visual question answering (VQA) dataset in which the AI agent is presented with a pathology image together with a question and is asked to give the correct answer. Our work makes the first attempt to build such a dataset. Unlike general-domain VQA datasets, where the images are widely accessible and many crowdsourcing workers are available and capable of generating question-answer pairs, a medical VQA dataset is much more challenging to develop. First, due to privacy concerns, pathology images are usually not publicly available. Second, only well-trained pathologists can understand pathology images, and they rarely have time to help create datasets for AI research. To address these challenges, we turn to pathology textbooks and online digital libraries. We develop a semi-automated pipeline that extracts pathology images and captions from textbooks and generates question-answer pairs from the captions using natural language processing. We collect 32,799 open-ended questions from 4,998 pathology images, and each question is manually checked for correctness. To the best of our knowledge, this is the first dataset for pathology VQA. Our dataset will be released publicly to promote research in medical VQA.

Approximating Trajectory Constraints with Machine Learning -- Microgrid Islanding with Frequency Constraints

Feb 21, 2020

Yichen Zhang, Chen Chen, Guodong Liu, Tianqi Hong, Feng Qiu

Abstract: In this paper, we introduce a deep-learning-aided constraint encoding method to tackle the frequency-constrained microgrid scheduling problem. The nonlinear function between the system operating condition and the frequency nadir is approximated by a neural network, which admits an exact mixed-integer programming (MIP) formulation. This formulation is then integrated into the scheduling problem to encode the frequency constraint. Owing to the stronger representation power of the neural network, the resulting commands can ensure adequate frequency response in a realistic setting in addition to successful islanding. The proposed method is validated on a modified 33-node system. Successful islanding with a secure response is simulated under the scheduled commands using a detailed three-phase model in Simulink. The advantages of our model are particularly pronounced when inertia emulation functions from wind turbine generators are considered.
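
The abstract does not state the network's activation function or its exact encoding. Assuming ReLU activations, the standard big-M formulation below illustrates why such a network admits an exact mixed-integer representation. For one neuron with weights $w$, bias $b$, input $h$, and known pre-activation bounds $L < 0 < U$ (all symbols here are illustrative, not taken from the paper), the output $y = \max(0, w^\top h + b)$ is captured exactly with one binary variable $z$:

```latex
\begin{aligned}
\hat{x} &= w^\top h + b, \\
y &\ge \hat{x}, \qquad y \ge 0, \\
y &\le \hat{x} - L\,(1 - z), \qquad y \le U\,z, \\
z &\in \{0, 1\}.
\end{aligned}
```

If $z = 1$, the constraints force $y = \hat{x}$ and $\hat{x} \ge 0$; if $z = 0$, they force $y = 0$ and $\hat{x} \le 0$. Encoding every neuron this way and chaining the layers turns the network into mixed-integer linear constraints, so a requirement on its frequency-nadir output, such as $\hat{f}_{\text{nadir}} \ge f_{\min}$ with $f_{\min}$ a hypothetical threshold, can be added directly to the scheduling problem.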

Training convolutional neural networks with cheap convolutions and online distillation

Oct 10, 2019

Jiao Xie, Shaohui Lin, Yichen Zhang, Linkai Luo

Abstract: The large memory and computation consumption of convolutional neural networks (CNNs) has been one of the main barriers to deploying them on resource-limited systems. To this end, cheap convolutions (e.g., group convolution, depth-wise convolution, and shift convolution) have recently been used to reduce memory and computation, but they require specific architecture designs. Furthermore, directly replacing the standard convolution with these cheap ones results in compressed networks with low discriminability. In this paper, we propose to use knowledge distillation to improve the performance of compact student networks with cheap convolutions. In our case, the teacher is a network with standard convolutions, while the student is a simple transformation of the teacher architecture without complicated redesign. In particular, we propose a novel online distillation method, which constructs the teacher network online without pre-training and conducts mutual learning between the teacher and student networks, to improve the performance of the student model. Extensive experiments on different datasets, including CIFAR-10/100 and ImageNet ILSVRC 2012, demonstrate that the proposed approach simultaneously reduces the memory and computation overhead of cutting-edge CNNs and achieves superior performance compared to state-of-the-art CNN compression and acceleration methods. The code is publicly available at https://github.com/EthanZhangYC/OD-cheap-convolution.
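
The abstract does not give the training objective. As a hedged illustration of what mutual learning between a teacher and a student commonly looks like, the NumPy sketch below assumes a deep-mutual-learning style loss: each network minimizes cross-entropy on the labels plus a temperature-softened KL divergence toward the other network's predictions. The temperature `T`, the weight `alpha`, the `T**2` scaling, and all function names are assumptions, not values from the paper.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Row-wise softmax of logits / T, shifted for numerical stability."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean cross-entropy of integer class labels under softmax(logits)."""
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def kl_div(p, q):
    """Mean KL(p || q) over the batch; each row is a probability distribution."""
    return np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=1))

def mutual_learning_losses(teacher_logits, student_logits, labels, T=3.0, alpha=1.0):
    """Return (teacher_loss, student_loss) for one mutual-learning step."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # Student: fit the labels and mimic the teacher's softened predictions.
    loss_s = cross_entropy(student_logits, labels) + alpha * T**2 * kl_div(p_t, p_s)
    # Teacher: trained at the same time, it fits the labels and mimics the student.
    loss_t = cross_entropy(teacher_logits, labels) + alpha * T**2 * kl_div(p_s, p_t)
    return loss_t, loss_s
```

In an autodiff framework, each loss would be differentiated only with respect to its own network's parameters, with the other network's softened predictions treated as fixed targets. Training both networks together, with no pre-trained teacher, is the sense in which this kind of distillation is online.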

Reconstruction of Natural Visual Scenes from Neural Spikes with Deep Neural Networks

Apr 30, 2019

Yichen Zhang, Shanshan Jia, Yajing Zheng, Zhaofei Yu, Yonghong Tian, Tiejun Huang, Jian K. Liu

Abstract: Neural coding is one of the central questions in systems neuroscience for understanding how the brain processes stimuli from the environment. It is also a cornerstone for designing brain-machine interface algorithms, where decoding incoming stimuli is needed for better performance of physical devices. Traditionally, the decoding of visual scenes has focused on fMRI data. However, visual perception operates on a fast, millisecond time scale through events termed neural spikes, and so far there are few studies of decoding from spikes. Here we address this gap by developing a novel decoding framework based on deep neural networks, named spike-image decoder (SID), for reconstructing natural visual scenes, including static images and dynamic videos, from experimentally recorded spikes of a population of retinal ganglion cells. The SID is an end-to-end decoder with neural spikes at one end and images at the other, which can be trained directly so that visual scenes are reconstructed from spikes with high accuracy. In addition, we show that SID can be generalized to arbitrary images by using the MNIST, CIFAR10, and CIFAR100 image datasets. Furthermore, with a pre-trained SID, one can decode any dynamic video, with the aid of an encoder, to achieve real-time encoding and decoding of visual scenes with spikes. Altogether, our results shed new light on neuromorphic computing for artificial visual systems, such as event-based visual cameras and visual neuroprostheses.

* 19 pages, 7 figures

First-order Newton-type Estimator for Distributed Estimation and Inference

Nov 28, 2018

Xi Chen, Weidong Liu, Yichen Zhang

Abstract: This paper studies distributed estimation and inference for a general statistical problem with a convex loss that could be non-differentiable. For the purpose of efficient computation, we restrict ourselves to stochastic first-order optimization, which enjoys low per-iteration complexity. To motivate the proposed method, we first investigate the theoretical properties of a straightforward Divide-and-Conquer Stochastic Gradient Descent (DC-SGD) approach. Our theory shows that there is a restriction on the number of machines, and this restriction becomes more stringent when the dimension $p$ is large. To overcome this limitation, this paper proposes a new multi-round distributed estimation procedure that approximates the Newton step using only stochastic subgradients. The key component of our method is a computationally efficient estimator of $\Sigma^{-1} w$, where $\Sigma$ is the population Hessian matrix and $w$ is any given vector. Instead of estimating $\Sigma$ (or $\Sigma^{-1}$), which usually requires second-order differentiability of the loss, the proposed First-Order Newton-type Estimator (FONE) directly estimates the vector of interest $\Sigma^{-1} w$ as a whole and is applicable to non-differentiable losses. Our estimator also facilitates inference for the empirical risk minimizer: the key term in the limiting covariance has the form $\Sigma^{-1} w$, which can be estimated by FONE.

* 56 pages
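
FONE itself works with stochastic subgradients of a possibly non-differentiable loss, and the abstract does not give its update rule, so the sketch below is not FONE. It only illustrates the underlying idea that $\Sigma^{-1} w$ can be obtained as a vector without forming or inverting $\Sigma$: for symmetric positive-definite $\Sigma$, $\Sigma^{-1} w$ is the unique minimizer of $\tfrac{1}{2} z^\top \Sigma z - w^\top z$, whose gradient is $\Sigma z - w$, so plain first-order iterations converge to it. The function name, the toy $\Sigma$, and the step size are hypothetical.

```python
import numpy as np

def solve_inverse_times_vector(matvec, w, step, n_iter=500):
    """Approximate Sigma^{-1} w by gradient descent on 0.5 z^T Sigma z - w^T z.

    matvec(z) must return Sigma @ z; Sigma itself is never inverted.
    Converges for symmetric positive-definite Sigma when 0 < step < 2 / lambda_max.
    """
    z = np.zeros_like(w, dtype=float)
    for _ in range(n_iter):
        z = z - step * (matvec(z) - w)  # gradient step: grad = Sigma z - w
    return z

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])  # toy SPD "Hessian"
w = np.array([1.0, -1.0])
z = solve_inverse_times_vector(lambda v: Sigma @ v, w, step=0.4)
print(z)  # approximately [0.857, -1.429], matching np.linalg.solve(Sigma, w)
```

Only products $\Sigma z$ are needed, which is what makes a first-order route possible; replacing these exact products with quantities built from stochastic subgradients is the part FONE handles and this sketch does not.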
