Shixia Liu

Diagnosing Concept Drift with Visual Analytics

Jul 29, 2020
Weikai Yang, Zhen Li, Mengchen Liu, Yafeng Lu, Kelei Cao, Ross Maciejewski, Shixia Liu

Concept drift is a phenomenon in which the distribution of a data stream changes over time in unforeseen ways, causing prediction models built on historical data to become inaccurate. While a variety of automated methods have been developed to identify when concept drift occurs, there is limited support for analysts who need to understand and correct their models when drift is detected. In this paper, we present a visual analytics method, DriftVis, to support model builders and analysts in the identification and correction of concept drift in streaming data. DriftVis combines a distribution-based drift detection method with a streaming scatterplot to support the analysis of drift caused by the distribution changes of data streams and to explore the impact of these changes on the model's accuracy. Two case studies on weather prediction and text classification have been conducted to demonstrate our proposed tool and illustrate how visual analytics can be used to support the detection, examination, and correction of concept drift.
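
As a rough illustration of what distribution-based drift detection means, the sketch below compares each incoming batch of a one-dimensional stream against a fixed reference window. It is not DriftVis's detection method: the two-sample Kolmogorov-Smirnov statistic, the function names `ks_statistic` and `detect_drift`, and the window size and threshold are all assumptions chosen for the example.

```python
from bisect import bisect_right


def ks_statistic(ref, cur):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDFs."""
    ref_sorted, cur_sorted = sorted(ref), sorted(cur)
    gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap occurs at a sample point.
    for x in ref_sorted + cur_sorted:
        f_ref = bisect_right(ref_sorted, x) / len(ref_sorted)
        f_cur = bisect_right(cur_sorted, x) / len(cur_sorted)
        gap = max(gap, abs(f_ref - f_cur))
    return gap


def detect_drift(stream, window, threshold):
    """Return the start index of every batch whose distribution differs from the reference window."""
    reference = stream[:window]
    drifted = []
    for start in range(window, len(stream) - window + 1, window):
        batch = stream[start:start + window]
        if ks_statistic(reference, batch) > threshold:
            drifted.append(start)
    return drifted


# Hypothetical stream: values in 0..9 for 100 steps, then shifted to 10..19.
stream = [i % 10 for i in range(100)] + [10 + i % 10 for i in range(100)]
drift_starts = detect_drift(stream, window=50, threshold=0.5)
```

With this toy stream, the batches starting at indices 100 and 150 are flagged, while the batch starting at index 50 matches the reference window. A real detector would also need to handle multivariate data and to update the reference after a model is corrected.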

* 10 pages + 2 pages of references, 8 figures, VAST 2020

Diagnosing Concept Drift in Streaming Data

Jul 28, 2020
Weikai Yang, Zhen Li, Mengchen Liu, Yafeng Lu, Kelei Cao, Ross Maciejewski, Shixia Liu

* 10 pages, 8 figures, VAST 2020

OoDAnalyzer: Interactive Analysis of Out-of-Distribution Samples

Feb 08, 2020
Changjian Chen, Jun Yuan, Yafeng Lu, Yang Liu, Hang Su, Songtao Yuan, Shixia Liu

One major cause of performance degradation in predictive models is that the test samples are not well covered by the training data. Such poorly represented samples are called out-of-distribution (OoD) samples. In this paper, we propose OoDAnalyzer, a visual analysis approach for interactively identifying OoD samples and explaining them in context. Our approach integrates an ensemble OoD detection method and a grid-based visualization. The detection method improves on deep ensembles by combining more features with algorithms in the same family. To better analyze and understand the OoD samples in context, we have developed a novel kNN-based grid layout algorithm motivated by Hall's theorem. The algorithm approximates the optimal layout and has $O(kN^2)$ time complexity, which is faster than the best-performing existing grid layout algorithm, whose time complexity is $O(N^3)$. Quantitative evaluation and case studies were performed on several datasets to demonstrate the effectiveness and usefulness of OoDAnalyzer.
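
To make the idea of ensemble-based OoD scoring concrete, the sketch below uses one common scheme: average the class probabilities predicted by the ensemble members and take the entropy of the result, so that disagreement or uncertainty yields a high score. This is an assumption for illustration, not necessarily the scoring rule used by OoDAnalyzer, and the names `mean_prediction` and `predictive_entropy` are hypothetical.

```python
import math


def mean_prediction(member_probs):
    """Average the class-probability vectors produced by the ensemble members."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]


def predictive_entropy(member_probs):
    """Entropy (in nats) of the averaged prediction; higher means more likely OoD under this scheme."""
    avg = mean_prediction(member_probs)
    # Classes with zero probability contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in avg if p > 0)


# Hypothetical two-member ensemble on a two-class problem.
agreeing = [[1.0, 0.0], [1.0, 0.0]]      # members agree confidently
disagreeing = [[1.0, 0.0], [0.0, 1.0]]   # members contradict each other
score_in = predictive_entropy(agreeing)
score_ood = predictive_entropy(disagreeing)
```

Samples whose score exceeds a chosen threshold would then be passed on for visual inspection; the threshold itself has to be tuned per dataset.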

* 14 pages, 13 figures

Analyzing the Noise Robustness of Deep Neural Networks

Jan 26, 2020
Kelei Cao, Mengchen Liu, Hang Su, Jing Wu, Jun Zhu, Shixia Liu

Adversarial examples, generated by adding small, intentionally imperceptible perturbations to normal examples, can mislead deep neural networks (DNNs) into making incorrect predictions. Although much work has been done on both adversarial attack and defense, a fine-grained understanding of adversarial examples is still lacking. To address this issue, we present a visual analysis method to explain why adversarial examples are misclassified. The key is to compare and analyze the datapaths of both the adversarial and normal examples. A datapath is a group of critical neurons along with their connections. We formulate the datapath extraction as a subset selection problem and solve it by constructing and training a neural network. A multi-level visualization, consisting of a network-level visualization of data flows, a layer-level visualization of feature maps, and a neuron-level visualization of learned features, has been designed to help investigate how datapaths of adversarial and normal examples diverge and merge in the prediction process. A quantitative evaluation and a case study were conducted to demonstrate the promise of our method to explain the misclassification of adversarial examples.
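
For readers unfamiliar with how a small perturbation can flip a prediction, the sketch below applies a sign-of-gradient step (in the spirit of the well-known fast gradient sign method, which the abstract does not name) to a toy linear classifier. The classifier, its weights, the input, and the step size `eps` are all hypothetical; the paper itself studies deep networks, not linear models.

```python
def predict(x, w, b):
    """Linear classifier: class 1 if the score w.x + b is positive, else class 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else 0


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def perturb(x, w, label, eps):
    """Shift every feature by at most eps in the direction that pushes the score away from `label`."""
    # For a linear model the gradient of the score with respect to x is w itself.
    direction = -1 if label == 1 else 1
    return [xi + direction * eps * sign(wi) for xi, wi in zip(x, w)]


# Hypothetical weights and input, chosen so that a small step flips the prediction.
w, b = [1.0, -2.0, 0.5], 0.0
x = [0.3, 0.1, 0.2]
x_adv = perturb(x, w, predict(x, w, b), eps=0.1)
```

Here no feature moves by more than 0.1, yet the predicted class changes from 1 to 0.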

Recent Research Advances on Interactive Machine Learning

Nov 12, 2018
Liu Jiang, Shixia Liu, Changjian Chen

Interactive Machine Learning (IML) is an iterative learning process that tightly couples a human with a machine learner, and it is widely used by researchers and practitioners to effectively solve a wide variety of real-world application problems. Although recent years have witnessed the proliferation of IML in the field of visual analytics, most recent surveys either focus on a specific area of IML or aim to summarize a visualization field that is too generic for IML. In this paper, we systematically review the recent literature on IML and classify it using a task-oriented taxonomy that we developed. We conclude the survey with a discussion of open challenges and research opportunities that we believe are inspiring for future work in IML.

Visual Analytics for Explainable Deep Learning

Apr 07, 2018
Jaegul Choo, Shixia Liu

Recently, deep learning has been advancing the state of the art in artificial intelligence to a new level, and humans rely on artificial intelligence techniques more than ever. However, even with such unprecedented advancements, the lack of explanation regarding the decisions made by deep learning models and absence of control over their internal processes act as major drawbacks in critical decision-making processes, such as precision medicine and law enforcement. In response, efforts are being made to make deep learning interpretable and controllable by humans. In this paper, we review visual analytics, information visualization, and machine learning perspectives relevant to this aim, and discuss potential challenges and future research directions.

* IEEE Computer Graphics and Applications, 2018

Scalable Inference for Nested Chinese Restaurant Process Topic Models

Feb 23, 2017
Jianfei Chen, Jun Zhu, Jie Lu, Shixia Liu

Nested Chinese Restaurant Process (nCRP) topic models are powerful nonparametric Bayesian methods to extract a topic hierarchy from a given text corpus, where the hierarchical structure is automatically determined by the data. Hierarchical Latent Dirichlet Allocation (hLDA) is a popular instance of nCRP topic models. However, hLDA has only been evaluated at small scale, because the existing collapsed Gibbs sampling and instantiated weight variational inference algorithms either are not scalable or sacrifice inference quality with mean-field assumptions. Moreover, an efficient distributed implementation of the data structures, such as dynamically growing count matrices and trees, is challenging. In this paper, we propose a novel partially collapsed Gibbs sampling (PCGS) algorithm, which combines the advantages of collapsed and instantiated weight algorithms to achieve good scalability as well as high model quality. An initialization strategy is presented to further improve the model quality. Finally, we propose an efficient distributed implementation of PCGS through vectorization, pre-processing, and a careful design of the concurrent data structures and communication strategy. Empirical studies show that our algorithm is 111 times more efficient than the previous open-source implementation for hLDA, with comparable or even better model quality. Our distributed implementation can extract 1,722 topics from a 131-million-document corpus with 28 billion tokens, which is 4-5 orders of magnitude larger than the previous largest corpus, with 50 machines in 7 hours.
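
The nCRP prior that underlies these models can be sketched in a few lines: each document picks a root-to-leaf path, and at every level it follows an existing child with probability proportional to the number of earlier documents that went that way, or opens a new child with weight `gamma`. The code below samples from this prior only; it is not the PCGS inference algorithm proposed in the paper, and the `Node` class, `sample_path` function, and parameter values are assumptions for illustration.

```python
import random


class Node:
    """A topic node in the tree."""

    def __init__(self):
        self.count = 0      # number of documents whose path passes through this node
        self.children = []


def sample_path(root, depth, gamma, rng):
    """Draw one root-to-leaf path with `depth` nodes from the nCRP prior and update the counts."""
    root.count += 1
    path = [root]
    node = root
    for _ in range(depth - 1):
        # An existing child is weighted by its document count; a new child is weighted by gamma.
        weights = [child.count for child in node.children] + [gamma]
        idx = rng.choices(range(len(weights)), weights=weights)[0]
        if idx == len(node.children):
            node.children.append(Node())
        node = node.children[idx]
        node.count += 1
        path.append(node)
    return path


# Hypothetical setup: 20 documents, a tree of depth 3, and gamma = 1.0.
rng = random.Random(0)
root = Node()
paths = [sample_path(root, depth=3, gamma=1.0, rng=rng) for _ in range(20)]
```

Because popular branches attract more documents, the tree grows only as fast as the data demand, which is what the abstract means by the hierarchy being determined by the data.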

Towards Better Analysis of Machine Learning Models: A Visual Analytics Perspective

Feb 04, 2017
Shixia Liu, Xiting Wang, Mengchen Liu, Jun Zhu

Interactive model analysis, the process of understanding, diagnosing, and refining a machine learning model with the help of interactive visualization, is very important for users to efficiently solve real-world artificial intelligence and data mining problems. Dramatic advances in big data analytics have led to a wide variety of interactive model analysis tasks. In this paper, we present a comprehensive analysis and interpretation of this rapidly developing area. Specifically, we classify the relevant work into three categories: understanding, diagnosis, and refinement. Each category is exemplified by recent influential work. Possible future research opportunities are also explored and discussed.

* This article will be published in Visual Informatics

Towards Better Analysis of Deep Convolutional Neural Networks

May 04, 2016
Mengchen Liu, Jiaxin Shi, Zhen Li, Chongxuan Li, Jun Zhu, Shixia Liu

Deep convolutional neural networks (CNNs) have achieved breakthrough performance in many pattern recognition tasks such as image classification. However, the development of high-quality deep models typically relies on a substantial amount of trial-and-error, as there is still no clear understanding of when and why a deep model works. In this paper, we present a visual analytics approach for better understanding, diagnosing, and refining deep CNNs. We formulate a deep CNN as a directed acyclic graph. Based on this formulation, a hybrid visualization is developed to disclose the multiple facets of each neuron and the interactions between them. In particular, we introduce a hierarchical rectangle packing algorithm and a matrix reordering algorithm to show the derived features of a neuron cluster. We also propose a biclustering-based edge bundling method to reduce visual clutter caused by a large number of connections between neurons. We evaluated our method on a set of CNNs and the results are generally favorable.
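
A minimal sketch of the directed-acyclic-graph view, assuming a hypothetical five-layer network with one skip connection: each layer is a node, each connection is an edge, and a topological order lists every layer after all of its inputs. The layer names are invented for the example, and the paper's formulation may differ in what it treats as nodes.

```python
from graphlib import TopologicalSorter

# Hypothetical layer graph: each layer maps to the set of layers that feed into it.
predecessors = {
    "conv1": {"input"},
    "conv2": {"conv1"},
    "skip": {"conv1"},
    "merge": {"conv2", "skip"},
    "output": {"merge"},
}

# A topological order visits every layer after all of its inputs; it exists only if the graph is acyclic.
order = list(TopologicalSorter(predecessors).static_order())
position = {layer: i for i, layer in enumerate(order)}
```

If the graph contained a cycle, `static_order()` would raise `graphlib.CycleError`, so the same call doubles as an acyclicity check.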

* Submitted to VIS 2016
