
Quan Li


Towards Better Modeling with Missing Data: A Contrastive Learning-based Visual Analytics Perspective

Sep 18, 2023
Laixin Xie, Yang Ouyang, Longfei Chen, Ziming Wu, Quan Li


Missing data can pose a challenge for machine learning (ML) modeling. Current approaches, which fall into feature imputation and label prediction, focus primarily on handling missing data to enhance ML performance. Because they rely on the observed data to estimate the missing values, they suffer three main shortcomings in imputation: the need for different imputation methods under different missing-data mechanisms, heavy dependence on assumptions about the data distribution, and the potential introduction of bias. This study proposes a Contrastive Learning (CL) framework to model observed data with missing values, in which the ML model learns the similarity between an incomplete sample and its complete counterpart and its dissimilarity from other samples. Our approach demonstrates the advantages of CL without requiring any imputation. To enhance interpretability, we introduce CIVis, a visual analytics system that incorporates interpretable techniques to visualize the learning process and diagnose the model status. Users can leverage their domain knowledge through interactive sampling to identify negative and positive pairs for CL. The output of CIVis is an optimized model that takes specified features and predicts downstream tasks. We present two usage scenarios, one regression task and one classification task, and conduct quantitative experiments, expert interviews, and a qualitative user study to demonstrate the effectiveness of our approach. In short, this study addresses the challenges of ML modeling in the presence of missing data with a practical solution that achieves both high predictive accuracy and model interpretability.
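The abstract does not specify the exact contrastive objective, so the following is only a minimal sketch of the general idea: an InfoNCE-style loss that pulls an incomplete sample toward its complete counterpart (the positive pair) and pushes it away from other samples (negatives). All function names, the masking scheme, and the temperature value here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mask_features(x, missing_rate, rng):
    """Simulate missingness by zeroing a random subset of features (illustrative)."""
    mask = rng.random(x.shape) > missing_rate
    return x * mask

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss: the anchor (incomplete sample) should be
    most similar to its complete counterpart among all candidates."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0] + 1e-12)            # positive sits at index 0

rng = np.random.default_rng(0)
x = rng.normal(size=8)                          # a "complete" sample
x_incomplete = mask_features(x, 0.3, rng)       # its incomplete counterpart
negatives = [rng.normal(size=8) for _ in range(5)]
loss = info_nce(x_incomplete, x, negatives)
```

In a real pipeline the similarity would be computed between learned embeddings rather than raw feature vectors, with the encoder trained to minimize this loss.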

* 18 pages, 11 figures. This paper is accepted by IEEE Transactions on Visualization and Computer Graphics (TVCG) 

Inspecting the Process of Bank Credit Rating via Visual Analytics

Aug 06, 2021
Qiangqiang Liu, Quan Li, Zhihua Zhu, Tangzhi Ye, Xiaojuan Ma


Bank credit rating classifies banks into different levels based on publicly disclosed and internal information, serving as an important input in financial risk management. However, domain experts lack effective means of exploring and comparing different bank credit rating schemes. A loose connection between subjective and quantitative analysis, together with the difficulty of determining appropriate indicator weights, obscures the understanding of bank credit ratings. Furthermore, existing models fail to account for bank types, applying a single unified set of indicator weights to all banks. We propose RatingVis to assist experts in exploring and comparing different bank credit rating schemes. It supports interactively inferring indicator weights for banks by incorporating domain knowledge, and it considers bank types in the analysis loop. We conduct a case study with real-world bank data to verify the efficacy of RatingVis. Expert feedback suggests that our approach helps them better understand different rating schemes.
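The core quantity the abstract discusses, a rating score built from weighted indicators where the weights may differ by bank type, can be sketched as a simple weighted sum. The indicator names, values, and weights below are hypothetical examples, not data or weights from the paper.

```python
def rating_score(indicators, weights):
    """Weighted-sum credit score; a different weight set may apply per bank type."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights should sum to 1"
    return sum(weights[k] * indicators[k] for k in weights)

# Hypothetical normalized indicators for one bank.
bank = {"capital_adequacy": 0.8, "asset_quality": 0.6, "liquidity": 0.7}

# Hypothetical weight sets for two bank types.
weights_commercial = {"capital_adequacy": 0.5, "asset_quality": 0.3, "liquidity": 0.2}
weights_rural      = {"capital_adequacy": 0.3, "asset_quality": 0.3, "liquidity": 0.4}

score_c = rating_score(bank, weights_commercial)
score_r = rating_score(bank, weights_rural)
```

The same bank receives different scores under the two weight sets, which is exactly the type-dependence the abstract argues a unified weight set ignores.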

* IEEE VIS 2021 Short Paper 

EmoCo: Visual Analysis of Emotion Coherence in Presentation Videos

Jul 29, 2019
Haipeng Zeng, Xingbo Wang, Aoyu Wu, Yong Wang, Quan Li, Alex Endert, Huamin Qu


Emotions play a key role in human communication and public presentations. Human emotions are usually expressed through multiple modalities, so exploring multimodal emotions and their coherence is of great value for understanding emotional expression in presentations and improving presentation skills. However, manually watching and studying presentation videos is tedious and time-consuming, and there is little tool support for efficient, in-depth, multi-level analysis. In this paper, we therefore introduce EmoCo, an interactive visual analytics system that facilitates efficient analysis of emotion coherence across the facial, text, and audio modalities in presentation videos. Our system features a channel coherence view and a sentence clustering view that together give users a quick overview of emotion coherence and its temporal evolution. In addition, a detail view and a word view enable detailed exploration and comparison at the sentence and word levels, respectively. We thoroughly evaluate the proposed system and visualization techniques through two usage scenarios based on TED Talk videos and through interviews with two domain experts. The results demonstrate the effectiveness of our system in gaining insights into emotion coherence in presentations.

* 11 pages, 8 figures. Accepted by IEEE VAST 2019 

Privileged Features Distillation for E-Commerce Recommendations

Jul 11, 2019
Chen Xu, Quan Li, Junfeng Ge, Jinyang Gao, Xiaoyong Yang, Changhua Pei, Hanxiao Sun, Wenwu Ou


Features play an important role in most prediction tasks of e-commerce recommendations. To guarantee consistency between off-line training and on-line serving, we usually use only the features that are available in both environments. This consistency, however, excludes some discriminative features. For example, when estimating the conversion rate (CVR), i.e., the probability that a user purchases an item after clicking it, features such as dwell time on the item detail page can be very informative. Yet CVR prediction must be conducted for on-line ranking before the click happens, so such post-event features are unavailable during serving. We define features that are discriminative but only available during training as privileged features. Inspired by distillation techniques that bridge the gap between training and inference, we propose Privileged Features Distillation (PFD). We train two models: a student model that is the same as the original one, and a teacher model that additionally uses the privileged features. Knowledge distilled from the more accurate teacher is transferred to the student, improving its prediction accuracy; during serving, only the student part is kept. To our knowledge, this is the first work to fully exploit the potential of such features. To validate the effectiveness of PFD, we conduct experiments on two fundamental prediction tasks in Taobao recommendations: click-through rate (CTR) prediction at coarse-grained ranking and CVR prediction at fine-grained ranking. By distilling the interacted features that are prohibited during serving for CTR and the post-event features for CVR, we achieve significant improvements over both strong baselines. In addition, by addressing several issues in training PFD, we obtain training speed comparable to that of the baselines without any distillation.
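The abstract describes the teacher/student setup but not the exact loss. A common way to combine a label loss with a distillation term, which is one plausible reading of PFD rather than the paper's exact formulation, is to have the student fit both the ground-truth label and the teacher's soft prediction. The `lam` weighting and the logistic form below are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pfd_loss(y_true, student_logit, teacher_logit, lam=0.5):
    """Sketch of a distillation loss: label cross-entropy plus a term pulling
    the student's prediction toward the teacher's soft prediction.
    The teacher sees privileged features at training time; only the student
    (which uses serving-time features) is kept for inference."""
    p_s = sigmoid(student_logit)   # student's predicted probability
    p_t = sigmoid(teacher_logit)   # teacher's soft target
    eps = 1e-12
    label_loss = -(y_true * np.log(p_s + eps)
                   + (1 - y_true) * np.log(1 - p_s + eps))
    distill_loss = -(p_t * np.log(p_s + eps)
                     + (1 - p_t) * np.log(1 - p_s + eps))
    return label_loss + lam * distill_loss

# Example: clicked item that converted (y=1); the teacher, with access to
# privileged post-event features, is more confident than the student.
loss = pfd_loss(y_true=1.0, student_logit=0.0, teacher_logit=2.0)
```

At serving time only `student_logit` is computed, so no privileged feature is ever needed on-line.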

* 8 pages 

Matrix Completion from $O(n)$ Samples in Linear Time

Aug 22, 2017
David Gamarnik, Quan Li, Hongyi Zhang


We consider the problem of reconstructing a rank-$k$ $n \times n$ matrix $M$ from a sampling of its entries. Under a certain incoherence assumption on $M$, and when both the rank and the condition number of $M$ are bounded, it was shown in \cite{CandesRecht2009, CandesTao2010, keshavan2010, Recht2011, Jain2012, Hardt2014} that $M$ can be recovered exactly or approximately (depending on some trade-off between accuracy and computational complexity) using $O(n \, \text{poly}(\log n))$ samples in super-linear time $O(n^{a} \, \text{poly}(\log n))$ for some constant $a \geq 1$. In this paper, we propose a new matrix completion algorithm using a novel sampling scheme based on a union of independent sparse random regular bipartite graphs. We show that under the same conditions, w.h.p. our algorithm recovers an $\epsilon$-approximation of $M$ in terms of the Frobenius norm using $O(n \log^2(1/\epsilon))$ samples and in linear time $O(n \log^2(1/\epsilon))$. This provides the best known bounds on both the sample complexity and the computational complexity for reconstructing (approximately) an unknown low-rank matrix. The novelty of our algorithm lies in two new steps, thresholding singular values and rescaling singular vectors, applied within the "vanilla" alternating minimization algorithm. The structure of sparse random regular graphs is used heavily to control the impact of these regularization steps.
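For orientation, here is a minimal sketch of the "vanilla" alternating minimization baseline the abstract refers to: fit $M \approx UV^{T}$ by alternately solving row-wise least-squares problems over the observed entries. This deliberately omits the paper's contributions (the sparse random regular bipartite sampling scheme and the singular-value thresholding and rescaling steps), so it illustrates the starting point, not the proposed algorithm.

```python
import numpy as np

def altmin_complete(M_obs, mask, k, iters=100):
    """Vanilla alternating least squares for matrix completion.

    M_obs : observed matrix (unobserved entries may hold anything)
    mask  : boolean matrix, True where an entry was sampled
    k     : target rank
    """
    n, m = M_obs.shape
    rng = np.random.default_rng(0)
    U = rng.normal(size=(n, k))
    V = rng.normal(size=(m, k))
    for _ in range(iters):
        # Fix V, solve for each row of U on that row's observed entries.
        for i in range(n):
            idx = mask[i]
            if idx.any():
                U[i] = np.linalg.lstsq(V[idx], M_obs[i, idx], rcond=None)[0]
        # Fix U, solve for each row of V on that column's observed entries.
        for j in range(m):
            idx = mask[:, j]
            if idx.any():
                V[j] = np.linalg.lstsq(U[idx], M_obs[idx, j], rcond=None)[0]
    return U @ V.T

# Toy check on a random rank-1 matrix with about half the entries observed.
rng = np.random.default_rng(1)
u = rng.normal(size=(20, 1))
v = rng.normal(size=(20, 1))
M = u @ v.T
mask = rng.random(M.shape) < 0.5
M_hat = altmin_complete(M * mask, mask, k=1)
rel_err = np.linalg.norm(M_hat - M) / np.linalg.norm(M)
```

The paper's analysis concerns how to keep such iterations well-behaved under a sparse $O(n)$-sample regime, which is where the thresholding and rescaling steps come in.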

* 45 pages, 1 figure. Short version accepted for presentation at Conference on Learning Theory (COLT) 2017 