Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kun Zhang

Max Planck Institute for Intelligent Systems

Deep State-Space Generative Model For Correlated Time-to-Event Predictions

Jul 28, 2024

Yuan Xue, Denny Zhou, Nan Du, Andrew M. Dai, Zhen Xu, Kun Zhang, Claire Cui

Figure 1 for Deep State-Space Generative Model For Correlated Time-to-Event Predictions

Figure 2 for Deep State-Space Generative Model For Correlated Time-to-Event Predictions

Figure 3 for Deep State-Space Generative Model For Correlated Time-to-Event Predictions

Figure 4 for Deep State-Space Generative Model For Correlated Time-to-Event Predictions

Abstract:Capturing the inter-dependencies among multiple types of clinically-critical events is critical not only to accurate future event prediction, but also to better treatment planning. In this work, we propose a deep latent state-space generative model to capture the interactions among different types of correlated clinical events (e.g., kidney failure, mortality) by explicitly modeling the temporal dynamics of patients' latent states. Based on these learned patient states, we further develop a new general discrete-time formulation of the hazard rate function to estimate the survival distribution of patients with significantly improved accuracy. Extensive evaluations over real EMR data show that our proposed model compares favorably to various state-of-the-art baselines. Furthermore, our method also uncovers meaningful insights about the latent correlations among mortality and different types of organ failures.

* Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Via

Access Paper or Ask Questions

Causal Discovery in Linear Models with Unobserved Variables and Measurement Error

Jul 28, 2024

Yuqin Yang, Mohamed Nafea, Negar Kiyavash, Kun Zhang, AmirEmad Ghassami

Abstract:The presence of unobserved common causes and the presence of measurement error are two of the most limiting challenges in the task of causal structure learning. Ignoring either of the two challenges can lead to detecting spurious causal links among variables of interest. In this paper, we study the problem of causal discovery in systems where these two challenges can be present simultaneously. We consider linear models which include four types of variables: variables that are directly observed, variables that are not directly observed but are measured with error, the corresponding measurements, and variables that are neither observed nor measured. We characterize the extent of identifiability of such model under separability condition (i.e., the matrix indicating the independent exogenous noise terms pertaining to the observed variables is identifiable) together with two versions of faithfulness assumptions and propose a notion of observational equivalence. We provide graphical characterization of the models that are equivalent and present a recovery algorithm that could return models equivalent to the ground truth.

Via

Access Paper or Ask Questions

On the Parameter Identifiability of Partially Observed Linear Causal Models

Jul 24, 2024

Xinshuai Dong, Ignavier Ng, Biwei Huang, Yuewen Sun, Songyao Jin, Roberto Legaspi, Peter Spirtes, Kun Zhang

Figure 1 for On the Parameter Identifiability of Partially Observed Linear Causal Models

Figure 2 for On the Parameter Identifiability of Partially Observed Linear Causal Models

Figure 3 for On the Parameter Identifiability of Partially Observed Linear Causal Models

Figure 4 for On the Parameter Identifiability of Partially Observed Linear Causal Models

Abstract:Linear causal models are important tools for modeling causal dependencies and yet in practice, only a subset of the variables can be observed. In this paper, we examine the parameter identifiability of these models by investigating whether the edge coefficients can be recovered given the causal structure and partially observed data. Our setting is more general than that of prior research - we allow all variables, including both observed and latent ones, to be flexibly related, and we consider the coefficients of all edges, whereas most existing works focus only on the edges between observed variables. Theoretically, we identify three types of indeterminacy for the parameters in partially observed linear causal models. We then provide graphical conditions that are sufficient for all parameters to be identifiable and show that some of them are provably necessary. Methodologically, we propose a novel likelihood-based parameter estimation method that addresses the variance indeterminacy of latent variables in a specific way and can asymptotically recover the underlying parameters up to trivial indeterminacy. Empirical studies on both synthetic and real-world datasets validate our identifiability theory and the effectiveness of the proposed method in the finite-sample regime.

Via

Access Paper or Ask Questions

Optimal Kernel Choice for Score Function-based Causal Discovery

Jul 14, 2024

Wenjie Wang, Biwei Huang, Feng Liu, Xinge You, Tongliang Liu, Kun Zhang, Mingming Gong

Figure 1 for Optimal Kernel Choice for Score Function-based Causal Discovery

Figure 2 for Optimal Kernel Choice for Score Function-based Causal Discovery

Figure 3 for Optimal Kernel Choice for Score Function-based Causal Discovery

Figure 4 for Optimal Kernel Choice for Score Function-based Causal Discovery

Abstract:Score-based methods have demonstrated their effectiveness in discovering causal relationships by scoring different causal structures based on their goodness of fit to the data. Recently, Huang et al. proposed a generalized score function that can handle general data distributions and causal relationships by modeling the relations in reproducing kernel Hilbert space (RKHS). The selection of an appropriate kernel within this score function is crucial for accurately characterizing causal relationships and ensuring precise causal discovery. However, the current method involves manual heuristic selection of kernel parameters, making the process tedious and less likely to ensure optimality. In this paper, we propose a kernel selection method within the generalized score function that automatically selects the optimal kernel that best fits the data. Specifically, we model the generative process of the variables involved in each step of the causal graph search procedure as a mixture of independent noise variables. Based on this model, we derive an automatic kernel selection method by maximizing the marginal likelihood of the variables involved in each search step. We conduct experiments on both synthetic data and real-world benchmarks, and the results demonstrate that our proposed method outperforms heuristic kernel selection methods.

* Accepted by ICML2024

Via

Access Paper or Ask Questions

Empowering Graph Invariance Learning with Deep Spurious Infomax

Jul 13, 2024

Tianjun Yao, Yongqiang Chen, Zhenhao Chen, Kai Hu, Zhiqiang Shen, Kun Zhang

Figure 1 for Empowering Graph Invariance Learning with Deep Spurious Infomax

Figure 2 for Empowering Graph Invariance Learning with Deep Spurious Infomax

Figure 3 for Empowering Graph Invariance Learning with Deep Spurious Infomax

Figure 4 for Empowering Graph Invariance Learning with Deep Spurious Infomax

Abstract:Recently, there has been a surge of interest in developing graph neural networks that utilize the invariance principle on graphs to generalize the out-of-distribution (OOD) data. Due to the limited knowledge about OOD data, existing approaches often pose assumptions about the correlation strengths of the underlying spurious features and the target labels. However, this prior is often unavailable and will change arbitrarily in the real-world scenarios, which may lead to severe failures of the existing graph invariance learning methods. To bridge this gap, we introduce a novel graph invariance learning paradigm, which induces a robust and general inductive bias. The paradigm is built upon the observation that the infomax principle encourages learning spurious features regardless of spurious correlation strengths. We further propose the EQuAD framework that realizes this learning paradigm and employs tailored learning objectives that provably elicit invariant features by disentangling them from the spurious features learned through infomax. Notably, EQuAD shows stable and enhanced performance across different degrees of bias in synthetic datasets and challenging real-world datasets up to $31.76\%$. Our code is available at \url{https://github.com/tianyao-aka/EQuAD}.

* ICML2024 camera-ready version

Via

Access Paper or Ask Questions

Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight

Jul 11, 2024

Zhiqiang Xie, Yujia Zheng, Lizi Ottens, Kun Zhang, Christos Kozyrakis, Jonathan Mace

Abstract:Runtime failure and performance degradation is commonplace in modern cloud systems. For cloud providers, automatically determining the root cause of incidents is paramount to ensuring high reliability and availability as prompt fault localization can enable faster diagnosis and triage for timely resolution. A compelling solution explored in recent work is causal reasoning using causal graphs to capture relationships between varied cloud system performance metrics. To be effective, however, systems developers must correctly define the causal graph of their system, which is a time-consuming, brittle, and challenging task that increases in difficulty for large and dynamic systems and requires domain expertise. Alternatively, automated data-driven approaches have limited efficacy for cloud systems due to the inherent rarity of incidents. In this work, we present Atlas, a novel approach to automatically synthesizing causal graphs for cloud systems. Atlas leverages large language models (LLMs) to generate causal graphs using system documentation, telemetry, and deployment feedback. Atlas is complementary to data-driven causal discovery techniques, and we further enhance Atlas with a data-driven validation step. We evaluate Atlas across a range of fault localization scenarios and demonstrate that Atlas is capable of generating causal graphs in a scalable and generalizable manner, with performance that far surpasses that of data-driven algorithms and is commensurate to the ground-truth baseline.

Via

Access Paper or Ask Questions

Detecting and Identifying Selection Structure in Sequential Data

Jun 29, 2024

Yujia Zheng, Zeyu Tang, Yiwen Qiu, Bernhard Schölkopf, Kun Zhang

Figure 1 for Detecting and Identifying Selection Structure in Sequential Data

Figure 2 for Detecting and Identifying Selection Structure in Sequential Data

Figure 3 for Detecting and Identifying Selection Structure in Sequential Data

Figure 4 for Detecting and Identifying Selection Structure in Sequential Data

Abstract:We argue that the selective inclusion of data points based on latent objectives is common in practical situations, such as music sequences. Since this selection process often distorts statistical analysis, previous work primarily views it as a bias to be corrected and proposes various methods to mitigate its effect. However, while controlling this bias is crucial, selection also offers an opportunity to provide a deeper insight into the hidden generation process, as it is a fundamental mechanism underlying what we observe. In particular, overlooking selection in sequential data can lead to an incomplete or overcomplicated inductive bias in modeling, such as assuming a universal autoregressive structure for all dependencies. Therefore, rather than merely viewing it as a bias, we explore the causal structure of selection in sequential data to delve deeper into the complete causal process. Specifically, we show that selection structure is identifiable without any parametric assumptions or interventional experiments. Moreover, even in cases where selection variables coexist with latent confounders, we still establish the nonparametric identifiability under appropriate structural conditions. Meanwhile, we also propose a provably correct algorithm to detect and identify selection structures as well as other types of dependencies. The framework has been validated empirically on both synthetic data and real-world music.

* ICML 2024

Via

Access Paper or Ask Questions

MuGSI: Distilling GNNs with Multi-Granularity Structural Information for Graph Classification

Jun 28, 2024

Tianjun Yao, Jiaqi Sun, Defu Cao, Kun Zhang, Guangyi Chen

Figure 1 for MuGSI: Distilling GNNs with Multi-Granularity Structural Information for Graph Classification

Figure 2 for MuGSI: Distilling GNNs with Multi-Granularity Structural Information for Graph Classification

Figure 3 for MuGSI: Distilling GNNs with Multi-Granularity Structural Information for Graph Classification

Figure 4 for MuGSI: Distilling GNNs with Multi-Granularity Structural Information for Graph Classification

Abstract:Recent works have introduced GNN-to-MLP knowledge distillation (KD) frameworks to combine both GNN's superior performance and MLP's fast inference speed. However, existing KD frameworks are primarily designed for node classification within single graphs, leaving their applicability to graph classification largely unexplored. Two main challenges arise when extending KD for node classification to graph classification: (1) The inherent sparsity of learning signals due to soft labels being generated at the graph level; (2) The limited expressiveness of student MLPs, especially in datasets with limited input feature spaces. To overcome these challenges, we introduce MuGSI, a novel KD framework that employs Multi-granularity Structural Information for graph classification. Specifically, we propose multi-granularity distillation loss in MuGSI to tackle the first challenge. This loss function is composed of three distinct components: graph-level distillation, subgraph-level distillation, and node-level distillation. Each component targets a specific granularity of the graph structure, ensuring a comprehensive transfer of structural knowledge from the teacher model to the student model. To tackle the second challenge, MuGSI proposes to incorporate a node feature augmentation component, thereby enhancing the expressiveness of the student MLPs and making them more capable learners. We perform extensive experiments across a variety of datasets and different teacher/student model architectures. The experiment results demonstrate the effectiveness, efficiency, and robustness of MuGSI. Codes are publicly available at: \textbf{\url{https://github.com/tianyao-aka/MuGSI}.}

* 12 pages, 4 figures. Accepted by TheWebConf2024

Via

Access Paper or Ask Questions

Improving the Expressiveness of $K$-hop Message-Passing GNNs by Injecting Contextualized Substructure Information

Jun 27, 2024

Tianjun Yao, Yiongxu Wang, Kun Zhang, Shangsong Liang

Abstract:Graph neural networks (GNNs) have become the \textit{de facto} standard for representational learning in graphs, and have achieved state-of-the-art performance in many graph-related tasks; however, it has been shown that the expressive power of standard GNNs are equivalent maximally to 1-dimensional Weisfeiler-Lehman (1-WL) Test. Recently, there is a line of works aiming to enhance the expressive power of graph neural networks. One line of such works aim at developing $K$-hop message-passing GNNs where node representation is updated by aggregating information from not only direct neighbors but all neighbors within $K$-hop of the node. Another line of works leverages subgraph information to enhance the expressive power which is proven to be strictly more powerful than 1-WL test. In this work, we discuss the limitation of $K$-hop message-passing GNNs and propose \textit{substructure encoding function} to uplift the expressive power of any $K$-hop message-passing GNN. We further inject contextualized substructure information to enhance the expressiveness of $K$-hop message-passing GNNs. Our method is provably more powerful than previous works on $K$-hop graph neural networks and 1-WL subgraph GNNs, which is a specific type of subgraph based GNN models, and not less powerful than 3-WL. Empirically, our proposed method set new state-of-the-art performance or achieves comparable performance for a variety of datasets. Our code is available at \url{https://github.com/tianyao-aka/Expresive_K_hop_GNNs}.

* 13 pages, published in Research track of KDD2023

Via

Access Paper or Ask Questions

Learning Discrete Latent Variable Structures with Tensor Rank Conditions

Jun 11, 2024

Zhengming Chen, Ruichu Cai, Feng Xie, Jie Qiao, Anpeng Wu, Zijian Li, Zhifeng Hao, Kun Zhang

Figure 1 for Learning Discrete Latent Variable Structures with Tensor Rank Conditions

Figure 2 for Learning Discrete Latent Variable Structures with Tensor Rank Conditions

Figure 3 for Learning Discrete Latent Variable Structures with Tensor Rank Conditions

Figure 4 for Learning Discrete Latent Variable Structures with Tensor Rank Conditions

Abstract:Unobserved discrete data are ubiquitous in many scientific disciplines, and how to learn the causal structure of these latent variables is crucial for uncovering data patterns. Most studies focus on the linear latent variable model or impose strict constraints on latent structures, which fail to address cases in discrete data involving non-linear relationships or complex latent structures. To achieve this, we explore a tensor rank condition on contingency tables for an observed variable set $\mathbf{X}_p$, showing that the rank is determined by the minimum support of a specific conditional set (not necessary in $\mathbf{X}_p$) that d-separates all variables in $\mathbf{X}_p$. By this, one can locate the latent variable through probing the rank on different observed variables set, and further identify the latent causal structure under some structure assumptions. We present the corresponding identification algorithm and conduct simulated experiments to verify the effectiveness of our method. In general, our results elegantly extend the identification boundary for causal discovery with discrete latent variables and expand the application scope of causal discovery with latent variables.

Via

Access Paper or Ask Questions