Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yewei Xia

Score-based Generative Modeling for Conditional Independence Testing

May 29, 2025

Yixin Ren, Chenghou Jin, Yewei Xia, Li Ke, Longtao Huang, Hui Xue, Hao Zhang, Jihong Guan, Shuigeng Zhou

Abstract:Determining conditional independence (CI) relationships between random variables is a fundamental yet challenging task in machine learning and statistics, especially in high-dimensional settings. Existing generative model-based CI testing methods, such as those utilizing generative adversarial networks (GANs), often struggle with undesirable modeling of conditional distributions and training instability, resulting in subpar performance. To address these issues, we propose a novel CI testing method via score-based generative modeling, which achieves precise Type I error control and strong testing power. Concretely, we first employ a sliced conditional score matching scheme to accurately estimate conditional score and use Langevin dynamics conditional sampling to generate null hypothesis samples, ensuring precise Type I error control. Then, we incorporate a goodness-of-fit stage into the method to verify generated samples and enhance interpretability in practice. We theoretically establish the error bound of conditional distributions modeled by score-based generative models and prove the validity of our CI tests. Extensive experiments on both synthetic and real-world datasets show that our method significantly outperforms existing state-of-the-art methods, providing a promising way to revitalize generative model-based CI testing.

* Accepted by KDD2025

Via

Access Paper or Ask Questions

Fast Causal Discovery by Approximate Kernel-based Generalized Score Functions with Linear Computational Complexity

Dec 23, 2024

Yixin Ren, Haocheng Zhang, Yewei Xia, Hao Zhang, Jihong Guan, Shuigeng Zhou

Figure 1 for Fast Causal Discovery by Approximate Kernel-based Generalized Score Functions with Linear Computational Complexity

Figure 2 for Fast Causal Discovery by Approximate Kernel-based Generalized Score Functions with Linear Computational Complexity

Figure 3 for Fast Causal Discovery by Approximate Kernel-based Generalized Score Functions with Linear Computational Complexity

Figure 4 for Fast Causal Discovery by Approximate Kernel-based Generalized Score Functions with Linear Computational Complexity

Abstract:Score-based causal discovery methods can effectively identify causal relationships by evaluating candidate graphs and selecting the one with the highest score. One popular class of scores is kernel-based generalized score functions, which can adapt to a wide range of scenarios and work well in practice because they circumvent assumptions about causal mechanisms and data distributions. Despite these advantages, kernel-based generalized score functions pose serious computational challenges in time and space, with a time complexity of $\mathcal{O}(n^3)$ and a memory complexity of $\mathcal{O}(n^2)$, where $n$ is the sample size. In this paper, we propose an approximate kernel-based generalized score function with $\mathcal{O}(n)$ time and space complexities by using low-rank technique and designing a set of rules to handle the complex composite matrix operations required to calculate the score, as well as developing sampling algorithms for different data types to benefit the handling of diverse data types efficiently. Our extensive causal discovery experiments on both synthetic and real-world data demonstrate that compared to the state-of-the-art method, our method can not only significantly reduce computational costs, but also achieve comparable accuracy, especially for large datasets.

Via

Access Paper or Ask Questions

Flexible Differentially Private Vertical Federated Learning with Adaptive Feature Embeddings

Jul 26, 2023

Yuxi Mi, Hongquan Liu, Yewei Xia, Yiheng Sun, Jihong Guan, Shuigeng Zhou

Figure 1 for Flexible Differentially Private Vertical Federated Learning with Adaptive Feature Embeddings

Figure 2 for Flexible Differentially Private Vertical Federated Learning with Adaptive Feature Embeddings

Figure 3 for Flexible Differentially Private Vertical Federated Learning with Adaptive Feature Embeddings

Figure 4 for Flexible Differentially Private Vertical Federated Learning with Adaptive Feature Embeddings

Abstract:The emergence of vertical federated learning (VFL) has stimulated concerns about the imperfection in privacy protection, as shared feature embeddings may reveal sensitive information under privacy attacks. This paper studies the delicate equilibrium between data privacy and task utility goals of VFL under differential privacy (DP). To address the generality issue of prior arts, this paper advocates a flexible and generic approach that decouples the two goals and addresses them successively. Specifically, we initially derive a rigorous privacy guarantee by applying norm clipping on shared feature embeddings, which is applicable across various datasets and models. Subsequently, we demonstrate that task utility can be optimized via adaptive adjustments on the scale and distribution of feature embeddings in an accuracy-appreciative way, without compromising established DP mechanisms. We concretize our observation into the proposed VFL-AFE framework, which exhibits effectiveness against privacy attacks and the capacity to retain favorable task utility, as substantiated by extensive experiments.

Via

Access Paper or Ask Questions