David R. Westhead

Graphical Modelling without Independence Assumptions for Uncentered Data

Aug 05, 2024

Bailey Andrew, David R. Westhead, Luisa Cutillo

Abstract: The independence assumption is a useful tool for increasing the tractability of one's modelling framework. However, this assumption does not match reality; failing to take dependencies into account can cause models to fail dramatically. The field of multi-axis graphical modelling (also called multi-way modelling or Kronecker-separable modelling) has seen growth over the past decade, but these models require that the data have zero mean. In the multi-axis case, inference is typically done in the single-sample scenario, making mean inference impossible. In this paper, we demonstrate how the zero-mean assumption can cause egregious modelling errors, and we propose a relaxation of the zero-mean assumption that avoids such errors. Specifically, we propose the "Kronecker-sum-structured mean" assumption, which leads to models with nonconvex-but-unimodal log-likelihoods that can be solved efficiently with coordinate descent.

* 7 pages (13 counting refs & appendix), 7 figures, 1 table
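
To give a feel for what a structured mean buys in the single-sample setting, here is a minimal sketch. It is not the authors' algorithm: it assumes that a "Kronecker-sum-structured mean" for a matrix means each entry's mean is a row effect plus a column effect, and it fits that mean by plain unweighted least squares rather than by the coordinate descent the abstract describes. The names `kron_sum_vec` and `fit_additive_mean` are hypothetical.

```python
import numpy as np


def kron_sum_vec(a, b):
    """Assumed definition: a (+) b = a (x) 1 + 1 (x) b, as a flat vector."""
    return np.kron(a, np.ones(len(b))) + np.kron(np.ones(len(a)), b)


def fit_additive_mean(X):
    """Least-squares fit of a mean of the form M[i, j] = g + r[i] + c[j].

    This ignores any dependency structure in the noise; it only
    illustrates that a structured mean can be estimated from ONE matrix,
    whereas an unrestricted mean would simply reproduce the data.
    """
    grand = X.mean()
    row = X.mean(axis=1) - grand  # row effects, centred
    col = X.mean(axis=0) - grand  # column effects, centred
    return grand + row[:, None] + col[None, :]


rng = np.random.default_rng(0)
a = rng.normal(size=4)  # hypothetical row effects
b = rng.normal(size=3)  # hypothetical column effects
M = a[:, None] + b[None, :]  # structured mean, shape (4, 3)

# Row-major flattening of M matches the assumed Kronecker-sum form.
flat_matches = np.allclose(M.ravel(), kron_sum_vec(a, b))

# A single noisy sample: the fitted mean leaves residuals whose
# row and column averages are zero.
X = M + 0.1 * rng.normal(size=M.shape)
residual = X - fit_additive_mean(X)
```

With noise-free input the fit recovers `M` exactly; with noise, the residual has zero row and column means, which is the sense in which the data have been centred along both axes.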

Making Multi-Axis Gaussian Graphical Models Scalable to Millions of Samples and Features

Jul 29, 2024

Bailey Andrew, David R. Westhead, Luisa Cutillo

Abstract: Gaussian graphical models can be used to extract conditional dependencies between the features of a dataset. This is often done by making an independence assumption about the samples, but this assumption is rarely satisfied in reality. However, state-of-the-art approaches that avoid this assumption are not scalable, with $O(n^3)$ runtime and $O(n^2)$ space complexity. In this paper, we introduce a method that has $O(n^2)$ runtime and $O(n)$ space complexity, without assuming independence. We validate our model on both synthetic and real-world datasets, showing that our method's accuracy is comparable to that of prior work. We demonstrate that our approach can be used on unprecedentedly large datasets, such as a real-world 1,000,000-cell scRNA-seq dataset; this was impossible with previous approaches. Our method maintains the flexibility of prior work, such as the ability to handle multi-modal tensor-variate datasets and the ability to work with data of arbitrary marginal distributions. An additional advantage of our method is that, unlike prior work, our hyperparameters are easily interpretable.

* 39 pages (48 with appendix+references), 8 figures, 7 tables
