Willem Diepeveen

Riemannian Archetypal Analysis: Interpretable non-linear data analysis on deformed star distributions

May 22, 2026

Willem Diepeveen, Deanna Needell

Abstract:Classical archetypal analysis is appealing for its interpretability, but its linear geometry can limit performance on data with strongly non-linear structure; at the same time, existing neural extensions improve flexibility while often weakening the geometric meaning of archetypes and interpolations. In this work, we develop a Riemannian version of archetypal analysis based on data-driven pullback geometry for real-valued data, with the goal of combining the interpretability of classical archetypal analysis with the expressive power of modern non-linear models. We introduce a class of deformed star distributions together with associated pullback Riemannian geometry to provide a statistical interpretation of the resulting manifold mappings, define the Riemannian archetypal mapping (RAM) as a projection onto the manifold of geodesically convex combinations of archetypes, and propose a practical optimization scheme based on convex relaxation followed by non-convex refinement. We further propose a learning scheme that yields reasonable, albeit generally suboptimal, deformed star distributions from data. Experiments on synthetic examples and MNIST show that the resulting framework produces meaningful geodesics, useful denoising projections, and geometry-aware classifications, while also clarifying where current optimization limitations remain.

Riemannian AmbientFlow: Towards Simultaneous Manifold Learning and Generative Modeling from Corrupted Data

Jan 26, 2026

Willem Diepeveen, Oscar Leong

Abstract:Modern generative modeling methods have demonstrated strong performance in learning complex data distributions from clean samples. In many scientific and imaging applications, however, clean samples are unavailable, and only noisy or linearly corrupted measurements can be observed. Moreover, latent structures, such as manifold geometries, present in the data are important to extract for further downstream scientific analysis. In this work, we introduce Riemannian AmbientFlow, a framework for simultaneously learning a probabilistic generative model and the underlying, nonlinear data manifold directly from corrupted observations. Building on the variational inference framework of AmbientFlow, our approach incorporates data-driven Riemannian geometry induced by normalizing flows, enabling the extraction of manifold structure through pullback metrics and Riemannian Autoencoders. We establish theoretical guarantees showing that, under appropriate geometric regularization and measurement conditions, the learned model recovers the underlying data distribution up to a controllable error and yields a smooth, bi-Lipschitz manifold parametrization. We further show that the resulting smooth decoder can serve as a principled generative prior for inverse problems with recovery guarantees. We empirically validate our approach on low-dimensional synthetic manifolds and on MNIST.

Manifold Learning with Normalizing Flows: Towards Regularity, Expressivity and Iso-Riemannian Geometry

May 12, 2025

Willem Diepeveen, Deanna Needell

Abstract:Modern machine learning increasingly leverages the insight that high-dimensional data often lie near low-dimensional, non-linear manifolds, an idea known as the manifold hypothesis. By explicitly modeling the geometric structure of data through learning Riemannian geometry, algorithms can achieve improved performance and interpretability in tasks like clustering, dimensionality reduction, and interpolation. In particular, learned pullback geometry has recently undergone transformative developments that now make it scalable to learn and scalable to evaluate, which further opens the door for principled non-linear data analysis and interpretable machine learning. However, there are still steps to be taken when considering real-world multi-modal data. This work focuses on addressing distortions and modeling errors that can arise in the multi-modal setting and proposes to alleviate both challenges through isometrizing the learned Riemannian structure and balancing regularity and expressivity of the diffeomorphism parametrization. We showcase the effectiveness of the synergy of the proposed approaches in several numerical experiments with both synthetic and real data.

Latent Diffeomorphic Dynamic Mode Decomposition

May 09, 2025

Willem Diepeveen, Jon Schwenk, Andrea Bertozzi

Abstract:We present Latent Diffeomorphic Dynamic Mode Decomposition (LDDMD), a new data reduction approach for the analysis of non-linear systems that combines the interpretability of Dynamic Mode Decomposition (DMD) with the predictive power of Recurrent Neural Networks (RNNs). Notably, LDDMD maintains simplicity, which enhances interpretability, while effectively modeling and learning complex non-linear systems with memory, enabling accurate predictions. This is exemplified by its successful application in streamflow prediction.
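For orientation, the classical DMD building block that the abstract refers to can be sketched as below. This is only standard exact DMD on a toy linear system, not LDDMD itself: the latent diffeomorphism and the memory component are not reproduced, and the function name `dmd` and the test system are assumptions made for illustration.

```python
import numpy as np

def dmd(snapshots, rank):
    """Classical exact DMD on a (state_dim, n_steps) snapshot matrix."""
    # Pair each snapshot with its successor: Y is approximately A @ X.
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    # Truncated SVD of the input snapshots.
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank, :]
    # Y V S^{-1}: dividing by s scales the columns by the inverse singular values.
    YV = (Y @ Vh.conj().T) / s
    # Reduced linear operator and its eigendecomposition.
    A_tilde = U.conj().T @ YV
    eigvals, W = np.linalg.eig(A_tilde)
    # Exact DMD modes.
    modes = YV @ W
    return eigvals, modes

# Toy linear system (hypothetical): a damped rotation in the plane.
theta, rho = 0.3, 0.9
A = rho * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
states = [np.array([1.0, 0.0])]
for _ in range(19):
    states.append(A @ states[-1])
snapshots = np.stack(states, axis=1)

eigvals, modes = dmd(snapshots, rank=2)
```

On noise-free linear data such as this, the DMD eigenvalues coincide with the eigenvalues of the generating matrix, which is what makes the decomposition interpretable.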

Curvature Corrected Nonnegative Manifold Data Factorization

Feb 21, 2025

Joyce Chew, Willem Diepeveen, Deanna Needell

Abstract:Data with underlying nonlinear structure are collected across numerous application domains, necessitating new data processing and analysis methods adapted to nonlinear domain structure. Riemannian manifolds present a rich environment in which to develop such tools, as manifold-valued data arise in a variety of scientific settings, and Riemannian geometry provides a solid theoretical grounding for geometric data analysis. Low-rank approximations, such as nonnegative matrix factorization (NMF), are the foundation of many Euclidean data analysis methods, so adaptations of these factorizations for manifold-valued data are important building blocks for further development of manifold data analysis. In this work, we propose curvature corrected nonnegative manifold data factorization (CC-NMDF) as a geometry-aware method for extracting interpretable factors from manifold-valued data, analogous to nonnegative matrix factorization. We develop an efficient iterative algorithm for computing CC-NMDF and demonstrate our method on real-world diffusion tensor magnetic resonance imaging data.

Pullback Flow Matching on Data Manifolds

Oct 06, 2024

Friso de Kruiff, Erik Bekkers, Ozan Öktem, Carola-Bibiane Schönlieb, Willem Diepeveen

Abstract:We propose Pullback Flow Matching (PFM), a novel framework for generative modeling on data manifolds. Unlike existing methods that assume or learn restrictive closed-form manifold mappings for training Riemannian Flow Matching (RFM) models, PFM leverages pullback geometry and isometric learning to preserve the underlying manifold's geometry while enabling efficient generation and precise interpolation in latent space. This approach not only facilitates closed-form mappings on the data manifold but also allows for designable latent spaces, using assumed metrics on both data and latent manifolds. By enhancing isometric learning through Neural ODEs and proposing a scalable training objective, we achieve a latent space more suitable for interpolation, leading to improved manifold learning and generative performance. We demonstrate PFM's effectiveness through applications in synthetic data, protein dynamics and protein sequence data, generating novel proteins with specific properties. This method shows strong potential for drug discovery and materials science, where generating novel samples with specific properties is of great interest.

Score-based pullback Riemannian geometry

Oct 02, 2024

Willem Diepeveen, Georgios Batzolis, Zakhar Shumaylov, Carola-Bibiane Schönlieb

Abstract:Data-driven Riemannian geometry has emerged as a powerful tool for interpretable representation learning, offering improved efficiency in downstream tasks. Moving forward, it is crucial to balance cheap manifold mappings with efficient training algorithms. In this work, we integrate concepts from pullback Riemannian geometry and generative models to propose a framework for data-driven Riemannian geometry that is scalable in both geometry and learning: score-based pullback Riemannian geometry. Focusing on unimodal distributions as a first step, we propose a score-based Riemannian structure with closed-form geodesics that pass through the data probability density. With this structure, we construct a Riemannian autoencoder (RAE) with error bounds for discovering the correct data manifold dimension. This framework can naturally be used with anisotropic normalizing flows by adopting isometry regularization during training. Through numerical experiments on various datasets, we demonstrate that our framework not only produces high-quality geodesics through the data support, but also reliably estimates the intrinsic dimension of the data manifold and provides a global chart of the manifold, even in high-dimensional ambient spaces.

Pulling back symmetric Riemannian geometry for data analysis

Mar 11, 2024

Willem Diepeveen

Abstract:Data sets tend to live in low-dimensional non-linear subspaces. Ideal data analysis tools for such data sets should therefore account for such non-linear geometry. The symmetric Riemannian geometry setting can be suitable for a variety of reasons. First, it comes with a rich mathematical structure to account for a wide range of non-linear geometries that has been shown to be able to capture the data geometry through empirical evidence from classical non-linear embedding. Second, many standard data analysis tools initially developed for data in Euclidean space can also be generalised efficiently to data on a symmetric Riemannian manifold. A conceptual challenge comes from the lack of guidelines for constructing a symmetric Riemannian structure on the data space itself and the lack of guidelines for modifying successful algorithms on symmetric Riemannian manifolds for data analysis to this setting. This work considers these challenges in the setting of pullback Riemannian geometry through a diffeomorphism. The first part of the paper characterises diffeomorphisms that result in proper, stable and efficient data analysis. The second part then uses these best practices to guide construction of such diffeomorphisms through deep learning. As a proof of concept, different types of pullback geometries -- among which the proposed construction -- are tested on several data analysis tasks and on several toy data sets. The numerical experiments confirm the predictions from theory, i.e., that the diffeomorphisms generating the pullback geometry need to map the data manifold into a geodesic subspace of the pulled back Riemannian manifold while preserving local isometry around the data manifold for proper, stable and efficient data analysis, and that pulling back positive curvature can be problematic in terms of stability.
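As a minimal sketch of the pullback idea, assume the simplest case: the Euclidean metric on R^2 pulled back through a diffeomorphism. Geodesics are then preimages of straight lines and distances are Euclidean distances between images. The map `phi` below is a hypothetical hand-picked example, not the learned construction from the paper; it sends the parabola y = x^2 (playing the role of the data manifold) onto the line y = 0, a geodesic subspace.

```python
import numpy as np

# Hypothetical diffeomorphism of R^2 (illustration only): a shear that
# flattens the parabola y = x^2 onto the horizontal axis.
def phi(p):
    x, y = p
    return np.array([x, y - x**2])

def phi_inv(q):
    u, v = q
    return np.array([u, v + u**2])

def pullback_geodesic(p0, p1, t):
    """Point at time t on the geodesic from p0 to p1 under the pulled-back
    Euclidean metric: map forward, interpolate linearly, map back."""
    return phi_inv((1.0 - t) * phi(p0) + t * phi(p1))

def pullback_distance(p0, p1):
    """Pullback distance: Euclidean distance between the images under phi."""
    return float(np.linalg.norm(phi(p0) - phi(p1)))

# Two points on the parabola.
p0 = np.array([-1.0, 1.0])
p1 = np.array([2.0, 4.0])

mid = pullback_geodesic(p0, p1, 0.5)   # stays on the parabola: (0.5, 0.25)
dist = pullback_distance(p0, p1)       # 3.0
```

A straight Euclidean interpolation between the same two points would give the midpoint (0.5, 2.5), which lies off the parabola; the pullback geodesic follows the data manifold instead.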

Spectral decomposition of atomic structures in heterogeneous cryo-EM

Sep 12, 2022

Carlos Esteve-Yagüe, Willem Diepeveen, Ozan Öktem, Carola-Bibiane Schönlieb

Abstract:We consider the problem of recovering the three-dimensional atomic structure of a flexible macromolecule from a heterogeneous cryo-EM dataset. The dataset contains noisy tomographic projections of the electrostatic potential of the macromolecule, taken from different viewing directions, and in the heterogeneous case, each image corresponds to a different conformation of the macromolecule. Under the assumption that the macromolecule can be modelled as a chain, or discrete curve (as is the case, for instance, for a protein backbone with a single chain of amino acids), we introduce a method to estimate the deformation of the atomic model with respect to a given conformation, which is assumed to be known a priori. Our method consists of estimating the torsion and bond angles of the atomic model in each conformation as a linear combination of the eigenfunctions of the Laplace operator on the manifold of conformations. These eigenfunctions can be approximated by means of a well-known technique in manifold learning, based on the construction of a graph Laplacian using the cryo-EM dataset. Finally, we test our approach with synthetic datasets, for which we recover the atomic model of two-dimensional and three-dimensional flexible structures from noisy tomographic projections.

* 26 pages, 14 figures
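The graph Laplacian step mentioned in the abstract can be illustrated generically as follows. This is a sketch under stated assumptions, not the paper's pipeline: the points are samples on a circle rather than cryo-EM images, the Gaussian kernel, the bandwidth `sigma`, and the function name `graph_laplacian_eigs` are all choices made for illustration, and the unnormalized Laplacian is used.

```python
import numpy as np

def graph_laplacian_eigs(X, sigma, k):
    """First k eigenpairs of the unnormalized graph Laplacian L = D - W
    built from a Gaussian kernel on the rows of X."""
    # Pairwise squared distances between samples.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)              # no self-loops
    L = np.diag(W.sum(axis=1)) - W
    # L is symmetric, so eigh returns ascending eigenvalues and
    # orthonormal eigenvectors.
    vals, vecs = np.linalg.eigh(L)
    return vals[:k], vecs[:, :k]

# Toy data (hypothetical): samples on a circle as a one-dimensional manifold.
rng = np.random.default_rng(0)
theta = np.sort(rng.uniform(0.0, 2.0 * np.pi, 200))
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)

vals, vecs = graph_laplacian_eigs(X, sigma=0.3, k=5)

# Represent a function on the samples as a linear combination of the
# leading eigenvectors (orthogonal projection onto their span).
signal = np.cos(theta)
coeffs = vecs.T @ signal
approx = vecs @ coeffs
```

The eigenvectors act as a data-driven basis on the sampled manifold, so a quantity defined per sample (in the paper's setting, torsion and bond angles per conformation) is described by a small vector of coefficients.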
