 # Ronald R. Coifman

## Hyperbolic Diffusion Embedding and Distance for Hierarchical Representation Learning

May 30, 2023    Finding meaningful representations and distances of hierarchical data is important in many fields. This paper presents a new method for hierarchical data embedding and distance. Our method relies on combining diffusion geometry, a central approach to manifold learning, and hyperbolic geometry. Specifically, using diffusion geometry, we build multi-scale densities on the data, aimed to reveal their hierarchical structure, and then embed them into a product of hyperbolic spaces. We show theoretically that our embedding and distance recover the underlying hierarchical structure. In addition, we demonstrate the efficacy of the proposed method and its advantages compared to existing methods on graph embedding benchmarks and hierarchical datasets.

Via ## A common variable minimax theorem for graphs

Jul 30, 2021    Let $\mathcal{G} = \{G_1 = (V, E_1), \dots, G_m = (V, E_m)\}$ be a collection of $m$ graphs defined on a common set of vertices $V$ but with different edge sets $E_1, \dots, E_m$. Informally, a function $f :V \rightarrow \mathbb{R}$ is smooth with respect to $G_k = (V,E_k)$ if $f(u) \sim f(v)$ whenever $(u, v) \in E_k$. We study the problem of understanding whether there exists a nonconstant function that is smooth with respect to all graphs in $\mathcal{G}$, simultaneously, and how to find it if it exists.

* 21 pages, 11 figures
Via ## Doubly-Stochastic Normalization of the Gaussian Kernel is Robust to Heteroskedastic Noise

May 31, 2020    A fundamental step in many data-analysis techniques is the construction of an affinity matrix describing similarities between data points. When the data points reside in Euclidean space, a widespread approach is to from an affinity matrix by the Gaussian kernel with pairwise distances, and to follow with a certain normalization (e.g. the row-stochastic normalization or its symmetric variant). We demonstrate that the doubly-stochastic normalization of the Gaussian kernel with zero main diagonal (i.e. no self loops) is robust to heteroskedastic noise. That is, the doubly-stochastic normalization is advantageous in that it automatically accounts for observations with different noise variances. Specifically, we prove that in a suitable high-dimensional setting where heteroskedastic noise does not concentrate too much in any particular direction in space, the resulting (doubly-stochastic) noisy affinity matrix converges to its clean counterpart with rate $m^{-1/2}$, where $m$ is the ambient dimension. We demonstrate this result numerically, and show that in contrast, the popular row-stochastic and symmetric normalizations behave unfavorably under heteroskedastic noise. Furthermore, we provide a prototypical example of simulated single-cell RNA sequence data with strong intrinsic heteroskedasticity, where the advantage of the doubly-stochastic normalization for exploratory analysis is evident.

Via ## LOCA: LOcal Conformal Autoencoder for standardized data coordinates

Apr 15, 2020    We propose a deep-learning based method for obtaining standardized data coordinates from scientific measurements.Data observations are modeled as samples from an unknown, non-linear deformation of an underlying Riemannian manifold, which is parametrized by a few normalized latent variables. By leveraging a repeated measurement sampling strategy, we present a method for learning an embedding in $\mathbb{R}^d$ that is isometric to the latent variables of the manifold. These data coordinates, being invariant under smooth changes of variables, enable matching between different instrumental observations of the same phenomenon. Our embedding is obtained using a LOcal Conformal Autoencoder (LOCA), an algorithm that constructs an embedding to rectify deformations by using a local z-scoring procedure while preserving relevant geometric information. We demonstrate the isometric embedding properties of LOCA on various model settings and observe that it exhibits promising interpolation and extrapolation capabilities. Finally, we apply LOCA to single-site Wi-Fi localization data, and to $3$-dimensional curved surface estimation based on a $2$-dimensional projection.

Via ## Co-manifold learning with missing data

Oct 16, 2018   Representation learning is typically applied to only one mode of a data matrix, either its rows or columns. Yet in many applications, there is an underlying geometry to both the rows and the columns. We propose utilizing this coupled structure to perform co-manifold learning: uncovering the underlying geometry of both the rows and the columns of a given matrix, where we focus on a missing data setting. Our unsupervised approach consists of three components. We first solve a family of optimization problems to estimate a complete matrix at multiple scales of smoothness. We then use this collection of smooth matrix estimates to compute pairwise distances on the rows and columns based on a new multi-scale metric that implicitly introduces a coupling between the rows and the columns. Finally, we construct row and column representations from these multi-scale metrics. We demonstrate that our approach outperforms competing methods in both data visualization and clustering.

* 16 pages, 9 figures
Via ## Two-sample Statistics Based on Anisotropic Kernels

Aug 30, 2018    The paper introduces a new kernel-based Maximum Mean Discrepancy (MMD) statistic for measuring the distance between two distributions given finitely-many multivariate samples. When the distributions are locally low-dimensional, the proposed test can be made more powerful to distinguish certain alternatives by incorporating local covariance matrices and constructing an anisotropic kernel. The kernel matrix is asymmetric; it computes the affinity between $n$ data points and a set of $n_R$ reference points, where $n_R$ can be drastically smaller than $n$. While the proposed statistic can be viewed as a special class of Reproducing Kernel Hilbert Space MMD, the consistency of the test is proved, under mild assumptions of the kernel, as long as $\|p-q\| \sqrt{n} \to \infty$, and a finite-sample lower bound of the testing power is obtained. Applications to flow cytometry and diffusion MRI datasets are demonstrated, which motivate the proposed approach to compare distributions.

Via ## Manifold learning with bi-stochastic kernels

Feb 27, 2018    In this paper we answer the following question: what is the infinitesimal generator of the diffusion process defined by a kernel that is normalized such that it is bi-stochastic with respect to a specified measure? More precisely, under the assumption that data is sampled from a Riemannian manifold we determine how the resulting infinitesimal generator depends on the potentially nonuniform distribution of the sample points, and the specified measure for the bi-stochastic normalization. In a special case, we demonstrate a connection to the heat kernel. We consider both the case where only a single data set is given, and the case where a data set and a reference set are given. The spectral theory of the constructed operators is studied, and Nystr\"om extension formulas for the gradients of the eigenfunctions are computed. Applications to discrete point sets and manifold learning are discussed.

* 18 pages, 5 figures
Via ## Data-Driven Tree Transforms and Metrics

Aug 18, 2017    We consider the analysis of high dimensional data given in the form of a matrix with columns consisting of observations and rows consisting of features. Often the data is such that the observations do not reside on a regular grid, and the given order of the features is arbitrary and does not convey a notion of locality. Therefore, traditional transforms and metrics cannot be used for data organization and analysis. In this paper, our goal is to organize the data by defining an appropriate representation and metric such that they respect the smoothness and structure underlying the data. We also aim to generalize the joint clustering of observations and features in the case the data does not fall into clear disjoint groups. For this purpose, we propose multiscale data-driven transforms and metrics based on trees. Their construction is implemented in an iterative refinement procedure that exploits the co-dependencies between features and observations. Beyond the organization of a single dataset, our approach enables us to transfer the organization learned from one dataset to another and to integrate several datasets together. We present an application to breast cancer gene expression analysis: learning metrics on the genes to cluster the tumor samples into cancer sub-types and validating the joint organization of both the genes and the samples. We demonstrate that using our approach to combine information from multiple gene expression cohorts, acquired by different profiling technologies, improves the clustering of tumor samples.

* 16 pages, 5 figures. Accepted to IEEE Transactions on Signal and Information Processing over Networks
Via ## Provable approximation properties for deep neural networks

Mar 28, 2016   We discuss approximation of functions using deep neural nets. Given a function $f$ on a $d$-dimensional manifold $\Gamma \subset \mathbb{R}^m$, we construct a sparsely-connected depth-4 neural network and bound its error in approximating $f$. The size of the network depends on dimension and curvature of the manifold $\Gamma$, the complexity of $f$, in terms of its wavelet description, and only weakly on the ambient dimension $m$. Essentially, our network computes wavelet functions, which are computed from Rectified Linear Units (ReLU)

* accepted for publication in Applied and Computational Harmonic Analysis
Via ## Hierarchical Coupled Geometry Analysis for Neuronal Structure and Activity Pattern Discovery

Nov 06, 2015    In the wake of recent advances in experimental methods in neuroscience, the ability to record in-vivo neuronal activity from awake animals has become feasible. The availability of such rich and detailed physiological measurements calls for the development of advanced data analysis tools, as commonly used techniques do not suffice to capture the spatio-temporal network complexity. In this paper, we propose a new hierarchical coupled geometry analysis, which exploits the hidden connectivity structures between neurons and the dynamic patterns at multiple time-scales. Our approach gives rise to the joint organization of neurons and dynamic patterns in data-driven hierarchical data structures. These structures provide local to global data representations, from local partitioning of the data in flexible trees through a new multiscale metric to a global manifold embedding. The application of our techniques to in-vivo neuronal recordings demonstrate the capability of extracting neuronal activity patterns and identifying temporal trends, associated with particular behavioral events and manipulations introduced in the experiments.

* 13 pages, 9 figures
Via 