Abstract:We study estimation in the low signal-to-noise ratio (SNR) regime for a broad class of Gaussian latent-variable models, including Gaussian mixtures and orbit recovery problems. We show that, in this regime, the generalized method-of-moments (GMoM) matches the first-order asymptotic efficiency of maximum likelihood. In particular, if the moment features are chosen up to the minimal local order required for identification and are weighted optimally, then the resulting GMoM estimator has the same leading asymptotic covariance as the maximum-likelihood estimator. Our analysis shows that, in low SNR, this equivalence is governed by a layered local geometry: different directions become informative at different moment orders, partitioning the space into layers with distinct SNR scalings. We prove that the observed Fisher information and the GMoM information operator admit matching layerwise expansions across these layers. As a consequence, in the low-SNR regime, GMoM provides a statistically efficient alternative to maximum likelihood, while preserving the computational advantages of moment-based estimation.
Abstract:Motivated by structural biology applications, we study the projected multi-reference alignment (MRA) model, in which an unknown signal is observed through noisy samples, each generated by applying a random cyclic shift followed by a fixed projection. The projection merges reflection-symmetric index pairs, thereby discarding orientation information. The goal is to recover the dihedral orbit of the signal. We prove that in the high-noise regime, the first three moments of the projected observations determine a generic dihedral orbit. The main mechanism is a reduction, at the moment level, from projected MRA to the reflection-invariant phase-coupling structure of dihedral MRA. In Fourier-cosine coordinates adapted to the projection, the first moment determines the mean component, the second moment determines the Fourier magnitudes, and selected third moments yield the cosine phase-coupling relations appearing in the dihedral bispectrum. These relations lead to a constructive recovery scheme from moments up to order three. We complement the population theory with finite-sample experiments comparing expectation--maximization (EM), direct moment optimization, and direct Fourier-cosine moment optimization. The results show that, in the high-noise regime, both EM and direct moment optimization are consistent with the predicted third-moment sample-complexity scaling $n \gtrsim \sigma^6$, where $n$ is the number of observations and $\sigma^2$ is the noise variance.
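The invariants named in the abstract above can be illustrated for the simpler, unprojected dihedral MRA setting. The sketch below is not the paper's projected-moment construction: it only checks numerically that the mean, the power spectrum, and the real (cosine) part of the bispectrum of a real signal are unchanged by cyclic shifts and reflections. The function name `dihedral_invariants` and the test signal are assumptions made for illustration.

```python
import numpy as np

def dihedral_invariants(x):
    """Mean, power spectrum, and real part of the bispectrum of a real 1-D signal.

    Illustrative only: these are invariants of the *unprojected* dihedral
    action (cyclic shifts and reflections), not the projected model.
    """
    N = len(x)
    X = np.fft.fft(x)
    k = np.arange(N)
    # B[k1, k2] = X[k1] * X[k2] * conj(X[(k1 + k2) mod N]) is shift-invariant;
    # a reflection conjugates B, so only its real part survives.
    B = X[:, None] * X[None, :] * np.conj(X[(k[:, None] + k[None, :]) % N])
    return X[0].real / N, np.abs(X) ** 2, B.real

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
y = np.roll(x[::-1], 3)  # a reflection followed by a cyclic shift

for a, b in zip(dihedral_invariants(x), dihedral_invariants(y)):
    assert np.allclose(a, b)
```

The imaginary part of the bispectrum changes sign under reflection, which is why only the cosine part of the phase coupling is available once orientation is lost.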
Abstract:Let $f:\mathbb{R}^n\to\mathbb{R}$ be an unknown object, and suppose the observations are tomographic projections of randomly rotated copies of $f$ of the form $Y = P(R\cdot f)$, where $R$ is Haar-uniform in $\mathrm{SO}(n)$ and $P$ is the projection onto an $m$-dimensional subspace, so that $Y:\mathbb{R}^m\to\mathbb{R}$. We prove that, whenever $d\le m$, the $d$-th order moment of the projected data determines the full $d$-th order Haar-orbit moment of $f$, independently of the ambient dimension $n$. We further provide an explicit algorithmic procedure for recovering the latter from the former. As a consequence, any identifiability result for the unprojected model based on the $d$-th order group-invariant moment extends directly to the tomographic setting at the same moment order. In particular, for $n=3$, $m=2$, and $d=2$, our result recovers a classical result in the cryo-EM literature: the covariance of the 2D projection images determines the second-order rotationally invariant moment of the underlying 3D object.
Abstract:We study the recovery of an unknown three-dimensional band-limited signal from multiple noisy observations that are randomly rotated by latent elements of SO(3), where the rotations are drawn from an unknown, non-uniform distribution. Because the rotations are unobserved, only the signal orbit under the rotation group can be recovered. We show that the signal orbit and the rotation distribution are jointly identifiable from the first and second moments. This yields an improved high-noise sample complexity that scales quadratically with the noise variance, rather than cubically as in the uniform-rotation case. We further develop a provable, computationally efficient reconstruction algorithm that recovers the 3-D signal by successively solving a sequence of well-conditioned linear systems. The algorithm is validated through extensive numerical experiments. Our results provide a principled and tractable framework for high-noise 3-D orbit recovery, with potential relevance to cryo-electron microscopy and cryo-electron tomography modeling, where molecules are observed in unknown orientations.




Abstract:We study the orbit recovery problem under the rigid-motion group SE(n), where the objective is to reconstruct an unknown signal from multiple noisy observations subjected to unknown rotations and translations. This problem is fundamental in signal processing, computer vision, and structural biology. Our main theoretical contribution is a bound on the sample complexity of this problem. We show that if the $d$-th order moment under the rotation group SO(n) uniquely determines the signal orbit, then orbit recovery under SE(n) is achievable with $N\gtrsim \sigma^{2d+4}$ samples as the noise variance $\sigma^2 \to \infty$. The key technical insight is that the $d$-th order SO(n) moments can be explicitly recovered from $(d+2)$-th order SE(n) autocorrelations, enabling us to transfer known results from the rotation-only setting to the rigid-motion case. We further harness this result to derive a matching bound for the sample complexity of the multi-target detection model that serves as an abstract framework for electron-microscopy-based technologies in structural biology, such as single-particle cryo-electron microscopy (cryo-EM) and cryo-electron tomography (cryo-ET). Beyond theory, we present a provable computational pipeline for rigid-motion orbit recovery in three dimensions. Starting from rigid-motion autocorrelations, we extract the SO(3) moments and demonstrate successful reconstruction of a 3-D macromolecular structure. Importantly, this algorithmic approach is valid at any noise level, suggesting that even very small macromolecules, long believed to be inaccessible using structural biology electron-microscopy-based technologies, may, in principle, be reconstructed given sufficient data.

Abstract:The generalized phase retrieval problem over compact groups aims to recover a set of matrices, representing an unknown signal, from their associated Gram matrices, leveraging prior structural knowledge about the signal. This framework generalizes the classical phase retrieval problem, which reconstructs a signal from the magnitudes of its Fourier transform, to a richer setting involving non-abelian compact groups. In this broader context, the unknown phases in Fourier space are replaced by unknown orthogonal matrices that arise from the action of a compact group on a finite-dimensional vector space. This problem is primarily motivated by advances in electron microscopy for determining the 3D structure of biological macromolecules from highly noisy observations. To capture realistic assumptions from machine learning and signal processing, we model the signal as belonging to one of several broad structural families: a generic linear subspace, a sparse representation in a generic basis, the output of a generic ReLU neural network, or a generic low-dimensional manifold. Our main result shows that, under mild conditions, the generalized phase retrieval problem not only admits a unique solution (up to inherent group symmetries), but also satisfies a bi-Lipschitz property. This implies robustness to both noise and model mismatch, an essential requirement for practical use, especially when measurements are severely corrupted by noise. These findings provide theoretical support for a wide class of scientific problems under modern structural assumptions, and they offer strong foundations for developing robust algorithms in high-noise regimes.

Abstract:The classical phase retrieval problem involves estimating a signal from its Fourier magnitudes (power spectrum) by leveraging prior information about the desired signal. This paper extends the problem to compact groups, addressing the recovery of a set of matrices from their Gram matrices. In this broader context, the missing phases in Fourier space are replaced by missing unitary or orthogonal matrices arising from the action of a compact group on a finite-dimensional vector space. This generalization is driven by applications in multi-reference alignment and single-particle cryo-electron microscopy, a pivotal technology in structural biology. We define the generalized phase retrieval problem over compact groups and explore its underlying algebraic structure. We survey recent results on the uniqueness of solutions, focusing on the significant class of semialgebraic priors. Furthermore, we present a family of algorithms inspired by classical phase retrieval techniques. Finally, we propose a conjecture on the stability of the problem based on bi-Lipschitz analysis, supported by numerical experiments.
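To see why classical phase retrieval needs prior information, the following sketch builds a second real signal that shares the Fourier magnitudes of a given one but is otherwise unrelated. It is a minimal illustration of the non-uniqueness, not an algorithm from the paper; the helper name `same_magnitude_signal` is hypothetical.

```python
import numpy as np

def same_magnitude_signal(x, rng):
    """Return a real signal with the same Fourier magnitudes as x but new phases.

    The phases are borrowed from the FFT of real-valued noise, so they satisfy
    the conjugate symmetry required for the inverse FFT to be real.
    """
    magnitudes = np.abs(np.fft.fft(x))
    phases = np.angle(np.fft.fft(rng.standard_normal(len(x))))
    y = np.fft.ifft(magnitudes * np.exp(1j * phases))
    return y.real  # imaginary part is only floating-point round-off

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
y = same_magnitude_signal(x, rng)

# Identical power spectrum, yet y is neither x nor -x.
assert np.allclose(np.abs(np.fft.fft(x)), np.abs(np.fft.fft(y)))
assert not np.allclose(x, y) and not np.allclose(x, -y)
```

Without a constraint such as sparsity or membership in a low-dimensional set, the magnitudes alone cannot distinguish between `x` and `y`.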
Abstract:Semi-algebraic priors are ubiquitous in signal processing and machine learning. Prevalent examples include a) linear models where the signal lies in a low-dimensional subspace; b) sparse models where the signal can be represented by only a few coefficients under a suitable basis; and c) a large family of neural network generative models. In this paper, we prove a transversality theorem for semi-algebraic sets in orthogonal or unitary representations of groups: with a suitable dimension bound, a generic translate of any semi-algebraic set is transverse to the orbits of the group action. This, in turn, implies that if a signal lies in a low-dimensional semi-algebraic set, then it can be recovered uniquely from measurements that separate orbits. As an application, we consider the implications of the transversality theorem for the problem of recovering signals that are translated by random group actions from their second moment. As a special case, we discuss cryo-EM: a leading technology for determining the spatial structure of biological molecules, which serves as our prime motivation. In particular, we derive explicit bounds for recovering a molecular structure from the second moment under a semi-algebraic prior and deduce information-theoretic implications. We also obtain information-theoretic bounds for three additional applications: factoring Gram matrices, multi-reference alignment, and phase retrieval. Finally, we deduce bounds for designing permutation-invariant separators in machine learning.
Abstract:The classical beltway problem entails recovering a set of points from their unordered pairwise distances on the circle. This problem can be viewed as a special case of the crystallographic phase retrieval problem of recovering a sparse signal from its periodic autocorrelation. Based on this interpretation, and motivated by cryo-electron microscopy, we suggest a natural generalization to orthogonal groups: recovering a sparse signal, up to an orthogonal transformation, from its autocorrelation over the orthogonal group. If the support of the signal is collision-free, we bound the number of solutions to the beltway problem over orthogonal groups, and prove that this bound is exactly one when the support of the signal is radially collision-free (i.e., the support points have distinct magnitudes). We also prove that if the pairwise products of the signal's weights are distinct, then the autocorrelation determines the signal uniquely, up to an orthogonal transformation. We conclude the paper by considering binary signals and show that in this case, the collision-free condition need not be sufficient to determine signals up to orthogonal transformation.
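The link between the classical beltway problem and the periodic autocorrelation can be sketched on a discretized circle. This is an assumption-laden illustration (points restricted to the cyclic group of order N, helper names hypothetical), not the orthogonal-group setting of the paper: the autocorrelation of a point set's indicator counts how often each cyclic difference occurs, and it is unchanged by rotating or reflecting the set.

```python
import numpy as np

def indicator(points, N):
    """Binary signal on Z_N that equals 1 exactly on the given points."""
    x = np.zeros(N)
    x[list(points)] = 1.0
    return x

def periodic_autocorrelation(x):
    """Periodic autocorrelation via the power spectrum.

    Rounding to integers is valid here only because x is integer-valued.
    """
    power = np.abs(np.fft.fft(x)) ** 2
    return np.rint(np.fft.ifft(power).real).astype(int)

N = 7
points = {0, 1, 3}
moved = {(5 - p) % N for p in points}  # a reflection followed by a rotation

a = periodic_autocorrelation(indicator(points, N))
b = periodic_autocorrelation(indicator(moved, N))

# Entry l counts the ordered pairs of points at cyclic difference l.
assert a.tolist() == [3, 1, 1, 1, 1, 1, 1]
assert np.array_equal(a, b)
```

In this example every nonzero difference occurs exactly once, so the three points have distinct pairwise differences on the circle.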
Abstract:The key ingredient to retrieving a signal from its Fourier magnitudes, namely, to solve the phase retrieval problem, is an effective prior on the sought signal. In this paper, we study the phase retrieval problem under the prior that the signal lies in a semi-algebraic set. This is a very general prior as semi-algebraic sets include linear models, sparse models, and ReLU neural network generative models. The latter is the main motivation of this paper, due to the remarkable success of deep generative models in a variety of imaging tasks, including phase retrieval. We prove that almost all signals in $\mathbb{R}^N$ can be determined from their Fourier magnitudes, up to a sign, if they lie in a (generic) semi-algebraic set of dimension $N/2$. The same is true for all signals if the semi-algebraic set is of dimension $N/4$. We also generalize these results to the problem of signal recovery from the second moment in multi-reference alignment models with multiplicity-free representations of compact groups. This general result is then used to derive improved sample complexity bounds for recovering band-limited functions on the sphere from their noisy copies, each acted upon by a random element of SO(3).