The classical beltway problem entails recovering a set of points from their unordered pairwise distances on the circle. This problem can be viewed as a special case of the crystallographic phase retrieval problem of recovering a sparse signal from its periodic autocorrelation. Based on this interpretation, and motivated by cryo-electron microscopy, we suggest a natural generalization to orthogonal groups: recovering a sparse signal, up to an orthogonal transformation, from its autocorrelation over the orthogonal group. If the support of the signal is collision-free, we bound the number of solutions to the beltway problem over orthogonal groups, and prove that this bound is exactly one when the support of the signal is radially collision-free (i.e., the support points have distinct magnitudes). We also prove that if the pairwise products of the signal's weights are distinct, then the autocorrelation determines the signal uniquely, up to an orthogonal transformation. We conclude the paper by considering binary signals and show that in this case, the collision-free condition need not be sufficient to determine signals up to orthogonal transformation.
This paper studies the classical problem of detecting the location of multiple image occurrences in a two-dimensional, noisy measurement. Assuming the image occurrences do not overlap, we formulate this task as a constrained maximum likelihood optimization problem. We show that the maximum likelihood estimator is equivalent to an instance of the winner determination problem from the field of combinatorial auction, and that the solution can be obtained by searching over a binary tree. We then design a pruning mechanism that significantly accelerates the runtime of the search. We demonstrate on simulations and electron microscopy data sets that the proposed algorithm provides accurate detection in challenging regimes of high noise levels and densely packed image occurrences.
Multi-target detection (MTD) is the problem of estimating an image from a large, noisy measurement that contains randomly translated and rotated copies of the image. Motivated by the single-particle cryo-electron microscopy technology, we design data-driven diffusion priors for the MTD problem, derived from score-based stochastic differential equations models. We then integrate the prior into the approximate expectation-maximization algorithm. In particular, our method alternates between an expectation step that approximates the expected log-likelihood and a maximization step that balances the approximated log-likelihood with the learned log-prior. We show on two datasets that adding the data-driven prior substantially reduces the estimation error, in particular in high noise regimes.
The key ingredient to retrieving a signal from its Fourier magnitudes, namely, to solve the phase retrieval problem, is an effective prior on the sought signal. In this paper, we study the phase retrieval problem under the prior that the signal lies in a semi-algebraic set. This is a very general prior as semi-algebraic sets include linear models, sparse models, and ReLU neural network generative models. The latter is the main motivation of this paper, due to the remarkable success of deep generative models in a variety of imaging tasks, including phase retrieval. We prove that almost all signals in R^N can be determined from their Fourier magnitudes, up to a sign, if they lie in a (generic) semi-algebraic set of dimension N/2. The same is true for all signals if the semi-algebraic set is of dimension N/4. We also generalize these results to the problem of signal recovery from the second moment in multi-reference alignment models with multiplicity free representations of compact groups. This general result is then used to derive improved sample complexity bounds for recovering band-limited functions on the sphere from their noisy copies, each acted upon by a random element of SO(3).
A single-particle cryo-electron microscopy (cryo-EM) measurement, called a micrograph, consists of multiple two-dimensional tomographic projections of a three-dimensional molecular structure at unknown locations, taken under unknown viewing directions. All existing cryo-EM algorithmic pipelines first locate and extract the projection images, and then reconstruct the structure from the extracted images. However, if the molecular structure is small, the signal-to-noise ratio (SNR) of the data is very low, and thus accurate detection of projection images within the micrograph is challenging. Consequently, all standard techniques fail in low-SNR regimes. To recover molecular structures from measurements of low SNR, and in particular small molecular structures, we devise a stochastic approximate expectation-maximization algorithm to estimate the three-dimensional structure directly from the micrograph, bypassing locating the projection images. We corroborate our computational scheme with numerical experiments, and present successful structure recoveries from simulated noisy measurements.
We consider the finite alphabet phase retrieval problem: recovering a signal whose entries lie in a small alphabet of possible values from its Fourier magnitudes. This problem arises in the celebrated technology of X-ray crystallography to determine the atomic structure of biological molecules. Our main result states that for generic values of the alphabet, two signals have the same Fourier magnitudes if and only if several partitions have the same difference sets. Thus, the finite alphabet phase retrieval problem reduces to the combinatorial problem of determining a signal from those difference sets. Notably, this result holds true when one of the letters of the alphabet is zero, namely, for sparse signals with finite alphabet, which is the situation in X-ray crystallography.
Different tasks in the computational pipeline of single-particle cryo-electron microscopy (cryo-EM) require enhancing the quality of the highly noisy raw images. To this end, we develop an efficient algorithm for signal enhancement of cryo-EM images. The enhanced images can be used for a variety of downstream tasks, such as 2-D classification, removing uninformative images, constructing {ab initio} models, generating templates for particle picking, providing a quick assessment of the data set, dimensionality reduction, and symmetry detection. The algorithm includes built-in quality measures to assess its performance and alleviate the risk of model bias. We demonstrate the effectiveness of the proposed algorithm on several experimental data sets. In particular, we show that the quality of the resulting images is high enough to produce ab initio models of $\sim 10$ \AA resolution. The algorithm is accompanied by a publicly available, documented and easy-to-use code.
Multi-reference alignment (MRA) is the problem of recovering a signal from its multiple noisy copies, each acted upon by a random group element. MRA is mainly motivated by single-particle cryo-electron microscopy (cryo-EM) that has recently joined X-ray crystallography as one of the two leading technologies to reconstruct biological molecular structures. Previous papers have shown that in the high noise regime, the sample complexity of MRA and cryo-EM is $n=\omega(\sigma^{2d})$, where $n$ is the number of observations, $\sigma^2$ is the variance of the noise, and $d$ is the lowest-order moment of the observations that uniquely determines the signal. In particular, it was shown that in many cases, $d=3$ for generic signals, and thus the sample complexity is $n=\omega(\sigma^6)$. In this paper, we analyze the second moment of the MRA and cryo-EM models. First, we show that in both models the second moment determines the signal up to a set of unitary matrices, whose dimension is governed by the decomposition of the space of signals into irreducible representations of the group. Second, we derive sparsity conditions under which a signal can be recovered from the second moment, implying sample complexity of $n=\omega(\sigma^4)$. Notably, we show that the sample complexity of cryo-EM is $n=\omega(\sigma^4)$ if at most one third of the coefficients representing the molecular structure are non-zero; this bound is near-optimal. The analysis is based on tools from representation theory and algebraic geometry. We also derive bounds on recovering a sparse signal from its power spectrum, which is the main computational problem of X-ray crystallography.
Single-particle cryo-electron microscopy (cryo-EM) is a leading technology to resolve the structure of molecules. Early in the process, the user detects potential particle images in the raw data. Typically, there are many false detections as a result of high levels of noise and contamination. Currently, removing the false detections requires human intervention to sort the hundred thousands of images. We propose a statistically-established unsupervised algorithm to remove non-particle images. We model the particle images as a union of low-dimensional subspaces, assuming non-particle images are arbitrarily scattered in the high-dimensional space. The algorithm is based on an extension of the probabilistic PCA framework to robustly learn a non-linear model of union of subspaces. This provides a flexible model for cryo-EM data, and allows to automatically remove images that correspond to pure noise and contamination. Numerical experiments corroborate the effectiveness of the sorting algorithm.
This paper studies the classical problem of estimating the locations of signal occurrences in a noisy measurement. Based on a multiple hypothesis testing scheme, we design a K-sample statistical test to control the false discovery rate (FDR). Specifically, we first convolve the noisy measurement with a smoothing kernel, and find all local maxima. Then, we evaluate the joint probability of K entries in the vicinity of each local maximum, derive the corresponding p-value, and apply the Benjamini-Hochberg procedure to account for multiplicity. We demonstrate through extensive experiments that our proposed method, with K=2, controls the prescribed FDR while increasing the power compared to a one-sample test.