Abstract: False discovery rate (FDR) control is a popular approach for maintaining the integrity of statistical analyses, especially in high-dimensional data settings, where multiple comparisons increase the risk of false positives. FDR control has been extensively researched for real-valued data. However, the complex-valued case, which is relevant for many signal processing applications, remains largely unexplored. We therefore present a fast, FDR-controlling variable selector for complex-valued high-dimensional data. The proposed Complex-Valued Terminating-Random Experiments (CT-Rex) selector controls a user-defined target FDR while maximizing the number of selected variables. This is achieved by optimally fusing the solutions of multiple early-terminated complex-valued random experiments. We benchmark the performance in sparse complex regression simulation studies and showcase an example of FDR-controlled, compressed-sensing-based single-snapshot multi-source detection and direction of arrival (DOA) estimation. The proposed work applies to a wide range of research areas, such as DOA estimation, communications, mechanical engineering, and magnetic resonance imaging, bridging a critical gap in signal processing for complex-valued data.
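
As a reminder of the quantity being controlled, the FDR is the expected value of the false discovery proportion (FDP), i.e., the fraction of selected variables that are not truly active. The following minimal sketch computes the FDP and the true positive proportion (TPP) for a single selection. The function names and the toy index sets are illustrative assumptions and are not taken from the CT-Rex selector itself.

```python
def false_discovery_proportion(selected, true_support):
    """FDP = (# selected variables that are not truly active) / max(# selected, 1)."""
    selected = set(selected)
    true_support = set(true_support)
    false_discoveries = len(selected - true_support)
    return false_discoveries / max(len(selected), 1)


def true_positive_proportion(selected, true_support):
    """TPP = (# truly active variables that were selected) / max(# truly active, 1)."""
    selected = set(selected)
    true_support = set(true_support)
    true_discoveries = len(selected & true_support)
    return true_discoveries / max(len(true_support), 1)


# Hypothetical toy example: variables 0..4 are truly active,
# and a selector returned the indices 0, 1, 2, and 7.
true_support = [0, 1, 2, 3, 4]
selected = [0, 1, 2, 7]

fdp = false_discovery_proportion(selected, true_support)
tpp = true_positive_proportion(selected, true_support)
```

Averaging the FDP over many independent simulation runs gives an empirical estimate of the FDR, which can then be compared against the user-defined target level.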




Abstract: Currently, there is an urgent demand for scalable multivariate and high-dimensional false discovery rate (FDR)-controlling variable selection methods to ensure the reproducibility of discoveries. However, among existing methods, only the recently proposed Terminating-Random Experiments (T-Rex) selector scales to problems with millions of variables, as encountered in, e.g., genomics research. The T-Rex selector is a new learning framework based on early terminated random experiments with computer-generated dummy variables. In this work, we propose the Big T-Rex, a new implementation of T-Rex that drastically reduces its Random Access Memory (RAM) consumption to enable solving FDR-controlled sparse regression problems with millions of variables on a laptop. We incorporate advanced memory-mapping techniques to work with matrices that reside on a solid-state drive and two new dummy generation strategies based on permutations of a reference matrix. Our numerical experiments demonstrate a drastic reduction in memory demand and computation time. We showcase that the Big T-Rex can efficiently solve FDR-controlled Lasso-type problems with five million variables on a laptop in thirty minutes. Our work empowers researchers without access to high-performance clusters to make reproducible discoveries in large-scale high-dimensional data.
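
To illustrate the general idea of combining memory mapping with permutation-based dummies, the sketch below writes a dummy matrix to a disk-backed file using `numpy.memmap`, filling each column with a random permutation of a column from a small in-memory reference matrix. This is only one possible reading of "permutations of a reference matrix": the abstract does not specify the two strategies, so the function name `write_permuted_dummies`, the column-cycling scheme, and the toy dimensions are all assumptions and not the Big T-Rex implementation.

```python
import os
import tempfile

import numpy as np


def write_permuted_dummies(reference, num_dummies, path, seed=0):
    """Fill a disk-backed matrix with dummy columns.

    Each dummy column is a random permutation of the entries of one
    reference column (assumed scheme: cycle through the reference columns).
    """
    rng = np.random.default_rng(seed)
    n, p_ref = reference.shape
    # mode="w+" creates (or overwrites) the file and maps it for read/write,
    # so the full dummy matrix is backed by the file rather than allocated in RAM.
    dummies = np.memmap(path, dtype=reference.dtype, mode="w+",
                        shape=(n, num_dummies))
    for j in range(num_dummies):
        column = reference[:, j % p_ref]
        dummies[:, j] = rng.permutation(column)  # shuffled copy of the column
    dummies.flush()  # push pending changes to the file on disk
    return dummies


# Hypothetical toy sizes; a real problem would have far more columns.
rng = np.random.default_rng(1)
reference = rng.standard_normal((50, 4))
path = os.path.join(tempfile.mkdtemp(), "dummies.dat")
dummies = write_permuted_dummies(reference, num_dummies=10, path=path)
```

Because only the small reference matrix has to be generated and kept in memory, the cost of drawing fresh random numbers for every dummy column is avoided, and the operating system pages the memory-mapped file in and out as columns are accessed.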