Abstract: We consider \emph{bilinearly indexed random processes} (blirp) and study their interpolating comparative mechanisms. The generic introduction of the \emph{fully lifted} (fl) blirp interpolation in [105] was followed by a corresponding stationarization counterpart in [103]. The \emph{large deviation} upgrade of [105] introduced in the companion paper [106] is complemented here with the corresponding upgrade of [103]. Similarly to [106], the mechanism that we introduce extends the range of applicability of [103] so that it encompasses \emph{atypical} features of random structures. Among others, these include the \emph{local entropies} (LE), which describe the atypical clustering of solutions in hard random optimization problems believed to be directly responsible for the presumed existence of the so-called \emph{computational gaps}. Moreover (and similarly to [105]), despite occasionally involved technical considerations, the final forms of the uncovered fundamental relations among the interpolating parameters are rather elegant and as such provide a valuable tool readily available for further use.
Abstract: [104] introduced a powerful \emph{fully lifted} (fl) statistical interpolating mechanism. It established a nested connection between blirps (bilinearly indexed random processes) and their decoupled (linearly indexed) comparative counterparts. We here revisit the comparison from [104] and introduce its \emph{large deviation} upgrade. The new machinery allows one to substantially widen the range of applicability of [104]. In addition to \emph{typical} features, studying the analytically much harder \emph{atypical} features of random structures is now possible as well. To give a bit of a practical flavor, we show how the obtained results connect to the so-called \emph{local entropies} (LE) and their predicated role in understanding solution clustering and the associated \emph{computational gaps} in hard random optimization problems. As was the case in [104], even though the technical considerations often appear fairly involved, the final interpolating forms admit elegant expressions, thereby providing a relatively easy-to-use tool readily available for further studies. Moreover, as the considered models encompass all well known random structures discussed in [104], the obtained results automatically apply to them as well.
Abstract: We study the classical asymmetric binary perceptron (ABP) and the associated \emph{local entropy} (LE) as a potential source of its algorithmic hardness. The isolation of \emph{typical} ABP solutions in the SAT phase seemingly suggests a universal algorithmic hardness. Paradoxically, efficient algorithms do exist even for constraint densities $\alpha$ fairly close to, yet at a finite distance (\emph{computational gap}) from, the capacity. In recent years, the existence of rare large dense clusters and the magical ability of fast algorithms to find them have been posited as the conceptual resolution of this paradox. Monotonicity or breakdown of the LEs associated with such \emph{atypical} clusters is predicated to play a key role in their thinning-out or even complete defragmentation. The invention of fully lifted random duality theory (fl RDT) [90,93,94] allows studying \emph{typical} features of random structures. A large deviation upgrade, sfl LD RDT [96,97], moves things further and enables characterization of \emph{atypical} features as well. Utilizing the machinery of [96,97], we here develop a generic framework to study the LE as an ABP's atypical feature. Already on the second level of lifting we discover that the LE results closely match those obtained through replica methods. For the classical zero-threshold ABP, we obtain that the LE breaks down for $\alpha$ in the $(0.77,0.78)$ interval, which basically matches the $\alpha\sim 0.75-0.77$ range that the currently best ABP solvers can handle, and effectively indicates that the LE's behavior might indeed be among the key reflections of the presumed existence of the ABP's computational gaps.
Abstract: We study deep ReLU feedforward neural networks (NN) and their injectivity abilities. The main focus is on \emph{precisely} determining the so-called injectivity capacity. For any given hidden-layer architecture, it is defined as the minimal ratio between the number of the network's outputs and inputs that ensures unique recoverability of the input from a realizable output. Strong recent progress in precisely studying single ReLU layer injectivity properties is here moved to the deep network level. In particular, we develop a program that connects deep $l$-layer net injectivity to an $l$-extension of the $\ell_0$ spherical perceptrons, thereby massively generalizing the isomorphism between studying single-layer injectivity and the capacity of the so-called (1-extension) $\ell_0$ spherical perceptrons discussed in [82]. \emph{Random duality theory} (RDT) based machinery is then created and utilized to statistically handle the properties of the extended $\ell_0$ spherical perceptrons and, implicitly, of the deep ReLU NNs. A sizeable set of numerical evaluations is conducted as well to put the entire RDT machinery to practical use. From these we observe a rapidly decreasing trend in the needed layers' expansions, i.e., we observe a rapid \emph{expansion saturation effect}. Only $4$ layers of depth are sufficient to closely approach the level of no needed expansion -- a result that fairly closely resembles observations made in practical experiments and that has so far remained out of reach of the existing mathematical methodologies.
Abstract: We consider the injectivity property of ReLU network layers. Determining the ReLU injectivity capacity (the ratio of the number of the layer's inputs and outputs) is established as isomorphic to determining the capacity of the so-called $\ell_0$ spherical perceptron. Employing \emph{fully lifted random duality theory} (fl RDT), a powerful program is developed and utilized to handle the $\ell_0$ spherical perceptron and, implicitly, the ReLU layers' injectivity. To put the entire fl RDT machinery to practical use, a sizeable set of numerical evaluations is conducted as well. The lifting mechanism is observed to converge remarkably fast, with relative corrections in the estimated quantities not exceeding $\sim 0.1\%$ already on the third level of lifting. Closed-form explicit analytical relations among the key lifting parameters are uncovered as well. In addition to being of great importance in handling all the required numerical work, these relations also shed new light on beautiful parametric interconnections within the lifting structure. Finally, the obtained results are also shown to fairly closely match the replica predictions from [40].
Abstract: We study the theoretical limits of $\ell_0$ (quasi) norm based optimization algorithms when employed for solving classical compressed sensing or sparse regression problems. Considering standard contexts with deterministic signals and statistical systems, we utilize \emph{Fully lifted random duality theory} (Fl RDT) and develop a generic analytical program for studying the performance of \emph{maximum-likelihood} (ML) decoding. The key ML performance parameter, the residual \emph{root mean square error} ($\textbf{RMSE}$), is uncovered to exhibit the so-called \emph{phase-transition} (PT) phenomenon. The associated aPT curve, which separates the regions of system dimensions where \emph{an} $\ell_0$ based algorithm succeeds or fails in achieving small (comparable to the noise) ML optimal $\textbf{RMSE}$, is precisely determined as well. In parallel, we uncover the existence of another, dPT, curve which does the same separation but for practically feasible \emph{descending} $\ell_0$ ($d\ell_0$) algorithms. Concrete implementation and practical relevance of the Fl RDT typically rely on the ability to conduct a sizeable set of the underlying numerical evaluations; these reveal that, for ML decoding, the Fl RDT converges astonishingly fast, with corrections in the estimated quantities not exceeding $\sim 0.1\%$ already on the third level of lifting. The analytical results are supplemented by a sizeable set of numerical experiments in which we implement a simple variant of $d\ell_0$ and demonstrate that its practical performance very accurately matches the theoretical predictions. Quite surprisingly, a remarkably precise agreement between the simulations and the theory is observed already for fairly small dimensions of the order of 100.
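To give a concrete (though purely illustrative) feel for the descending $\ell_0$ algorithmic idea mentioned above, the following is a minimal Python sketch of one simple greedy support-descending heuristic: starting from the full support, it repeatedly refits least squares and drops the smallest-magnitude coefficient. It is not necessarily the exact $d\ell_0$ variant implemented in the paper, and all problem dimensions below are arbitrary illustration choices.
\begin{verbatim}
import numpy as np

def descending_l0(A, y, k):
    # Greedy support-descending heuristic (a sketch, not necessarily the
    # paper's d-ell_0 variant): refit least squares on the current support
    # and drop the smallest-magnitude coefficient until k columns remain.
    n = A.shape[1]
    support = list(range(n))
    while len(support) > k:
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        support.pop(int(np.argmin(np.abs(x_s))))
    x_final, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    x = np.zeros(n)
    x[support] = x_final
    return x

# arbitrary small synthetic sparse regression instance
rng = np.random.default_rng(0)
m, n, k, sigma = 40, 60, 5, 0.05
A = rng.normal(size=(m, n)) / np.sqrt(m)
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.normal(size=k)
y = A @ x0 + sigma * rng.normal(size=m)
x_hat = descending_l0(A, y, k)
print("residual RMSE:", np.linalg.norm(y - A @ x_hat) / np.sqrt(m))
\end{verbatim}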
Abstract: We consider fully row/column-correlated linear regression models and study several classical estimators (including minimum norm interpolators (GLS), ordinary least squares (LS), and ridge regressors). We show that \emph{Random Duality Theory} (RDT) can be utilized to obtain precise closed-form characterizations of all estimator-related optimizing quantities of interest, including the \emph{prediction risk} (testing or generalization error). On a qualitative level, our results recover the risk's well known non-monotonic (so-called \emph{double-descent}) behavior as the number-of-features/sample-size ratio increases. On a quantitative level, our closed-form results show how the risk explicitly depends on all key model parameters, including the problem dimensions and covariance matrices. Moreover, a special case of our results, obtained when intra-sample (or time-series) correlations are not present, precisely matches the corresponding ones obtained via spectral methods in [6,16,17,24].
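As a minimal numerical illustration of the non-monotonic (double-descent) risk behavior referenced above, the sketch below simulates the prediction risk of the minimum-norm least-squares interpolator under a plain isotropic Gaussian design (i.e., without the row/column correlations treated in the paper); the sample size, noise level, and grid of over-parametrization ratios are arbitrary.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)

def minnorm_risk(n, p, sigma=0.5, trials=200):
    # Empirical prediction risk of the minimum-norm least-squares
    # interpolator under an isotropic design: for a fresh test point
    # x0 ~ N(0, I_p), the risk equals ||beta_hat - beta||^2 + sigma^2.
    risks = []
    for _ in range(trials):
        beta = rng.normal(size=p) / np.sqrt(p)   # unit-energy signal
        X = rng.normal(size=(n, p))
        y = X @ beta + sigma * rng.normal(size=n)
        beta_hat = np.linalg.pinv(X) @ y         # min-norm interpolator
        risks.append(np.sum((beta_hat - beta) ** 2) + sigma ** 2)
    return np.mean(risks)

n = 100
for ratio in (0.3, 0.6, 0.9, 1.1, 1.5, 3.0, 6.0):
    print(f"p/n = {ratio:3.1f}   risk ~ {minnorm_risk(n, int(ratio * n)):.3f}")
\end{verbatim}
In this toy setting the printed risk should peak near $p/n=1$ and decrease again on both sides, i.e., exhibit the qualitative double-descent shape.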
Abstract: We consider correlated \emph{factor} regression models (FRM) and analyze the performance of classical ridge interpolators. Utilizing the powerful \emph{Random Duality Theory} (RDT) mathematical engine, we obtain \emph{precise} closed-form characterizations of the underlying optimization problems and all associated optimizing quantities. In particular, we provide \emph{excess prediction risk} characterizations that clearly show the dependence on all key model parameters, covariance matrices, loadings, and dimensions. As a function of the over-parametrization ratio, the generalized least squares (GLS) risk also exhibits the well known \emph{double-descent} (non-monotonic) behavior. Similarly to classical linear regression models (LRM), we demonstrate that such an FRM phenomenon can be smoothed out by optimally tuned ridge regularization. The theoretical results are supplemented by numerical simulations, and an excellent agreement between the two is observed. Moreover, we note that ``ridge smoothing'' is often of limited effect already for over-parametrization ratios above $5$ and of virtually no effect for those above $10$. This solidifies the notion that one of the most popular recent neural network paradigms -- \emph{zero-training (interpolating) generalizes well} -- enjoys wider applicability, including within the FRM estimation/prediction context.
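Along the same lines, the following sketch contrasts the (near) interpolating ridge limit with a crudely grid-tuned ridge penalty in the same idealized isotropic setting (not the factor model analyzed above), giving a rough feel for the ``ridge smoothing'' effect; all dimensions, the noise level, and the penalty grid are arbitrary illustration choices.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)

def ridge_risk(n, p, lam, sigma=0.5, trials=50):
    # Empirical prediction risk of the ridge estimator
    # beta_hat = (X'X + lam*n*I)^{-1} X'y under an isotropic design.
    risks = []
    for _ in range(trials):
        beta = rng.normal(size=p) / np.sqrt(p)
        X = rng.normal(size=(n, p))
        y = X @ beta + sigma * rng.normal(size=n)
        beta_hat = np.linalg.solve(X.T @ X + lam * n * np.eye(p), X.T @ y)
        risks.append(np.sum((beta_hat - beta) ** 2) + sigma ** 2)
    return np.mean(risks)

n, grid = 100, (0.01, 0.03, 0.1, 0.3, 1.0, 3.0)
for ratio in (1.1, 2.0, 5.0, 10.0):
    p = int(ratio * n)
    near_interp = ridge_risk(n, p, 1e-6)                # essentially the interpolating limit
    tuned = min(ridge_risk(n, p, lam) for lam in grid)  # crude grid-tuned ridge
    print(f"p/n = {ratio:4.1f}   interpolator ~ {near_interp:.3f}   tuned ridge ~ {tuned:.3f}")
\end{verbatim}
The gap between the two printed columns should be large near $p/n\approx 1$ and shrink substantially for the largest ratios, qualitatively mirroring the smoothing effect described above.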
Abstract: We consider \emph{random linear programs} (rlps) as a subclass of \emph{random optimization problems} (rops) and study their typical behavior. Our particular focus is on appropriate linear objectives which connect the rlps to the mean widths of random polyhedra/polytopes. Utilizing the powerful machinery of \emph{random duality theory} (RDT) \cite{StojnicRegRndDlt10}, we obtain, in a large dimensional context, exact characterizations of the programs' objectives. In particular, for any $\alpha=\lim_{n\rightarrow\infty}\frac{m}{n}\in(0,\infty)$, any unit vector $\mathbf{c}\in{\mathbb R}^n$, any fixed $\mathbf{a}\in{\mathbb R}^n$, and $A\in {\mathbb R}^{m\times n}$ with iid standard normal entries, we have \begin{eqnarray*} \lim_{n\rightarrow\infty}{\mathbb P}_{A} \left ( (1-\epsilon) \xi_{opt}(\alpha;\mathbf{a}) \leq \min_{A\mathbf{x}\leq \mathbf{a}}\mathbf{c}^T\mathbf{x} \leq (1+\epsilon) \xi_{opt}(\alpha;\mathbf{a}) \right ) \longrightarrow 1, \end{eqnarray*} where \begin{equation*} \xi_{opt}(\alpha;\mathbf{a}) \triangleq \min_{x>0} \sqrt{x^2- x^2 \lim_{n\rightarrow\infty} \frac{\sum_{i=1}^{m} \left ( \frac{1}{2} \left (\left ( \frac{\mathbf{a}_i}{x}\right )^2 + 1\right ) \mbox{erfc}\left( \frac{\mathbf{a}_i}{x\sqrt{2}}\right ) - \frac{\mathbf{a}_i}{x} \frac{e^{-\frac{\mathbf{a}_i^2}{2x^2}}}{\sqrt{2\pi}} \right ) }{n} }. \end{equation*} For example, for $\mathbf{a}=\mathbf{1}$, one uncovers \begin{equation*} \xi_{opt}(\alpha) = \min_{x>0} \sqrt{x^2- x^2 \alpha \left ( \frac{1}{2} \left ( \frac{1}{x^2} + 1\right ) \mbox{erfc} \left ( \frac{1}{x\sqrt{2}}\right ) - \frac{1}{x} \frac{e^{-\frac{1}{2x^2}}}{\sqrt{2\pi}} \right ) }. \end{equation*} Moreover, $2 \xi_{opt}(\alpha)$ is precisely the concentrating point of the mean width of the polyhedron $\{\mathbf{x}|A\mathbf{x} \leq \mathbf{1}\}$.
Abstract: In \cite{Hop82}, Hopfield introduced a \emph{Hebbian} learning rule based neural network model and suggested how it can efficiently operate as an associative memory. Studying random binary patterns, he also uncovered that, if a small fraction of errors is tolerated in the retrieval of the stored patterns, the capacity of the network (the maximal number of memorized patterns, $m$) scales linearly with each pattern's size, $n$. Moreover, he famously predicted $\alpha_c=\lim_{n\rightarrow\infty}\frac{m}{n}\approx 0.14$. We study this very same scenario with two famous choices of the patterns' basins of attraction: \textbf{\emph{(i)}} the AGS one from \cite{AmiGutSom85}; and \textbf{\emph{(ii)}} the NLT one from \cite{Newman88,Louk94,Louk94a,Louk97,Tal98}. Relying on the \emph{fully lifted random duality theory} (fl RDT) from \cite{Stojnicflrdt23}, we obtain the following explicit capacity characterizations on the first level of lifting: \begin{equation} \alpha_c^{(AGS,1)} = \left ( \max_{\delta\in \left ( 0,\frac{1}{2}\right ) }\frac{1-2\delta}{\sqrt{2} \mbox{erfinv} \left ( 1-2\delta\right )} - \frac{2}{\sqrt{2\pi}} e^{-\left ( \mbox{erfinv}\left ( 1-2\delta \right )\right )^2}\right )^2 \approx \mathbf{0.137906} \end{equation} \begin{equation} \alpha_c^{(NLT,1)} = \frac{\mbox{erf}(x)^2}{2x^2}-1+\mbox{erf}(x)^2 \approx \mathbf{0.129490}, \quad \mbox{where } x \mbox{ solves } 1-\mbox{erf}(x)^2- \frac{2\mbox{erf}(x)e^{-x^2}}{\sqrt{\pi}x}+\frac{2e^{-2x^2}}{\pi}=0. \end{equation} Substantial numerical work gives, on the second level of lifting, $\alpha_c^{(AGS,2)} \approx \mathbf{0.138186}$ and $\alpha_c^{(NLT,2)} \approx \mathbf{0.12979}$, effectively uncovering a remarkably fast lifting convergence. Moreover, the obtained AGS characterizations exactly match the replica symmetry based ones of \cite{AmiGutSom85} and the corresponding symmetry breaking ones of \cite{SteKuh94}.
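Both first-level characterizations above are fully explicit and can be re-evaluated numerically in a few lines; the Python sketch below (using numpy/scipy) simply optimizes/solves the stated expressions and should reproduce the quoted $\approx \mathbf{0.137906}$ and $\approx \mathbf{0.129490}$ values. The search interval and root bracket are arbitrary but safe choices.
\begin{verbatim}
import numpy as np
from scipy.optimize import brentq, minimize_scalar
from scipy.special import erf, erfinv

# AGS, first level of lifting: maximize the bracketed expression over delta.
def ags_obj(delta):
    t = erfinv(1.0 - 2.0 * delta)
    return ((1.0 - 2.0 * delta) / (np.sqrt(2.0) * t)
            - 2.0 / np.sqrt(2.0 * np.pi) * np.exp(-t ** 2)) ** 2

res = minimize_scalar(lambda d: -ags_obj(d), bounds=(1e-6, 0.5 - 1e-6), method="bounded")
print("alpha_c^(AGS,1) ~", ags_obj(res.x))     # should be ~ 0.137906

# NLT, first level of lifting: solve the scalar equation for x, then plug in.
def nlt_eq(x):
    return (1.0 - erf(x) ** 2
            - 2.0 * erf(x) * np.exp(-x ** 2) / (np.sqrt(np.pi) * x)
            + 2.0 * np.exp(-2.0 * x ** 2) / np.pi)

x_star = brentq(nlt_eq, 0.5, 2.0)
print("alpha_c^(NLT,1) ~",
      erf(x_star) ** 2 / (2.0 * x_star ** 2) - 1.0 + erf(x_star) ** 2)   # ~ 0.129490
\end{verbatim}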